Handmade Penguin Chapter 15: Platform-independent Debug File I/O

This chapter covers roughly the content in the Platform-independent Debug File I/O part of the Handmade Hero course, under the Linux operating system.


To Stream or not to Stream

At some point in our game, we're going to want to load or save data. We'll need to have things like settings and saved games persist between runs of the game, and we'll also want some way of loading our assets: sprites, sounds, music, levels, etc. These are actually two separate problems: persistent settings need to be writable, whereas we only need to read our assets, but typically wish to do so in the background.

Implementing a fancy resource cache and persistent data storage system is beyond the scope of this chapter: we don't even have any data yet. We will, however, need some way of loading and saving data so that we can start working on the game proper.

The traditional way of reading and writing data is to use streams. Each stream represents an abstract data source from which we read bytes one after another. For example, the C standard library has an API like so:

FILE *FileHandle = fopen(FileName, "rb");
int Data1, Data2;

// fread() returns the number of items read (here 1), not the number of bytes
if (fread(&Data1, sizeof(int), 1, FileHandle) == 1)
{
	if (fread(&Data2, sizeof(int), 1, FileHandle) == 1)
	{
	}
	else
	{
		// Error
	}
}
else
{
	// Error
}

fclose(FileHandle);

This technique obviously works, but some issues are already apparent. We have to check for errors after reading each individual variable. We have to manage handles and do a lot of calling back into our platform layer. Despite their obvious advantages for writing log files and the like, streams are not ideally suited to most of our I/O.

A much simpler API would be to read an entire file into memory and return a pointer to it. This, too, is too simple for a complete, asynchronous file API, but it will more than serve for now.

struct debug_read_file_result
{
	uint32 ContentsSize;
	void *Contents;
};
internal debug_read_file_result DEBUGPlatformReadEntireFile(char *Filename);
internal void DEBUGPlatformFreeFileMemory(void *Memory);

The Linux File API

To implement this, we have a number of choices. We can use the fopen() function and friends from above. We can use the POSIX open() function and API. We could even use the mmap() call to have the operating system do everything for us! We'll start off using open().

The open() function is actually pretty simple. It takes two arguments:

path: The path to the file we want to open.
oflag: A bitwise-or of options. To open the file for reading, we need O_RDONLY; for writing, we need O_WRONLY; for both, O_RDWR.

open() returns an integer file handle. So, to open our file, we'd use:

debug_read_file_result Result = {};
int FileHandle = open(Filename, O_RDONLY);
if (FileHandle == -1)
{
	return Result;
}

Be warned! MS-DOS, and hence Windows, has a very similar file API. You do, however, have to use the O_BINARY flag if you are reading binary data, else the conversion from Windows to Unix newlines will corrupt your file. This is an excellent way to spend early-morning hours debugging "random corruption."
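
If the same source ever has to compile on both kinds of system, one common idiom is to define O_BINARY as 0 wherever it doesn't exist, so the flag can be passed unconditionally. This is only a sketch of that idiom, not something taken from the Handmade Hero source, and the helper name OpenForBinaryRead is made up for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

// POSIX systems make no text/binary distinction and don't define O_BINARY,
// so give it a harmless value there.
#ifndef O_BINARY
#define O_BINARY 0
#endif

// Hypothetical helper: open a file for reading with no newline translation.
// Returns a file descriptor, or -1 on failure (exactly as open() does).
static int
OpenForBinaryRead(const char *Filename)
{
	return open(Filename, O_RDONLY | O_BINARY);
}
```

On Linux the extra flag changes nothing, so this costs us nothing here.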

We also need to know the size of the file. For this, we'll use the fstat() function. In a characteristic feat of Unix function naming, the stat() function and its relatives return the status of a file, which basically includes information like the file's size, permissions and last-modified time. While stat() accepts the path to a file, fstat() accepts an already-open file descriptor — our file handle. Both stat() and fstat() accept a pointer to a struct stat, which they fill in with the requested information.

struct stat FileStatus;
if (fstat(FileHandle, &FileStatus) == -1)
{
	close(FileHandle);
	return Result;
}
Result.ContentsSize = FileStatus.st_size;

One thing we have to be careful of is what happens if our file's size is 4GB or greater, which is too large to represent in a uint32. The st_size member of struct stat is of type off_t, which is usually 64 bits wide. Let's write a function to check that:

inline uint32
SafeTruncateUInt64(uint64 Value)
{
	Assert(Value <= 0xFFFFFFFF);
	uint32 Result = (uint32)Value;
	return(Result);
}

We can now replace the problematic line from earlier:

Result.ContentsSize = SafeTruncateUInt64(FileStatus.st_size);

Be warned! When I said that off_t was usually 64-bit, I was not telling you the entire truth. On 64-bit systems, off_t is 64-bit. On 32-bit systems, it is 32-bit unless we define _FILE_OFFSET_BITS=64. Look up the Large File Support extension to read about the many hoops you'll need to jump through to have large files work properly in your 32-bit Linux program. For us, having off_t be 32-bit won't actually matter, so we'll quietly ignore the problem for now.
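
As a rough illustration of that switch: the macro has to be defined before the first system header is included (or passed to the compiler as -D_FILE_OFFSET_BITS=64). The helper name OffsetBits below is hypothetical and exists only to let us inspect the result.

```c
// Must come before the first #include. On 64-bit Linux it changes nothing.
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

// Hypothetical helper: how many bits wide is off_t in this build?
static int
OffsetBits(void)
{
	return (int)(sizeof(off_t) * CHAR_BIT);
}
```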

We'll now need to allocate some memory in which to read our file. As this is just a debug function, and won't be used in the final game, we'll just use malloc():

Result.Contents = malloc(Result.ContentsSize);
if (!Result.Contents)
{
	Result.ContentsSize = 0;
	close(FileHandle);
	return Result;
}

To actually read our data, we'll use the read() function. read() is very simple: it takes three arguments:

fd: The file descriptor (handle) of our open file.
buf: A pointer to the memory we'll be reading into.
count: The number of bytes to read.

and returns the number of bytes successfully read. We can therefore read in our file quite simply:

uint32 BytesToRead = Result.ContentsSize;
uint8 *NextByteLocation = (uint8*)Result.Contents;
while (BytesToRead)
{
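	// read() returns a signed ssize_t: -1 on error, 0 at end of file.
	// Treat an early end of file as a failure too, or we would loop forever.
	ssize_t BytesRead = read(FileHandle, NextByteLocation, BytesToRead);
	if (BytesRead <= 0)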
	{
		free(Result.Contents);
		Result.Contents = 0;
		Result.ContentsSize = 0;
		close(FileHandle);
		return Result;
	}
	BytesToRead -= BytesRead;
	NextByteLocation += BytesRead;
}

Be warned! read() will read all of the bytes you ask for in one go almost all the time. Apart from the obvious case (reaching the end of the file), read() can return fewer than the requested size if the process receives a signal. It's therefore better to be safe than sorry, and keep trying until we get all of the data, or an error occurs.
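
A slightly more defensive take on the same loop can be sketched as a standalone helper. It retries when read() fails with EINTR (interrupted by a signal before any data arrived) and gives up if the file ends early. The name ReadFully and its 1/0 return convention are assumptions made for this example, not part of the Handmade Hero API.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical helper: read exactly Count bytes from Fd into Buffer.
// Returns 1 on success, 0 on error or if the data runs out early.
static int
ReadFully(int Fd, void *Buffer, size_t Count)
{
	unsigned char *Next = (unsigned char *)Buffer;
	while (Count > 0)
	{
		ssize_t BytesRead = read(Fd, Next, Count);
		if (BytesRead == -1)
		{
			if (errno == EINTR)
			{
				continue; // Interrupted before any data arrived: try again
			}
			return 0; // A real error
		}
		if (BytesRead == 0)
		{
			return 0; // End of file before we got everything
		}
		Count -= (size_t)BytesRead;
		Next += BytesRead;
	}
	return 1;
}
```

The same shape works for write(), which can also return a short count.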

Finally, we need to close our file descriptor, which can be done trivially with the close() function:

close(FileHandle);
return Result;

We will also need to #include a few headers to let these functions work (the last one is for malloc() and free(), in case nothing else has pulled it in):

<sys/types.h>
<sys/stat.h>
<fcntl.h>
<unistd.h>
<stdlib.h>

That wasn't so hard now, was it? The fopen() method is pretty similar, too, so if you prefer you can try that.
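
For comparison, here is one way the stdio version might look. This is a sketch rather than the chapter's code: stdio_read_result is a stand-in for debug_read_file_result built on standard types, and ReadEntireFileStdio is a hypothetical name. It finds the size by seeking to the end of the file, which is the usual stdio approach but relies on ftell() returning a byte count for files opened in binary mode.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
	uint32_t ContentsSize;
	void *Contents;
} stdio_read_result; // Stand-in for debug_read_file_result

// Hypothetical stdio-based equivalent of DEBUGPlatformReadEntireFile().
// On any failure (and for empty files) the result is left zeroed.
static stdio_read_result
ReadEntireFileStdio(const char *Filename)
{
	stdio_read_result Result = {0, 0};
	FILE *File = fopen(Filename, "rb");
	if (!File)
	{
		return Result;
	}

	// Find the size by seeking to the end and asking where we are.
	long Size = -1;
	if (fseek(File, 0, SEEK_END) == 0)
	{
		Size = ftell(File);
	}

	if (Size > 0 && (unsigned long)Size <= 0xFFFFFFFFul &&
	    fseek(File, 0, SEEK_SET) == 0)
	{
		void *Memory = malloc((size_t)Size);
		// fread() returns the number of items read: we ask for one item
		// that is Size bytes long, so success is a return value of 1.
		if (Memory && fread(Memory, (size_t)Size, 1, File) == 1)
		{
			Result.Contents = Memory;
			Result.ContentsSize = (uint32_t)Size;
		}
		else
		{
			free(Memory); // free(NULL) is harmless
		}
	}

	fclose(File);
	return Result;
}
```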

We'll also need to release the memory we allocated. This is pretty trivial:

internal void
DEBUGPlatformFreeFileMemory(void *Memory)
{
	free(Memory);
}
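
Since the reading code above arrived in pieces, it may help to see it assembled. The sketch below stitches the steps together under a few assumptions: the internal macro and the uint8/uint32 typedefs are stand-ins for what handmade.h normally provides, the size check is done inline instead of through SafeTruncateUInt64(), and empty files are simply treated as failures. FirstByteOfFile is a hypothetical caller added to show the intended usage.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Stand-ins for definitions that normally come from handmade.h (assumed).
#define internal static
typedef uint8_t uint8;
typedef uint32_t uint32;

struct debug_read_file_result
{
	uint32 ContentsSize;
	void *Contents;
};
typedef struct debug_read_file_result debug_read_file_result;

internal void
DEBUGPlatformFreeFileMemory(void *Memory)
{
	free(Memory);
}

internal debug_read_file_result
DEBUGPlatformReadEntireFile(char *Filename)
{
	debug_read_file_result Result = {0, 0};

	int FileHandle = open(Filename, O_RDONLY);
	if (FileHandle == -1)
	{
		return Result;
	}

	struct stat FileStatus;
	if (fstat(FileHandle, &FileStatus) == -1 ||
	    FileStatus.st_size <= 0 || FileStatus.st_size > 0xFFFFFFFFLL)
	{
		// Could not stat, nothing to read, or too big for a uint32
		close(FileHandle);
		return Result;
	}

	uint32 Size = (uint32)FileStatus.st_size;
	uint8 *Memory = (uint8 *)malloc(Size);
	uint32 BytesToRead = Size;
	uint8 *NextByteLocation = Memory;
	while (Memory && BytesToRead)
	{
		ssize_t BytesRead = read(FileHandle, NextByteLocation, BytesToRead);
		if (BytesRead <= 0)
		{
			// Error or early end of file: hand back nothing.
			// (For brevity this also gives up on EINTR.)
			free(Memory);
			Memory = 0;
			break;
		}
		BytesToRead -= (uint32)BytesRead;
		NextByteLocation += BytesRead;
	}
	close(FileHandle);

	if (Memory)
	{
		Result.Contents = Memory;
		Result.ContentsSize = Size;
	}
	return Result;
}

// Hypothetical caller: returns the first byte of a file, or -1 on failure.
internal int
FirstByteOfFile(char *Filename)
{
	int Result = -1;
	debug_read_file_result File = DEBUGPlatformReadEntireFile(Filename);
	if (File.Contents)
	{
		Result = ((uint8 *)File.Contents)[0];
		DEBUGPlatformFreeFileMemory(File.Contents);
	}
	return Result;
}
```

The pattern to take away is the one in FirstByteOfFile: check Contents before using it, and hand the pointer back to DEBUGPlatformFreeFileMemory() when you are done.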

Writing to disk

Writing to disk is even easier, as the caller will have already allocated the memory and will know the size of the data:

internal bool32 DEBUGPlatformWriteEntireFile(char *Filename, uint32 MemorySize, void *Memory);

For the most part, the code for this is identical to our reading code, save for the lack of memory allocation and using the write() system call. write() itself has an almost identical interface to read(): it accepts an fd, a buffer buf and length count.

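// O_TRUNC empties an existing file first, so writing fewer bytes than it
// used to hold doesn't leave stale data at the end.
int FileHandle = open(Filename, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);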

if (FileHandle == -1)
	return false;

uint32 BytesToWrite = MemorySize;
uint8 *NextByteLocation = (uint8*)Memory;
while (BytesToWrite)
{
	// write() returns a signed ssize_t: the bytes written, or -1 on error
	ssize_t BytesWritten = write(FileHandle, NextByteLocation, BytesToWrite);
	if (BytesWritten == -1)
	{
		close(FileHandle);
		return false;
	}
	BytesToWrite -= BytesWritten;
	NextByteLocation += BytesWritten;
}

close(FileHandle);

return true;

Most of this is pretty straightforward, but you may notice the huge amount of extra stuff in open(). Because we'll want to create the file if it doesn't already exist, we need to use the O_CREAT flag. This comes with a caveat, though: if we use the O_CREAT flag, we need to tell the operating system what permissions the newly created file should have. This is done by the magical third mode parameter, which is only used when O_CREAT is given. mode simply stores normal Unix permissions, which are a bitfield. Describing exactly how Unix permissions work is beyond the scope of this tutorial (though you should definitely check it out if you don't already know), but we'll have a quick look at what we're using:

S_IRUSR: Readable by the user who owns the file.
S_IWUSR: Writable by the user who owns the file.
S_IRGRP: Readable by the group which owns the file.
S_IROTH: Readable by everyone else.

This corresponds to the octal 0644.
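
To make the arithmetic concrete: the four flags are 0400, 0200, 0040 and 0004, which OR together to 0644 (rw-r--r--). The sketch below checks that, and also shows what a freshly created file actually ends up with, since the process umask can clear some of the bits we ask for. DebugFileMode and CreateAndGetMode are hypothetical helper names used only for this example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: the mode this chapter passes to open().
static mode_t
DebugFileMode(void)
{
	// 0400 | 0200 | 0040 | 0004 == 0644
	return S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
}

// Hypothetical helper: create Filename with that mode and report the
// permission bits the file ended up with, or (mode_t)-1 on failure.
// The umask may make the result stricter than 0644, and if the file
// already existed its old permissions are left untouched.
static mode_t
CreateAndGetMode(const char *Filename)
{
	mode_t Result = (mode_t)-1;
	int FileHandle = open(Filename, O_WRONLY | O_CREAT, DebugFileMode());
	if (FileHandle != -1)
	{
		struct stat FileStatus;
		if (fstat(FileHandle, &FileStatus) == 0)
		{
			Result = FileStatus.st_mode & 0777;
		}
		close(FileHandle);
	}
	return Result;
}
```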

Aside: Loading files with mmap()

Since we used mmap() to allocate memory earlier, it's worth looking at how to use it for its original purpose: mapping files. Mapping a file tells the operating system that you want reads from (and sometimes writes to) a piece of memory to actually read from (and write to) a given file. The operating system handles all of the actual loading and saving behind the scenes.

Compared to just allocating some memory and reading the file in, using mmap() has some advantages and disadvantages. mmap() can be faster than reading the file in by hand: if you never touch part of the memory you get, that part of the file might not even be loaded. Even if you do touch it, the contents of the file don't need to be copied into a buffer you provide, because the operating system can map in part of its own filesystem cache. On the other hand, accessing the memory mmap() gives you can be slower, as any access might incur a new disk operation. For that same reason, if the disk disappears (for example, if you're reading off a CD or USB device), then the memory you thought you had already loaded may no longer be valid: ouch. We're not going to use mmap() just yet in the actual game, but it's fun to try it out.

We still need to do most of the setup the same way: open() the file descriptor and get the file size. Then, instead of malloc()ing a block of memory, we just call mmap():

Result.Contents = mmap(0, Result.ContentsSize, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                       FileHandle, 0);

We can then close() our file descriptor. The file will remain mapped until we call munmap().

When we're done with the mapping, we call munmap(). munmap() accepts the pointer to our mapping (or a part thereof), and a length in bytes to unmap. This presents us with a bit of a problem: our DEBUGPlatformFreeFileMemory() function does not accept a length parameter. Since this is a demo, we'll just hardcode one page (4096 bytes) in, accepting that anything past the first page of a larger file will stay mapped.
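
If you'd rather not hardcode the length, one option is to keep the size next to the pointer and pass both to the function that unmaps it. The sketch below does that. The names mapped_file, MapEntireFile and UnmapFile are hypothetical and not part of the Handmade Hero API. It also checks for MAP_FAILED, which is what mmap() returns on error (it does not return a null pointer).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct
{
	size_t Size;
	void *Contents;
} mapped_file; // Hypothetical: remembers the length for munmap()

static mapped_file
MapEntireFile(const char *Filename)
{
	mapped_file Result = {0, 0};
	int FileHandle = open(Filename, O_RDONLY);
	if (FileHandle == -1)
	{
		return Result;
	}

	struct stat FileStatus;
	// mmap() rejects zero-length mappings, so empty files count as failures.
	if (fstat(FileHandle, &FileStatus) == 0 && FileStatus.st_size > 0)
	{
		size_t Size = (size_t)FileStatus.st_size;
		// MAP_PRIVATE: our writes go to a private copy, never to the file.
		void *Memory = mmap(0, Size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
		                    FileHandle, 0);
		if (Memory != MAP_FAILED)
		{
			Result.Contents = Memory;
			Result.Size = Size;
		}
	}

	// The mapping stays valid after the descriptor is closed.
	close(FileHandle);
	return Result;
}

static void
UnmapFile(mapped_file File)
{
	if (File.Contents)
	{
		munmap(File.Contents, File.Size);
	}
}
```

Fitting this into the real platform API would mean changing DEBUGPlatformFreeFileMemory() to take a size, which is why the demo takes the shortcut instead.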

Be warned! mmap() rounds all addresses and sizes to multiples of the system page size, which is the smallest unit of memory that can be individually mapped from virtual to physical. On x86 processors, this is 4096 bytes. Windows also uses 4096-byte pages there, but for historical reasons hands out address space in 64k chunks. You can check the page size with sysconf(_SC_PAGESIZE) or the older, non-POSIX getpagesize().
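
As a small illustration of that rounding, here is how you might compute the size a mapping really occupies. RoundUpToPageSize is a hypothetical helper, and the bit trick assumes the page size is a power of two, which holds on x86 Linux.

```c
#include <stddef.h>
#include <unistd.h>

// Hypothetical helper: round Size up to the next multiple of the page size.
// Assumes the page size is a power of two.
static size_t
RoundUpToPageSize(size_t Size)
{
	size_t PageSize = (size_t)sysconf(_SC_PAGESIZE);
	return (Size + PageSize - 1) & ~(PageSize - 1);
}
```

With 4096-byte pages, a 5000-byte file would occupy 8192 bytes of address space.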

End of Lesson!

And that's the end of Week 3. It seems like we've only just started, but look at how much we've done! Next week will be devoted to cleaning up the code we have, and getting a start on some stuff for the actual game. See you then!

If you've bought Handmade Hero, the source for the Linux version can be downloaded here. It also includes a version using the mmap() system call. Note that you will require the official source code for handmade.h and handmade.cpp.