Handmade Penguin Chapter 8: Playing a Square Wave with SDL

This chapter covers roughly the content in the Writing a Square Wave to DirectSound part of the Handmade Hero course, under the Linux operating system.

<-- Chapter 7 | Back to Index | Chapter 9 -->

Meeting Spongebob Squarewave

As you'd expect, much of the actual code behind producing a sound is the same, no matter which platform we're coding for. At some point, we need to generate a buffer full of samples. Only the details of where that buffer lives change.

As mentioned yesterday, there are two ways of playing sound in SDL: using a callback function, and queuing audio with the SDL_QueueAudio() function. Using the SDL_QueueAudio() function makes things significantly easier, as it allows us to choose when we want to provide the audio data, rather than having SDL ask for it at (possibly inconvenient) times. In fact, the SDL_QueueAudio() API is significantly easier to use than DirectSound is.

Be warned! If you don't write to DirectSound's buffer in time, it will keep playing whatever was left from last time. If you don't call SDL_QueueAudio() in time, you get silence. Which is better depends on the situation, but it's important to understand that they're different.

To start off, let's get some variables set up:

// NOTE: Sound test
int SamplesPerSecond = 48000;
int ToneHz = 256;
int16 ToneVolume = 3000;
uint32 RunningSampleIndex = 0;
int SquareWavePeriod = SamplesPerSecond / ToneHz;
int HalfSquareWavePeriod = SquareWavePeriod / 2;
int BytesPerSample = sizeof(int16) * 2;
These are all identical to what we have on Windows.

We'll also need to actually generate the wave. The first question we need to ask ourselves is "how much audio data should we generate?" This is actually a surprisingly hard question to answer, so we'll cover it in detail later on. For now, let's just say we should generate one frame's worth of audio. If our game is running at 60 frames per second, that's 800 samples (at 48000 Hz) per frame. Let's set our BufferSize to 800 samples (which is 800 * 4 bytes), and try to fill it every frame.

int BytesToWrite = 800 * BytesPerSample;
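
To make the arithmetic behind that 800 concrete, here is a minimal sketch. The helper names SamplesPerFrame() and BytesForSamples() are made up for illustration and aren't part of the chapter's code; they assume 16-bit stereo audio, as above.

```cpp
#include <cassert>
#include <stdint.h>

typedef int16_t int16; // mirrors the int16 typedef used throughout the series

// Hypothetical helper: how many (stereo) samples one frame of game time covers.
static int SamplesPerFrame(int SamplesPerSecond, int FramesPerSecond)
{
    assert(FramesPerSecond > 0);
    return SamplesPerSecond / FramesPerSecond;
}

// Hypothetical helper: bytes needed to hold that many 16-bit stereo samples.
static int BytesForSamples(int SampleCount)
{
    int BytesPerSample = sizeof(int16) * 2; // left + right channel
    return SampleCount * BytesPerSample;
}
```

At 48000 Hz and 60 frames per second this gives 800 samples, or 3200 bytes, per frame.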

Oops! Yesterday, we wrote a SDLInitAudio() function, which accepted a BufferSize parameter for the number of samples (well, the number of 1-channel samples, so half of what we've been calling samples above). The Windows code's BufferSize parameter takes a number of bytes. Let's change our SDLInitAudio() function to accept a number of bytes, by dividing BufferSize by two when initialising AudioSettings.samples:
AudioSettings.samples = BufferSize / 2;

Now that we know how many bytes we want to write, let's get writing. First we'll need some memory to write it into. We'll just use malloc() for now. This has some problems (it'd be nice to reuse as much of the memory as possible from frame to frame), but will suffice for now.

void *SoundBuffer = malloc(BytesToWrite);
int16 *SampleOut = (int16 *)SoundBuffer;
int SampleCount = BytesToWrite/BytesPerSample;
Now all we need to do is write our square wave into the buffer. The code for this is almost identical to the Windows version. We don't (yet) even need to handle ring buffers:
for(int SampleIndex = 0;
    SampleIndex < SampleCount;
    ++SampleIndex)
{
    int16 SampleValue = ((RunningSampleIndex++ / HalfSquareWavePeriod) % 2) ? ToneVolume : -ToneVolume;
    *SampleOut++ = SampleValue;
    *SampleOut++ = SampleValue;
}
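
If you'd like to poke at the loop without SDL in the way, here is a rough standalone sketch of it. GenerateSquareWave() is a hypothetical wrapper (it isn't in the chapter's source), and it uses a std::vector instead of the malloc()'d buffer purely to keep the example self-contained.

```cpp
#include <cassert>
#include <stdint.h>
#include <vector>

typedef int16_t int16;   // mirrors the series' typedefs
typedef uint32_t uint32;

// Hypothetical wrapper around the loop above: returns SampleCount stereo
// samples as interleaved left/right int16 values, and advances
// *RunningSampleIndex so the next call carries on where this one stopped.
static std::vector<int16> GenerateSquareWave(uint32 *RunningSampleIndex,
                                             int SampleCount,
                                             int SamplesPerSecond,
                                             int ToneHz,
                                             int16 ToneVolume)
{
    int SquareWavePeriod = SamplesPerSecond / ToneHz;
    int HalfSquareWavePeriod = SquareWavePeriod / 2;
    assert(HalfSquareWavePeriod > 0);

    std::vector<int16> Samples;
    Samples.reserve(SampleCount * 2);
    for(int SampleIndex = 0;
        SampleIndex < SampleCount;
        ++SampleIndex)
    {
        // Even half-periods are low, odd half-periods are high.
        int16 SampleValue = (((*RunningSampleIndex)++ / HalfSquareWavePeriod) % 2) ? ToneVolume : -ToneVolume;
        Samples.push_back(SampleValue); // left channel
        Samples.push_back(SampleValue); // right channel
    }
    return Samples;
}
```

With the values above, integer division makes the half period 93 samples, so the wave flips every 93 samples and the tone comes out at roughly 258 Hz rather than exactly 256.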

Finally, we just need to queue the audio with SDL_QueueAudio() and free() the buffer we used:

SDL_QueueAudio(1, SoundBuffer, BytesToWrite);
free(SoundBuffer);
While we're here, let's move our SDL_PauseAudio() code from our SDLInitAudio() function down here to match the stream. We'll add a SoundIsPlaying boolean (initialised to false), and check that it's not already playing:
if (!SoundIsPlaying)
{
    SDL_PauseAudio(0);
    SoundIsPlaying = true;
}

If we compile and run (remember that SDL_QueueAudio() requires SDL 2.0.4), we'll hear our square wave. Woohoo! However, we may find that there are some gaps in the sound. Let's fix that!

Buffers: Latency and Leeway

The gaps appear because the timing of our frames isn't precise. Sometimes a frame lasts for 797 samples, sometimes 813; it's quite rare for one to last exactly 800. If a frame takes too long, we might run out of sound data, leaving a stretch where we don't have any sound at all. To fix this, we need to make sure we always have a bit of extra sound buffered.

If we just increase BytesToWrite to (for example) two frames' worth of data, we'll fix the problem. But we've also caused another: the amount of sound data queued up keeps growing. We can use SDL_GetQueuedAudioSize() to watch the queue grow without end. This presents two problems: not only will we eventually run out of memory, but the latency of our game will keep increasing. Clearly we need to look at the way the Windows code handles this, and to do so, we need to look at the differences between DirectSound's and SDL's buffers.
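
To see how quickly this gets out of hand, here is a toy simulation. It involves no SDL at all: QueuedSamplesAfter() is a made-up function that assumes the audio device drains a fixed number of samples per frame, which real hardware only approximates.

```cpp
#include <cassert>

// Toy model: every frame we queue QueuedPerFrame samples, then the device
// plays ConsumedPerFrame of them (or whatever is left, if less).
// Returns how many samples are still waiting after FrameCount frames.
static int QueuedSamplesAfter(int FrameCount, int QueuedPerFrame, int ConsumedPerFrame)
{
    assert(FrameCount >= 0 && QueuedPerFrame >= 0 && ConsumedPerFrame >= 0);
    int Queued = 0;
    for (int Frame = 0; Frame < FrameCount; ++Frame)
    {
        Queued += QueuedPerFrame;
        int Played = (Queued < ConsumedPerFrame) ? Queued : ConsumedPerFrame;
        Queued -= Played;
    }
    return Queued;
}
```

Queuing 1600 samples per frame while only 800 get played leaves 48000 samples waiting after 60 frames: one second into the game, we're already a full second behind.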

In DirectSound, we only have one buffer, a ring buffer. When we've reached the end of the buffer, we start again from the beginning. In SDL, we have several buffers: each time we reach the end of a buffer, we start playing the next buffer. This has surprisingly far reaching effects. To start with, we wanted to provide a large buffer size with DirectSound, so that if our buffer underran (we weren't able to write data to the buffer quickly enough), the loop we got would be larger. The buffer size is therefore the maximum distance we can write ahead (and therefore the maximum latency we can have).

SDL's buffer size is the amount of sound data the operating system will grab at a time. We therefore have to write a minimum of BufferSize bytes ahead. In this sense, the buffer size in SDL is basically equivalent to the distance between DirectSound's play and write cursors. SDL doesn't have a maximum amount of audio data we can queue, though as queuing more increases latency, we do want to limit it ourselves.

So how much data do we want to write ahead? We need to always have at least the (SDL) buffer size amount ready, and we want to have some maximum amount we write ahead. Instead of always writing the same number of bytes, let's try to keep the same number of bytes in the queue. To do that, instead of having BytesToWrite always be the same value, we should make it be however many more bytes we need to reach our target queue length.

int TargetQueueBytes = 48000 * BytesPerSample;
int BytesToWrite = TargetQueueBytes - SDL_GetQueuedAudioSize(1);
Here our TargetQueueBytes is set to an entire second's worth of audio data (this is what the Windows version does: filling a large secondary buffer). This gives us plenty of leeway, and while latency is high (at least a whole second), it is not growing. Because BytesToWrite could now be 0, however, we should probably check that we're actually going to write something. Let's wrap all of our sound code in a big if (BytesToWrite) block.
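
Here is a small sketch of that calculation pulled out into a function. BytesToReachTarget() is a hypothetical name, and the clamp to zero is an extra safety net of my own rather than something the code above does: it just guards against the queue ever holding more than the target.

```cpp
#include <cassert>
#include <stdint.h>

typedef uint32_t uint32; // mirrors the series' typedef

// Hypothetical helper: how many bytes we still need to queue to reach the
// target. QueuedBytes stands in for the value SDL_GetQueuedAudioSize() returns.
static int BytesToReachTarget(int TargetQueueBytes, uint32 QueuedBytes)
{
    assert(TargetQueueBytes >= 0);
    if (QueuedBytes >= (uint32)TargetQueueBytes)
    {
        return 0; // already at (or past) the target: nothing to write
    }
    return TargetQueueBytes - (int)QueuedBytes;
}
```

With a 192000 byte target (48000 samples at 4 bytes each) and 188800 bytes still queued, this asks for 3200 bytes, which is exactly one frame's worth.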

Check it out! In theory, you should be able to get down to only one frame of latency (a buffer big enough for two frames' worth of audio data). This does work on some setups, but it isn't something you can rely on; some systems just can't quite keep up. Another technique you can use is to measure the time taken since the last frame, and add that much audio. This is more robust when the framerate drops, though it can present its own challenges, and is typically more complicated.
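
As a rough idea of what the time-based approach could look like, here is a minimal sketch. Everything in it is an assumption for illustration: the audio_time_accumulator struct and SamplesForElapsedTime() are made-up names, and how you measure ElapsedSeconds is left up to you.

```cpp
#include <cassert>

// Hypothetical state: the fraction of a sample we couldn't emit last frame.
struct audio_time_accumulator
{
    double LeftoverSamples;
};

// Converts the time since the last frame into a whole number of samples,
// carrying the fractional remainder over so we don't slowly drift.
static int SamplesForElapsedTime(audio_time_accumulator *Accumulator,
                                 double ElapsedSeconds,
                                 int SamplesPerSecond)
{
    assert(ElapsedSeconds >= 0.0);
    double ExactSamples = ElapsedSeconds * SamplesPerSecond + Accumulator->LeftoverSamples;
    int WholeSamples = (int)ExactSamples; // truncate towards zero
    Accumulator->LeftoverSamples = ExactSamples - WholeSamples;
    return WholeSamples;
}
```

A slow frame simply asks for more samples and a fast frame for fewer. You would still want some extra audio queued on top of this, for the same reasons as before.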

The Lord of the Ring Buffers

As cool as SDL_QueueAudio() is, it presents two problems. Firstly, the model it presents is significantly different to DirectSound's, which makes porting our DirectSound code quite challenging. Secondly, it requires a newer version of SDL (2.0.4). If you were shipping a game you could just include a working copy of SDL2, but it can be something of a pain throughout development.

So why not try implementing our own ring buffer, using SDL's audio callback? It's uphill work, but in the end we'll have something pretty similar to DirectSound. We'll start by defining a struct to store all of our ring buffer data:

struct sdl_audio_ring_buffer
{
    int Size;
    int WriteCursor;
    int PlayCursor;
    void *Data;
};
This should be pretty self-explanatory.

The bulk of the work is in the audio callback. We'll pass in our sdl_audio_ring_buffer as the UserData for our callback. We'll then work out what bit from our ringbuffer we need to copy into the buffer SDL gives us, and copy it. The tricky bit here is that we need to wrap around: we might need to copy two smaller bits:

internal void
SDLAudioCallback(void *UserData, Uint8 *AudioData, int Length)
{
    sdl_audio_ring_buffer *RingBuffer = (sdl_audio_ring_buffer *)UserData;

    int Region1Size = Length;
    int Region2Size = 0;
    if (RingBuffer->PlayCursor + Length > RingBuffer->Size)
    {
        Region1Size = RingBuffer->Size - RingBuffer->PlayCursor;
        Region2Size = Length - Region1Size;
    }
    memcpy(AudioData, (uint8*)(RingBuffer->Data) + RingBuffer->PlayCursor, Region1Size);
    memcpy(&AudioData[Region1Size], RingBuffer->Data, Region2Size);
    RingBuffer->PlayCursor = (RingBuffer->PlayCursor + Length) % RingBuffer->Size;
    RingBuffer->WriteCursor = (RingBuffer->PlayCursor + 2048) % RingBuffer->Size;
}
You can see how we split the sound we need to copy into two regions: the part up to the end of the ring buffer, and whatever is left over, which comes from the start. Mind the magic '2048': that's the size of SDL's audio buffer (in bytes).
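
If the wrap-around is hard to picture, here is a stripped-down sketch of just the copying part, with no SDL involved. The ring_buffer_sketch struct and ReadFromRing() are simplified stand-ins for illustration, not the chapter's actual types.

```cpp
#include <cassert>
#include <stdint.h>
#include <string.h>

typedef uint8_t uint8; // mirrors the series' typedef

// Simplified stand-in for sdl_audio_ring_buffer (no write cursor).
struct ring_buffer_sketch
{
    int Size;
    int PlayCursor;
    uint8 *Data;
};

// Copies Length bytes starting at the play cursor into Dest, wrapping
// around to the start of the ring buffer if we run off the end.
static void ReadFromRing(ring_buffer_sketch *Ring, uint8 *Dest, int Length)
{
    assert(Length >= 0 && Length <= Ring->Size);

    int Region1Size = Length;
    int Region2Size = 0;
    if (Ring->PlayCursor + Length > Ring->Size)
    {
        Region1Size = Ring->Size - Ring->PlayCursor; // up to the end
        Region2Size = Length - Region1Size;          // the rest, from the start
    }
    memcpy(Dest, Ring->Data + Ring->PlayCursor, Region1Size);
    memcpy(Dest + Region1Size, Ring->Data, Region2Size);
    Ring->PlayCursor = (Ring->PlayCursor + Length) % Ring->Size;
}
```

For example, with an 8 byte ring holding the values 0 to 7 and the play cursor at 6, reading 4 bytes gives 6, 7, 0, 1 and leaves the play cursor at 2.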

In our SDLInitAudio() function, we'll change BufferSize to refer to the size of the ring buffer. We'll just make SDL's audio buffer 1024 samples long for now. We also need to initialise our ring buffer. We'll make a global ring buffer (called AudioRingBuffer) and fill it in:

AudioRingBuffer.Size = BufferSize;
AudioRingBuffer.Data = malloc(BufferSize);
AudioRingBuffer.PlayCursor = AudioRingBuffer.WriteCursor = 0;
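
One thing worth considering: malloc() doesn't clear the memory it hands back, so the callback could end up playing whatever garbage is in there before we've written anything. Here is one possible variation that starts with silence instead. Note that swapping in calloc(), checking for failure, and the InitRingBuffer() name are all my own assumptions, not what the code above does.

```cpp
#include <cassert>
#include <stdlib.h>

struct sdl_audio_ring_buffer
{
    int Size;
    int WriteCursor;
    int PlayCursor;
    void *Data;
};

// Hypothetical helper: allocates a zero-filled ring buffer. For signed
// 16-bit samples, all-zero bytes are silence.
// Returns false if the allocation fails.
static bool InitRingBuffer(sdl_audio_ring_buffer *RingBuffer, int BufferSize)
{
    assert(BufferSize > 0);
    RingBuffer->Data = calloc(1, BufferSize);
    if (!RingBuffer->Data)
    {
        return false;
    }
    RingBuffer->Size = BufferSize;
    RingBuffer->PlayCursor = RingBuffer->WriteCursor = 0;
    return true;
}
```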

Finally, we can pretty much copy-and-paste our DirectSound code. We won't need to make any calls to DirectSound, though: we can access our PlayCursor, WriteCursor and Data members directly, without calling into any API. What we do want to do, though, is prevent our code from seeing the PlayCursor change while we're accessing it. We'll use SDL_LockAudio() and SDL_UnlockAudio() around the code that accesses PlayCursor. We also need to compute Region1, Region1Size, Region2 and Region2Size ourselves:

void *Region1 = (uint8*)AudioRingBuffer.Data + ByteToLock;
int Region1Size = BytesToWrite;
if (Region1Size + ByteToLock > SecondaryBufferSize) Region1Size = SecondaryBufferSize - ByteToLock;
void *Region2 = AudioRingBuffer.Data;
int Region2Size = BytesToWrite - Region1Size;
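
Here is the same region maths as a standalone sketch, working with plain offsets so it can be tried without a real buffer. The write_regions struct and both function names are hypothetical. BytesUpToPlayCursor() is also an assumption: it reflects the "write everything up to the play cursor" rule as I understand the DirectSound code to use it, so check it against your own Windows code.

```cpp
#include <cassert>

// Hypothetical result type: where each region starts and how big it is.
// Region 2 always starts at offset 0 of the ring buffer.
struct write_regions
{
    int Region1Offset;
    int Region1Size;
    int Region2Size;
};

// Splits a write of BytesToWrite bytes starting at ByteToLock into the part
// that fits before the end of the buffer and the part that wraps around.
static write_regions SplitWrite(int ByteToLock, int BytesToWrite, int BufferSize)
{
    assert(ByteToLock >= 0 && ByteToLock < BufferSize);
    assert(BytesToWrite >= 0 && BytesToWrite <= BufferSize);

    write_regions Regions;
    Regions.Region1Offset = ByteToLock;
    Regions.Region1Size = BytesToWrite;
    if (Regions.Region1Size + ByteToLock > BufferSize)
    {
        Regions.Region1Size = BufferSize - ByteToLock;
    }
    Regions.Region2Size = BytesToWrite - Regions.Region1Size;
    return Regions;
}

// Assumed rule: fill everything from ByteToLock up to the play cursor.
static int BytesUpToPlayCursor(int ByteToLock, int PlayCursor, int BufferSize)
{
    if (ByteToLock > PlayCursor)
    {
        return (BufferSize - ByteToLock) + PlayCursor; // wraps past the end
    }
    return PlayCursor - ByteToLock;
}
```

In an 8 byte buffer, writing 4 bytes from offset 6 gives a 2 byte Region1 at the end of the buffer and a 2 byte Region2 at the start, while writing 4 bytes from offset 2 fits entirely in Region1.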

End of Lesson!

Thank the heavens that's over! This chapter was a real struggle to write (getting sidetracked reading through SDL and driver source code in a futile attempt to reduce latency didn't help), and I'm not 100% happy with it. Then again, audio is always a lot nastier than it really should be. Hopefully no-one's too lost, and you found this chapter interesting.

I'd be curious as to whether people would prefer to use SDL_QueueAudio() or our custom ring buffer moving forward. I think I've managed to work around the SDL 2.0.4 issues in the source download, so try both out, have a play, and let me know which is your favourite.

If you've bought Handmade Hero, the source for the Linux version can be downloaded here. A special, extra-compatible copy of SDL which includes SDL_QueueAudio() is included: you can compile the SDL_QueueAudio() version with ./build-queueaudio.sh.

