Handmade Penguin Chapter 10: SDL_GetPerformanceCounter and RDTSC

This chapter covers roughly the content in the QueryPerformanceCounter and RDTSC part of the Handmade Hero course, under the Linux operating system.

<-- Chapter 9 | Back to Index | Chapter 11 -->

Performance Counters

This is going to be the shortest chapter yet: basically everything you need to know is in the title. While Windows has the QueryPerformanceCounter() and QueryPerformanceFrequency() functions, SDL has the almost identical SDL_GetPerformanceCounter() and SDL_GetPerformanceFrequency() functions.

Let's measure how long it takes us to display each frame. Unlike their Windows counterparts, the SDL functions just return a raw uint64 instead of some crazy LARGE_INTEGER union we didn't care about. They also have a normal return value, instead of needing to be passed a pointer. First, we'll get the performance counter frequency:

uint64 PerfCountFrequency = SDL_GetPerformanceFrequency();

That's so easy!

Now, we'll create a variable to store the time before we processed our frame:

uint64 LastCounter = SDL_GetPerformanceCounter();

We now just need to update the time at the end of our while(Running) loop, and output the difference:

uint64 EndCounter = SDL_GetPerformanceCounter();
uint64 CounterElapsed = EndCounter - LastCounter;

real64 MSPerFrame = (1000.0 * (real64)CounterElapsed) / (real64)PerfCountFrequency;
real64 FPS = (real64)PerfCountFrequency / (real64)CounterElapsed;

printf("%.02f ms/f, %.02f f/s\n", MSPerFrame, FPS);
LastCounter = EndCounter;

Wow: this is all pretty much exactly what we did with Windows, but with all of the little annoying bits taken out!
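To make the arithmetic concrete, here's a minimal sketch of the two conversions as standalone helpers. The function names are hypothetical (they're not part of SDL or Handmade Hero), and the sketch uses the standard uint64_t and double in place of our uint64 and real64 typedefs. It also assumes we'd rather report 0 f/s than divide by zero if no ticks elapsed.

```c
#include <stdint.h>

// Hypothetical helper: converts elapsed counter ticks to milliseconds.
// PerfCountFrequency is the number of counter ticks per second.
static double TicksToMilliseconds(uint64_t CounterElapsed,
                                  uint64_t PerfCountFrequency)
{
    return (1000.0 * (double)CounterElapsed) / (double)PerfCountFrequency;
}

// Hypothetical helper: converts elapsed counter ticks for one frame to
// frames per second. Returns 0.0 if no ticks elapsed, to avoid dividing
// by zero (an assumption of this sketch, not something SDL requires).
static double TicksToFramesPerSecond(uint64_t CounterElapsed,
                                     uint64_t PerfCountFrequency)
{
    if (CounterElapsed == 0)
    {
        return 0.0;
    }
    return (double)PerfCountFrequency / (double)CounterElapsed;
}
```

For example, with a counter that ticks 1,000,000 times per second, 500,000 elapsed ticks works out to 500 ms per frame, or 2 frames per second.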

One thing you may have noticed is that the printf() function we've been calling is basically a combination of sprintf() and OutputDebugStringA(). We can use sprintf() (and snprintf()) on Linux easily, though we'll probably write our own as part of the stream. Technically, the underlying system call for outputting to the screen is write(), called on the stdout or stderr file descriptors. printf() is good enough for our debugging purposes, but do keep in mind Casey's warnings about it.
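If you're curious what the sprintf()-plus-write() combination might look like, here's a rough sketch. Both helper names are hypothetical, and the 256-byte buffer is an arbitrary choice for illustration. It formats the timing line with snprintf() and then hands the bytes to write() on stdout.

```c
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: formats the timing line into Buffer and returns its
// length in bytes (not counting the terminating zero), or -1 on failure.
static int FormatFrameTiming(char *Buffer, size_t BufferSize,
                             double MSPerFrame, double FPS)
{
    int Length = snprintf(Buffer, BufferSize, "%.02f ms/f, %.02f f/s\n",
                          MSPerFrame, FPS);
    if (Length < 0 || (size_t)Length >= BufferSize)
    {
        return -1; // encoding error, or the buffer was too small
    }
    return Length;
}

// Hypothetical helper: sends the formatted line to stdout with write().
// Returns the number of bytes written, or -1 on failure. write() is
// allowed to write fewer bytes than requested; a sturdier version would
// loop until everything has been sent.
static long WriteFrameTiming(double MSPerFrame, double FPS)
{
    char Buffer[256];
    int Length = FormatFrameTiming(Buffer, sizeof(Buffer), MSPerFrame, FPS);
    if (Length < 0)
    {
        return -1;
    }
    return (long)write(STDOUT_FILENO, Buffer, (size_t)Length);
}
```

Calling WriteFrameTiming(MSPerFrame, FPS) at the end of the frame would then stand in for the printf() call. Unlike printf(), write() does no buffering of its own, so every call is a trip into the kernel.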

RDTSC

To read the number of (nominal) CPU cycles, rather than the physical time, we used the RDTSC instruction. As RDTSC is an actual hardware instruction, it's available on Linux as well. The intrinsic we use to access it is slightly different: it's _rdtsc(), with only one underscore. Intel provides the Intel Intrinsics Guide, which lists all of the official intrinsics for their processors. Pretty much every compiler except Visual C++ uses these official names.

We'll use _rdtsc() the same way we did on Windows, which is pretty much exactly the same way we used SDL_GetPerformanceCounter(): reading it after each frame, and taking the difference:

uint64 LastCycleCount = _rdtsc();

before the loop, and then, at the end of each frame:

uint64 EndCycleCount = _rdtsc();
uint64 CyclesElapsed = EndCycleCount - LastCycleCount;
real64 MCPF = ((real64)CyclesElapsed / (1000.0f * 1000.0f));

printf("%.02fms/f, %.02ff/s, %.02fmc/f\n", MSPerFrame, FPS, MCPF);

LastCycleCount = EndCycleCount;

If we try to compile this, however, we get an error, because the intrinsic hasn't been declared yet. We need to include x86intrin.h:

#include <x86intrin.h>

before we can use it. (If you're using the clang compiler, you will need at least version 3.5.)

Be warned! Some documentation says to include immintrin.h instead. This is the official place where _rdtsc() resides, but gcc hasn't quite caught on yet. On the bright side, x86intrin.h actually also has a definition for __rdtsc() which is compatible with Visual Studio's compiler.
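Since the double-underscore __rdtsc() exists on both sides, one way to keep the cycle counting portable is to pick the header per compiler and wrap the intrinsic. This is only a sketch: ReadCycleCounter() and MegaCyclesPerFrame() are hypothetical names, it assumes an x86 or x86-64 target, and it relies on Visual C++ declaring __rdtsc() in intrin.h.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>    // Visual C++ declares __rdtsc() here
#else
#include <x86intrin.h> // gcc and clang declare __rdtsc() here
#endif

// Hypothetical wrapper: reads the CPU's time stamp counter.
// Only valid on x86 and x86-64 targets.
static uint64_t ReadCycleCounter(void)
{
    return (uint64_t)__rdtsc();
}

// Hypothetical helper: converts a cycle count for one frame into
// millions of cycles, matching the MCPF value we print.
static double MegaCyclesPerFrame(uint64_t CyclesElapsed)
{
    return (double)CyclesElapsed / (1000.0 * 1000.0);
}
```

We'd call ReadCycleCounter() wherever we called _rdtsc() above, and pass the difference between the two readings to MegaCyclesPerFrame(). A frame that took 50,000,000 cycles comes out as 50 mc/f.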

End of Lesson!

And that's it. Told you this one was trivial! See you next week, when we'll be finishing up the platform layer!

If you've bought Handmade Hero, the source for the Linux version can be downloaded here. A special, extra-compatible copy of SDL which includes SDL_QueueAudio() is included: you can compile the SDL_QueueAudio() version with ./build-queueaudio.sh.

