You’re calculating framerate percentiles wrong!


I apologize for the click-bait title. But I have your attention now.

I was surprised that I could barely find information about this online, so I figured I’d step forward.

If you google how to calculate a percentile, e.g. the 95th percentile, most results will tell you to do this:

std::vector<double> mSamples;

double getPercentile95th()
{
    std::sort( mSamples.begin(), mSamples.end() );
    return mSamples[( mSamples.size() * 95u / 100u ) - 1u];
}

That is, sort all the frame times and return the one sitting at the 95% position. You’re done.

The problem with this approach when it comes to frame times (framerate) is that it measures frames instead of time.

Consider the following extreme example:

  1. Game plays at exactly 60 fps for an hour.
  2. For the next hour, the GPU renders a single frame (i.e. that one frame took 3,600 seconds to render).

So we end up with the following situation:

  1. Total benchmark lasted 2 hours.
  2. Number of samples is 216,001 = 60 fps * 60 secs * 60 minutes + 1 sample.

What’s the 95th percentile?

  1. If we sort all samples, the 205,200th sample will say 60 fps (16.67 mspf).
  2. However, for literally half of the total benchmark time, the framerate was 0.000277778 fps (3,600,000 mspf).

When it comes to time, it’d be more accurate to say the 95th percentile is 0.000277778 fps, not 60 fps.

If we were measuring how long individual frames take, then yes, there is a single outlier and the 95th percentile would be 60 fps.

In other words, what we want is to subdivide the 2 hours into percentiles, not the 216,001 frames.
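
To make the numbers concrete, here is a minimal, self-contained sketch of that extreme example; the two loops preview the frame-count and time-weighted approaches discussed below:

#include <stdio.h>
#include <algorithm>
#include <vector>

int main( void )
{
    // 216,000 frames at 60 fps, followed by a single 3600-second frame.
    std::vector<double> samples( 216000u, 1.0 / 60.0 );
    samples.push_back( 3600.0 );
    std::sort( samples.begin(), samples.end() );

    // Frame-count percentile: the sample sitting at the 95% position.
    const double byFrames = samples[( samples.size() * 95u / 100u ) - 1u];

    // Time-weighted percentile: walk the sorted samples until 95% of the
    // total time has been spent.
    double total = 0.0;
    for( const double s : samples )
        total += s;

    double accum = 0.0;
    double byTime = samples.back();
    for( const double s : samples )
    {
        if( accum >= total * 0.95 )
        {
            byTime = s;
            break;
        }
        accum += s;
    }

    printf( "By frames: %f mspf\n", byFrames * 1000.0 );  // 16.666667 mspf
    printf( "By time:   %f mspf\n", byTime * 1000.0 );    // 3600000.000000 mspf
    return 0;
}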

I don’t know if proprietary solutions like RTSS (RivaTuner Statistics Server) get this right, but at least I can see that the FOSS MangoHud falls victim to this issue (I’m not picking on MangoHud; I love that FOSS can be scrutinized, which gives it the chance to improve).

Solution

double getPercentile95thTime()
{
    if( mSamples.empty() )
        return 0.0;
    std::sort( mSamples.begin(), mSamples.end() );

    // Don't just consider the number of samples. Consider the total time taken/spent
    // (which biases towards larger samples). Consider the following extreme example:
    // Game runs at 60 fps for 1 hour. Then the next hour is spent rendering a single frame.
    //
    // The 95-p using just the number of samples will say 60 fps / 16.67 mspf.
    // However out of 2 hours, the user spent 1 hour at 60 fps and another hour at 0.000277778 fps.
    // The 95-p should be 0.000277778 fps (3600000 mspf), not 60 fps.
    double totalTimeTaken = 0.0;
    for( const double sample : mSamples )
        totalTimeTaken += sample;

    const double accumTimeToLookFor = totalTimeTaken * 0.95;

    totalTimeTaken = 0.0;
    for( const double sample : mSamples )
    {
        if( totalTimeTaken >= accumTimeToLookFor )
            return sample;
        totalTimeTaken += sample;
    }

    // Reached when the largest sample alone accounts for more than 5% of the
    // total time (as in the extreme example above); that sample is the answer.
    return mSamples.back();
}

That is, we calculate 95% of the total time taken, then walk through the sorted samples until the running sum of all previous samples reaches that number.
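
If you prefer to let the STL do the bookkeeping, here is an equivalent sketch (my own rewrite, not how any existing tool implements it) using std::partial_sum plus std::lower_bound. It returns the sample during which the 95% time mark is crossed, which may differ from the loop above by at most one sample:

#include <cstddef>
#include <algorithm>
#include <numeric>
#include <vector>

// Takes the samples by value so the caller's vector is left untouched.
double getPercentile95thTime( std::vector<double> samples )
{
    if( samples.empty() )
        return 0.0;
    std::sort( samples.begin(), samples.end() );

    // accum[i] = total time spent in samples[0..i].
    std::vector<double> accum( samples.size() );
    std::partial_sum( samples.begin(), samples.end(), accum.begin() );

    // First sample whose cumulative time reaches 95% of the total.
    const double target = accum.back() * 0.95;
    const size_t idx =
        size_t( std::lower_bound( accum.begin(), accum.end(), target ) - accum.begin() );
    return samples[idx];
}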

A side effect of this is that the algorithm biases towards slower frame times. Therefore, if your 95th percentile was already bad, now your stats will look even worse.

But this side effect is desired: a “heavier” sample weighs more than a “lighter” sample, just like stutters can kill the experience. If we do not account for time, a slow frame taking 27 ms will incorrectly “matter less” when benchmarking at 144 Hz than when benchmarking at 60 Hz, when in fact it matters equally in both cases.
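
To put numbers on that (a hypothetical scenario, not taken from the benchmark below): suppose one 27 ms stutter happens every second of gameplay. The stutter costs the same 2.7% of wall-clock time at both framerates, but as a fraction of frames it shrinks as the base framerate grows:

#include <stdio.h>

int main( void )
{
    const double stutter = 0.027;  // One 27 ms stutter per second of gameplay.
    const double bases[] = { 1.0 / 144.0, 1.0 / 60.0 };
    for( const double base : bases )
    {
        // Normal frames that fit in the remaining ~973 ms of that second.
        const double normalFrames = ( 1.0 - stutter ) / base;
        printf( "base %.2f mspf: the stutter is %.2f%% of the frames but %.1f%% of the time\n",
                base * 1000.0, 100.0 / ( normalFrames + 1.0 ), stutter * 100.0 );
    }
    return 0;
}

A frame-count percentile therefore dilutes the very same stutter as the framerate rises, while the time-weighted one does not.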

In practice, as long as the standard deviation isn’t too large, both algorithms will yield similar results.

However, considering time instead of the number of samples becomes very important if you’re trying to compare window-less approximations against the ground truth, since the ground truth should be evaluated over time.

How much does it matter?

The following code simulates a run in 3 steps:

  • Step 1: 98% of the frames run at 16.67 mspf (60 fps); the remaining 2% are random spikes lasting between 65.67 and 66.67 mspf (roughly 15 fps).
  • Step 2: 98% of the frames run at 33.33 mspf (30 fps); the remaining 2% are random spikes lasting between 82.33 and 83.33 mspf (roughly 12 fps).
  • Step 3: Same as Step 1.


#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <algorithm>
#include <limits>
#include <random>
#include <vector>

class FrameStatsGroundTruth
{
	std::vector<double> mSamples;

public:
	void addSample( const double timeSinceLast ) { mSamples.push_back( timeSinceLast ); }

	double getAverage() const
	{
		if( mSamples.empty() )
			return 0.0;

		double avg = 0.0;
		for( const double val : mSamples )
			avg += val;

		avg /= double( mSamples.size() );
		return avg;
	}

	double getPercentile95th()
	{
		if( mSamples.empty() )
			return 0.0;
		std::sort( mSamples.begin(), mSamples.end() );
		return mSamples[( mSamples.size() * 95u / 100u ) - 1u];
	}

	double getPercentile95thTime()
	{
		if( mSamples.empty() )
			return 0.0;
		std::sort( mSamples.begin(), mSamples.end() );

		// Don't just consider the number of samples. Consider the total time taken/spent
		// (which biases towards larger samples). Consider the following extreme example:
		// Game runs at 60 fps for 1 hour. Then the next hour is spent rendering a single frame.
		//
		// The 95-p using just the number of samples will say 60 fps / 16.67 mspf.
		// However out of 2 hours, the user spent 1 hour at 60 fps and another hour at 0.000277778 fps.
		// The 95-p should be 0.000277778 fps (3600000 mspf), not 60 fps.
		double totalTimeTaken = 0.0;
		for( const double sample : mSamples )
			totalTimeTaken += sample;

		const double accumTimeToLookFor = totalTimeTaken * 0.95;

		totalTimeTaken = 0.0;
		for( const double sample : mSamples )
		{
			if( totalTimeTaken >= accumTimeToLookFor )
				return sample;
			totalTimeTaken += sample;
		}

		// Reached when the largest sample alone accounts for more than 5% of the
		// total time (as in the extreme example above); that sample is the answer.
		return mSamples.back();
	}
};

struct Step
{
	double minValue;
	double range;
};

int main( void )
{
	std::mt19937_64 rng;
	// initialize the random number generator with a fixed seed for reproducibility
	uint64_t timeSeed = 381402079;
	std::seed_seq ss{ uint32_t( timeSeed & 0xffffffff ), uint32_t( timeSeed >> 32 ) };
	rng.seed( ss );
	// initialize a uniform distribution between 0 and 1
	std::uniform_real_distribution<double> unif( 0, 1 );

	FrameStatsGroundTruth groundTruth;

	const Step steps[] = {
		{ 0.0, 1.0 },
		{ 0.0, 2.0 },
		{ 0.0, 1.0 },
	};

	const size_t numSteps = sizeof( steps ) / sizeof( steps[0] );

#define SIMULATION_STEPS 10000u

	for( size_t s = 0u; s < numSteps; ++s )
	{
		for( size_t i = 0u; i < SIMULATION_STEPS; ++i )
		{
			double val = unif( rng );
			// 98% of the frames run exactly at the step's base frame time; the
			// other 2% spike by an extra 49-50 ms (here val is in [0.98, 1.0)).
			if( val < 0.98 )
				val = ( 1.0 / 60.0 ) * steps[s].range + steps[s].minValue;
			else
				val = val * 0.05 + ( 1.0 / 60.0 ) * steps[s].range + steps[s].minValue;
			groundTruth.addSample( val );
		}

		// Note: groundTruth is never reset, so each report also includes all
		// samples from the previous steps.
		printf(
			"Avg =  %f mspf (%f FPS)\t95th-p(time) =  %f mspf (%f FPS)\t95th-p(frames) =  %f mspf (%f "
			"FPS)\n",
			groundTruth.getAverage() * 1000.0, 1.0 / groundTruth.getAverage(),
			groundTruth.getPercentile95thTime() * 1000.0, 1.0 / groundTruth.getPercentile95thTime(),
			groundTruth.getPercentile95th() * 1000.0, 1.0 / groundTruth.getPercentile95th() );
	}

	return 0;
}

Running it yields the following output:

Avg =  17.64 mspf (56.67 FPS) 95th-p(time) =  65.95 mspf (15.16 FPS) 95th-p(frames) =  16.66 mspf (60.00 FPS)
Avg =  25.89 mspf (38.61 FPS) 95th-p(time) =  65.72 mspf (15.21 FPS) 95th-p(frames) =  33.33 mspf (30.00 FPS)
Avg =  23.14 mspf (43.20 FPS) 95th-p(time) =  65.85 mspf (15.18 FPS) 95th-p(frames) =  33.33 mspf (30.00 FPS)
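
Note how the time-weighted 95th percentile catches the spikes right away (~15 fps), since they account for more than 5% of the total time, while the frame-count version keeps reporting the base frame time instead, because the spikes never exceed 5% of the samples.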