Samsung Galaxy A54 Gaming Performance: WHAT THE HECK IS GOING ON?


Disclaimer: If you work at Samsung Developer Relations and want to ping me, email me at info@yosoygames.com.ar

So I bought a new Samsung Galaxy A54 (SM-A546E). On paper it was similar to the one I had (a Xiaomi POCO F2 Pro). I later learnt it’s quite a downgrade in practice.

But for regular everyday use (Twitter, browsing, YouTube) it is not much different (as long as it feels near-instantaneous it gets my approval) and the 120 Hz screen is an upgrade. The speakers actually sound great. I’ve never heard a phone sound this good (and I’ve tested MANY phones).

One reason I bought it was that, just like the POCO F2 Pro, it is HMP (Heterogeneous Multi-Processing, aka big.LITTLE).

But one thing that stood out was its poor gaming performance.

The first thing I tried was testing our game. Given that it can run on something as low as a Redmi 4X w/ Adreno 505 at 50 fps (graphics on lowest), I was appalled to see our game run as low as 20 fps with my finger off the screen, and 110 fps with my finger on it.

Given that the Redmi 4X has a Geekbench Score of SC 201 / MC 879 it should obviously run at full speed on a phone with a Geekbench of SC 1009 / MC 2842.
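
To put numbers on that gap, here is a quick Python sketch. It is not part of the original measurements; it only divides the Geekbench scores quoted above, and the dictionary and function names are made up for illustration.

```python
# Geekbench scores quoted in the post (SC = single-core, MC = multi-core).
redmi_4x = {"single_core": 201, "multi_core": 879}
galaxy_a54 = {"single_core": 1009, "multi_core": 2842}


def speedup(new, old):
    """Return how many times higher each score of `new` is compared to `old`."""
    return {name: new[name] / old[name] for name in old}


ratios = speedup(galaxy_a54, redmi_4x)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On paper that is roughly 5x the single-core score and a bit over 3x the multi-core score.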

Multithreaded Game Engine

Our game engine is custom made, based on OgreNext. We pride ourselves on the fact that it uses all available cores.

That is how we managed to get 50 fps on the Redmi 4X, a framerate that we can’t reach with just one or two cores. We use all 8 cores that phone has to offer and have room to spare (it’s GPU bound now) thus thermals aren’t an issue; and we make sure to sleep a lot when we can.

On the A54 we are spawning 8 threads and assuming they all perform equally, which is a wrong and bad assumption. However given the POCO F2 Pro’s performance (which is also HMP) I didn’t think it would make a difference. Note: We actually spawn more than 8 threads, however these are all doing menial tasks and some of them aren’t even ours (e.g. driver/Android threads). The big work is concentrated on those 8 threads.

If I only spawn 2 threads (+ the menial threads), the game runs at the best performance so far.

But even then, the performance is very noisy. Sometimes I launch the game and I get 120 fps; I close the app, launch it again and I get 82 fps. And if I leave the game open long enough I will see all sorts of framerates throughout its lifetime. That long-run behavior is harder to troubleshoot, possibly due to thermals. But what I’m talking about is the first 10-20 seconds since launch. Thermals aren’t the problem there.

However, what I found has no explanation other than a terribly broken thread scheduler at the kernel level.

I don’t know if this is the stock Linux scheduler, or if Samsung has tweaked it, but I strongly suspect the latter. According to CPU-Z, it’s using the energy_aware governor.
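
For anyone who wants to cross-check what CPU-Z reports, Linux exposes the active cpufreq governor (the component that picks CPU frequencies) per core through sysfs. The sketch below is an assumption-laden example, not something from my tests: it has to run on the device itself (e.g. from a terminal app or an adb shell with Python available), the files may not be readable without extra permissions, and `read_governor` is an illustrative name.

```python
from pathlib import Path


def read_governor(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the cpufreq governor name for one CPU, or None if it cannot be read."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "scaling_governor"
    try:
        return path.read_text().strip()
    except OSError:
        # File missing (no cpufreq support) or not readable by this user.
        return None


# Assumes 8 cores numbered 0-7, as on the phones discussed in this post.
for cpu in range(8):
    print(f"cpu{cpu}: {read_governor(cpu)}")
```

The name printed here may not match the label CPU-Z shows.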

It’s not thermals

This problem happens on a cool (30°C) phone as soon as I start the game. I’m evaluating the first 10-20 seconds of the game.

The Problem Persists with Game Booster “Labs”

Samsung has a Game Booster service that has been criticized for throttling game performance, to the point that they released a patch where one can go to the Labs section and switch a toggle that uses “an alternative performance management that can cause overheating”.

I tricked Game Booster into recognizing our game (which is not published yet on the Play Store) as a game by using an applicationId from another popular game.

Toggling this setting seems to push the frequency more aggressively, which helps the framerate stay up for longer, but the underlying problem is still there.

The measurements described in this post were taken without this setting.

Perfetto

At Alec Miller’s suggestion, I tried Perfetto, an Android profiler that works at the system level. What I found shocked me. I ran 6 tests: 8 threads (finger off screen, finger on screen), 4 threads (finger off + on), and 2 threads (finger off + on).

Finger Off, 8 threads: 25 fps
Phone is in low power
4 little cores are used (Freq: 864 MHz)
2 big cores are used (Freq: 533 MHz)
Workload is VERY unevenly distributed
Thread scheduling for little cores is ATROCIOUS
1 little core ends up waiting on 4 threads scheduled on another little core
2 big cores are mostly waiting on little cores
The other 2 big cores are not used at all

The game should be perfectly capable of reaching 60 fps w/ 4 little cores at 864 MHz.

What stumped me is that although all 4 little cores are used, the thread scheduling is all over the place. I would see one little core running just one thread while waiting for 5 other threads that were scheduled on another little core. If the kernel were evenly distributing the work (2 threads per little core) the game would be running fine.
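
A toy model makes the cost of that imbalance concrete. It is a sketch under loud assumptions that do not come from the Perfetto traces: every worker thread does the same amount of work per frame, a core runs its threads one after another, and the frame ends when the busiest core finishes. The 5/1/1/1 split is a hypothetical example of the kind of skew described above, and `frame_cost` is an illustrative name.

```python
def frame_cost(threads_per_core, work_per_thread=1.0):
    """Frame cost in arbitrary units: the busiest core decides when the frame ends."""
    return max(threads_per_core) * work_per_thread


# Hypothetical skewed placement: 5 worker threads piled on one little core.
skewed = frame_cost([5, 1, 1, 1])
# Even placement: 2 worker threads per little core.
even = frame_cost([2, 2, 2, 2])

print(f"skewed: {skewed}, even: {even}, slowdown: {skewed / even}x")
```

In this simplified model the skewed placement makes each frame take 2.5 times as long as the even one, with the same cores at the same frequency.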

Instead I get this:

But this meme is usually about single-threaded apps. That’s not the case here. It’s because the kernel decided to schedule a lot of worker threads to the same CPU and have the other little ones wait for the poor little overworked one.

Finger On, 8 threads: 90-110 fps
Phone is in high power
4 little cores are rarely used (Freq: 1440 MHz)
4 big cores are used (Freq: 2016 MHz)
Even distribution of workload

If you keep your finger on at all times, you get the best performance with 8 threads. But it’s no wonder: the phone gets into the highest power state (2 GHz!), puts all the work onto the big cores, and we get an even distribution of 2 worker threads per core. Finally!

It’s too bad the phone has to go into high power (until it thermally throttles and the framerate becomes weird again) and the user needs to keep a finger on the screen at all times. This is worthless.

Finger Off, 2 threads: 95 fps
Phone is in mid power
Threads keep jumping between all 8 cores
Little cores are less used than w/ 8T (Freq: 1344 MHz)
2 big cores are used, but work keeps jumping on all 4 (Freq: 960 MHz)
Inconsistent performance

Not much to say. The work scheduling is jumpy. It’s no wonder since it has a lot of cores and few threads to use.

But the framerate can be inconsistent, at times locking at lower framerates (e.g. 30 fps or 50 fps) and then going back to 95 fps. We’re talking about a static scenario with no thermal throttling. Actual gameplay gets wild.

Finger On, 2 threads: 95 fps
Phone is in mid power
4 little cores are rarely used (Freq: 1152 MHz)
2 big cores are used a lot. Work keeps jumping (Freq: 1056 MHz)
Inconsistent performance

There’s not much to add here. The framerate is more consistent than with finger off, but still a shitshow. But given that this is the best we’ve managed so far, we’ll probably ship with this configuration for the A54.

Finger Off, 4 threads: 30-50 fps (oscillation)
Phone is in mid/low power
3 little cores are used (Freq: 1056 MHz)
1 big core is used, jumping between 2 (Freq: 533 MHz)
EXTREMELY inconsistent performance

WTF is this??? The game LITERALLY oscillates between 30 fps and 50 fps. But I’m not saying frame t takes 33 ms and frame t+1 takes 20 ms. I wish! No, the game starts running at 30 fps for 4 seconds, then moves on to 50 fps for 4 seconds, then 45 fps for 2 seconds, then 50, then 30. It’s completely random.
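
To make the distinction between per-frame jitter and multi-second plateaus explicit, here is a small Python sketch. The `samples` list is a made-up once-per-second FPS reading shaped like the sequence described above (the lengths of the last two plateaus are invented), and both function names are illustrative.

```python
from itertools import groupby


def frame_time_ms(fps):
    """Convert a framerate into the time budget of a single frame, in milliseconds."""
    return 1000.0 / fps


def plateaus(fps_per_second):
    """Collapse consecutive identical readings into (fps, seconds) runs."""
    return [(fps, len(list(run))) for fps, run in groupby(fps_per_second)]


# Hypothetical once-per-second FPS readings matching the description above.
samples = [30] * 4 + [50] * 4 + [45] * 2 + [50, 30]

print(round(frame_time_ms(30), 1), frame_time_ms(50))
print(plateaus(samples))
```

Per-frame jitter would show up as runs of length 1 at a much finer sampling rate; here the runs last whole seconds, which is what makes it look like the device keeps changing state rather than the game hitting an occasional slow frame.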

There are 4 little cores + 4 big cores, and 4 threads. You’d think all threads would be scheduled on the same set of cores. But no, 1 thread is allocated to the big cores but running at 533 MHz (so, in lower power) and jumping between 2 big cores, while 3 threads are kept on little cores.

It just makes no sense.

Finger On, 4 threads: 82 fps
Power state keeps jumping
1 little core is rarely used, but the frequency keeps jumping (Freq: 1248-1632 MHz)
3 big cores are used. Very bad scheduling (Freq: 1344-1632 MHz)
Inconsistent performance

WTF!?!?!? You’d think that if we are able to reach as much as 110 fps w/ 8 threads with the finger on (with all 8 threads concentrated on 4 big cores), and all previous experiments with the finger on are around 90-95 fps, then 4 threads would obviously run smoothly.

BUT NO, WE GET 82 FPS THIS TIME. AND WE USE 3 BIG CORES, NOT 4. And the little cores are not used, but for some reason their frequency gets bumped into high power.

It’s not just my game

I’ve been playing Honkai Impact (a fast action game) and, although there is no FPS counter, I can definitely feel the same problems our game is having: even at the lowest graphical settings (but with 60 fps gameplay) the game keeps alternating between sluggish 30 fps and the smooth animation you typically see on a 60+ fps display.

The HW should definitely be able to sustain 60 fps at the lowest settings. Much weaker phones can.

Something must be broken in the Kernel

This is the only explanation I can come up with.

I suspect Samsung Engineers must be ditching the reports as “thermal throttling, nothing we can do”. “The little cores are weak and sometimes we put the threads there. Nothing we can do”. And since most Android games are single-threaded (or at most, dual-threaded) it would look that way.

But a closer inspection suggests the scheduler is doing things incorrectly. If you’ve got 8 threads & 4 little cores, don’t put 4-5 threads on the same core so that the other 3 cores wait for that overworked core. Allocate 2 threads per core.
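
The placement I’m asking for is nothing fancy. As a minimal sketch (this is not how the kernel scheduler works, just an illustration of the outcome I’d expect), a plain round-robin over the little cores already gives 2 threads per core. The core numbering 0-3 for the little cores is an assumption, and `round_robin` is an illustrative name.

```python
def round_robin(num_threads, cores):
    """Assign thread indices to cores in turn so the load is spread evenly."""
    placement = {core: [] for core in cores}
    for thread_id in range(num_threads):
        core = cores[thread_id % len(cores)]
        placement[core].append(thread_id)
    return placement


little_cores = [0, 1, 2, 3]  # assumed numbering of the 4 little cores
placement = round_robin(8, little_cores)

for core, threads in placement.items():
    print(f"core {core}: threads {threads}")
```

On Linux an application can request a placement like this itself through CPU affinity (Python exposes it as `os.sched_setaffinity`), but I have not tested whether pinning helps on the A54.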

On the POCO F2 Pro (it’s got 1 big core + 3 mid cores + 4 little cores), if I use 4 threads, 1 thread goes to the big core and the other 3 go to the mid cores. Even distribution. Solid 60 fps. The same happens if I use 8 threads (little cores are almost never used).

This is what I’d expect from a fullscreen Vulkan game (get the bigger cores, not the small ones). It is also what I’d expect from the scheduler: if you’re going to put me on the little cores (which our game can surely handle, it’s not that heavy), at least distribute the load fairly.

This could be something in the Linux kernel on how work is allocated on the little cores (after all, the POCO F2 doesn’t really put the game on the little cores; maybe the bug is present there too but it doesn’t manifest).

Observing the behavior of my tests on Samsung, the load has been distributed most fairly when big cores are chosen (except when I tried 4T finger on, which is odd); and it’s a disaster when the little ones get involved.

I feel this phone is just one bugfix away from unlocking its true potential.

Maybe it is our fault; but I’m getting more and more convinced it’s not. The evidence suggests something’s wrong w/ the scheduling.

As I said at the beginning, if you work at Samsung Developer Relations, ping me at info@yosoygames.com.ar and we can provide multiple builds that repro this problem (e.g. one build per thread configuration).