Again, just like in my previous post: I’m biased, this isn’t a strictly scientific benchmark, it may not represent real world performance, blah blah blah. You know the drill. Move on.
Now that I’ve done the proper disclaimers, let’s get to the test.
The test is similar to the one performed in my previous post. However, because this time we’re testing the HW Basic instancing technique, it’s shader based instead of using the Fixed Function pipeline. We’re still using the DX9 API. The cubes are not textured.
One big advantage of this test is that we’re no longer API call bound nor Render Queue bound (the Render Queue is screaming for a refactor, but each thing in its own time), so we can compare performance even at very high entity counts.
Compile params were the same as before. The same PC was used for the bench.
We’re also testing a new feature: SceneMemoryMgrTypes. Read the Doxygen documentation in the link.
Basically Nodes, Entities & InstancedEntities can now be created as SCENE_STATIC or SCENE_DYNAMIC. Nodes & Entities can switch between scene types on the fly (at a performance cost for that frame) while InstancedEntities can’t switch after being created (you need to destroy it and ask for an InstancedEntity of the other type, which is quite cheap actually…)
SCENE_STATIC is useful for objects you know aren’t going to move/rotate/scale (animate?) for a long time (buildings, trees, etc). When changing something that is SCENE_STATIC (like moving or rotating), you need to explicitly inform Ogre via SceneManager::notifyStaticDirty (though there are some functions that do this for you, for example creating and destroying, IIRC)
In other words, it’s StaticGeometry made easier. And when combined with HW Instancing (whether Basic or VTF) it’s usually superior because we get per object cull on the CPU, and lower VRAM usage (StaticGeometry clones the vertices). Not to mention once StaticGeometry is built it’s a PITA to modify.
HW Basic already supported a “static” mode. But due to how well defined the stages are in 2.0, performance gains are way more dramatic, as we can skip many more calculations than we used to.
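To make the static/dynamic idea a bit more concrete, here is a minimal toy sketch. It is not Ogre code: `ToyScene`, `ToyNode` and `SceneType` are hypothetical names made up for illustration, and only the name `notifyStaticDirty` mirrors the real `SceneManager::notifyStaticDirty`. The assumption is simply that dynamic objects get recomputed every frame, while static ones are only recomputed when someone has flagged them as dirty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these types are hypothetical and are not Ogre classes.
enum class SceneType { Dynamic, Static };

struct ToyNode {
    float localX;    // position set by the user
    float derivedX;  // cached value the renderer would read
};

class ToyScene {
public:
    // Adds a node to the pool matching its scene type and returns its index there.
    std::size_t addNode(SceneType type, float x) {
        std::vector<ToyNode>& pool =
            (type == SceneType::Static) ? staticNodes_ : dynamicNodes_;
        pool.push_back(ToyNode{x, 0.0f});
        if (type == SceneType::Static)
            staticDirty_ = true;  // creating a static node flags the static data
        return pool.size() - 1;
    }

    // Moving a static node does NOT flag anything; the caller has to notify.
    void moveStaticNode(std::size_t index, float x) {
        assert(index < staticNodes_.size());
        staticNodes_[index].localX = x;
    }

    void notifyStaticDirty() { staticDirty_ = true; }

    float staticDerivedX(std::size_t index) const {
        assert(index < staticNodes_.size());
        return staticNodes_[index].derivedX;
    }

    // Recomputes derived data and returns how many nodes were touched.
    std::size_t updateFrame() {
        std::size_t updated = 0;
        for (ToyNode& node : dynamicNodes_) {  // dynamic: every frame
            node.derivedX = node.localX;
            ++updated;
        }
        if (staticDirty_) {                    // static: only when flagged
            for (ToyNode& node : staticNodes_) {
                node.derivedX = node.localX;
                ++updated;
            }
            staticDirty_ = false;
        }
        return updated;
    }

private:
    std::vector<ToyNode> dynamicNodes_;
    std::vector<ToyNode> staticNodes_;
    bool staticDirty_ = false;
};
```

In this toy model, moving a static node without calling `notifyStaticDirty()` leaves its derived value stale, which is the kind of bug the explicit notification is meant to make you think about. The payoff is that frames where nothing static changed skip the static pool entirely.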
I drew a grid of 250×250 boxes, which makes a total of 62,500 Entities in the scene.
The screenshots compare static mode on both 1.9 & 2.0. I chose that test because it’s the one that shows the most impressive speedup. Because I’m sensationalistic and stuff (if you want the full results, skip the screenshots and go straight for the table).
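For reference, laying out such a grid boils down to something like the sketch below. This is not the benchmark’s actual code: `makeGrid`, the `Vec3` struct and the `spacing` parameter are assumptions for illustration (the real spacing between boxes isn’t stated here), and in the real test each position would be assigned to an instanced entity’s node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Builds side*side positions on the XZ plane, centred around the origin.
// 'spacing' is the distance between neighbouring boxes (an assumed parameter).
std::vector<Vec3> makeGrid(std::size_t side, float spacing) {
    assert(side > 0 && spacing > 0.0f);
    std::vector<Vec3> positions;
    positions.reserve(side * side);
    const float offset = 0.5f * spacing * static_cast<float>(side - 1);
    for (std::size_t row = 0; row < side; ++row) {
        for (std::size_t col = 0; col < side; ++col) {
            positions.push_back(Vec3{
                static_cast<float>(col) * spacing - offset,
                0.0f,
                static_cast<float>(row) * spacing - offset});
        }
    }
    return positions;
}
```

Calling `makeGrid(250, 2.0f)` yields 250 × 250 = 62,500 positions, one per box.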
On the left Ogre 2.0; on the right Ogre 1.9
This table summarizes the differences:
|Test – HW Instancing Static||Ogre 2.0||Ogre 1.9||Speedup|
An astonishing difference in every single case. Even when rendering as many as 62,500 cubes, we’re getting incredible performance. If these were city buildings, our city would be “completely unplayable” in Ogre 1.9, while Ogre 2.0 achieves the “legendary 60 fps” all the time.
Of course, in real world scenarios, city buildings are textured, have more vertices, and need LOD (which still needs fixing in 2.0), and thus the instances per batch ratio may be lower (which impacts API performance), so don’t extrapolate these results to whatever game you have in mind.
And the last one with 15x improvement? Pufff!!! It’s hard to get a “No Entities” state in a real game though, due to batch fragmentation (when InstancedEntities start moving around and end up too far from each other, causing the batch to have huge AABB bounds). There is a function to defragment though. And fragmentation shouldn’t be an issue for SCENE_STATIC if objects are carefully created in order of proximity.
But it’s very clear that we’re on the right track, considering the best case scenario had a terribly bad framerate in 1.9.
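The batch fragmentation mentioned above is easy to see with a bit of arithmetic. The sketch below is a toy model, not Ogre’s implementation: `Vec3`, `Aabb`, `batchBounds` and `boundsVolume` are hypothetical helpers, and it assumes every instance is a cube with the same half size. It just merges the boxes of all instances in one batch, the way a batch-level bounding box has to enclose everything it contains.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 minimum, maximum; };

// Bounding box enclosing every cube of a batch. 'centers' holds the cube
// centres and 'halfSize' is the half extent shared by all cubes (assumed).
Aabb batchBounds(const std::vector<Vec3>& centers, float halfSize) {
    assert(!centers.empty());
    const Vec3& first = centers.front();
    Aabb box{{first.x - halfSize, first.y - halfSize, first.z - halfSize},
             {first.x + halfSize, first.y + halfSize, first.z + halfSize}};
    for (const Vec3& c : centers) {
        box.minimum.x = std::min(box.minimum.x, c.x - halfSize);
        box.minimum.y = std::min(box.minimum.y, c.y - halfSize);
        box.minimum.z = std::min(box.minimum.z, c.z - halfSize);
        box.maximum.x = std::max(box.maximum.x, c.x + halfSize);
        box.maximum.y = std::max(box.maximum.y, c.y + halfSize);
        box.maximum.z = std::max(box.maximum.z, c.z + halfSize);
    }
    return box;
}

float boundsVolume(const Aabb& box) {
    return (box.maximum.x - box.minimum.x) *
           (box.maximum.y - box.minimum.y) *
           (box.maximum.z - box.minimum.z);
}
```

With two unit cubes centred at x = 0 and x = 2, the batch box has a volume of 3. Move the second cube to x = 100 and the volume becomes 101, even though the batch still holds only two small cubes. A box that large is almost always at least partly in view, so the batch can rarely be skipped as a whole, which is why keeping instances of the same batch close together matters.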
|Test – HW Instancing, Rotating cubes & SCENE_DYNAMIC||Ogre 2.0||Ogre 1.9||Speedup|
This is exactly the same test as in my previous post with the rotating cubes, except we’re using HW instancing. This is waaay too good.
Other tests not shown
There are more variations not shown here. You can try SCENE_STATIC combined with regular Entities instead of HW instancing. It outperforms Ogre 1.9, which doesn’t have the notion of static for those objects, but was causing my previous “Not Animated” test to perform very similarly in 1.9 & 2.0.
Static mode is definitely very powerful and your engine should take advantage of it.
Code and reproducing results
Of course, what you need to reproduce my results:
Ok this is all for now. /Signing off
PS. Time to focus on shadow mapping