HW instancing & static mode benchmark on 2.0


Again, just like in my previous post: I’m biased, this is not a strictly scientific benchmark, it may not represent real world performance, blah blah blah. You know the drill. Move on.

Now that I’ve done the proper disclaimers, let’s get to the test.

The test

The test is similar to the one performed in my previous post. However, because this time we’re testing the HW Basic instancing technique, it’s shader based instead of using the Fixed Function pipeline. We’re still using the DX9 API. The cubes are not textured.

One big advantage of this test is that we’re no longer API call bound nor Render Queue bound (the Render Queue is screaming for a refactor, but each thing in its own time), so we can compare performance even at very high entity counts.

Compile params were same as before. Same PC was used for the bench.

New Feature!

We’re also testing a new feature: SceneMemoryMgrTypes. Read the Doxygen documentation in the link.

Basically Nodes, Entities & InstancedEntities can now be created as SCENE_STATIC or SCENE_DYNAMIC. Nodes & Entities can switch between scene types on the fly (at a performance cost for that frame) while InstancedEntities can’t switch after being created (you need to destroy it and ask for an InstancedEntity of the other type, which is quite cheap actually…)

SCENE_STATIC is useful for objects you know aren’t going to move/rotate/scale (animate?) for a long time (buildings, trees, etc). When changing something that is SCENE_STATIC (like moving or rotating it), you need to explicitly inform Ogre via SceneManager::notifyStaticDirty (though there are some functions that do this for you, for example creating and destroying, IIRC).
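To make the idea concrete, here is a minimal, self-contained sketch of the static/dirty pattern. This is not Ogre code: `ToyScene`, `ToyNode`, `updateAll` and friends are hypothetical names made up for illustration, and the toy `notifyStaticDirty` only borrows its name from the real function. The actual implementation in 2.0 is certainly more involved; the sketch just assumes that static objects keep a cached derived transform that is recomputed only after a dirty notification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these types are Ogre classes.
enum ToySceneType { TOY_SCENE_DYNAMIC, TOY_SCENE_STATIC };

struct ToyNode {
    ToySceneType type;
    float localX;    // position set by the user
    float derivedX;  // cached position actually used for rendering
};

class ToyScene {
public:
    ToyScene() : mStaticDirty(false) {}

    // Creating a static node flags the static set as dirty, so the
    // new node gets its derived data computed on the next update.
    std::size_t createNode(ToySceneType type, float x) {
        ToyNode node = { type, x, 0.0f };
        mNodes.push_back(node);
        if (type == TOY_SCENE_STATIC)
            mStaticDirty = true;
        return mNodes.size() - 1;
    }

    // Changes the user-facing position only; the cache is untouched.
    void setPosition(std::size_t idx, float x) {
        assert(idx < mNodes.size());
        mNodes[idx].localX = x;
    }

    // Hypothetical stand-in for the explicit notification.
    void notifyStaticDirty() { mStaticDirty = true; }

    // Per-frame update. Dynamic nodes are always recomputed; static
    // nodes are skipped unless the dirty flag is set.
    // Returns how many nodes were recomputed.
    std::size_t updateAll() {
        std::size_t updated = 0;
        for (std::size_t i = 0; i < mNodes.size(); ++i) {
            ToyNode &node = mNodes[i];
            if (node.type == TOY_SCENE_STATIC && !mStaticDirty)
                continue;
            node.derivedX = node.localX;
            ++updated;
        }
        mStaticDirty = false;
        return updated;
    }

    float derivedX(std::size_t idx) const {
        assert(idx < mNodes.size());
        return mNodes[idx].derivedX;
    }

private:
    std::vector<ToyNode> mNodes;
    bool mStaticDirty;
};
```

In this toy model, moving a static node without calling `notifyStaticDirty` leaves the rendered position stale, which is exactly the kind of mistake the explicit notification is there to prevent. The payoff is that frames with no static changes skip all static nodes entirely.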

In other words, it’s StaticGeometry made easier. And when combined with HW Instancing (whether Basic or VTF) it’s usually superior because we get per-object culling on the CPU and lower VRAM usage (StaticGeometry clones the vertices). Not to mention that once StaticGeometry is built, it’s a PITA to modify.

HW Basic already supported a “static” mode. But due to how well defined the stages are in 2.0, performance gains are way more dramatic, as we can skip many more calculations than we used to.

I drew a grid of 250×250 boxes, which makes a total of 62,500 Entities in the scene.

The Results

The screenshots compare static mode on both 1.9 & 2.0. I chose that test because it’s the one that shows the most impressive speed up. Because I’m sensationalistic and stuff (if you want the full results, skip the screenshots and go straight for the table).

On the left Ogre 2.0; on the right Ogre 1.9

Ogre 2.0 Viewing Full Scene (static)

Ogre 1.9 Viewing Full Scene (static)

Ogre 2.0 Partial Scene (static)

Ogre 1.9 Partial Scene (static)

Ogre 2.0 Looking away, no batch rendered (static)

Ogre 1.9 Looking away, no batch rendered (static)

This table summarizes the differences:

| Test – HW Instancing Static | Ogre 2.0 | Ogre 1.9 | Speedup |
|---|---|---|---|
| All entities | 14.68ms | 54.05ms | 3.68x |
| Some entities | 4.62ms | 29.17ms | 6.31x |
| No entities | 0.56ms | 8.48ms | 15.14x |

An astonishing difference in every single case. Even when rendering as many as 62,500 cubes, we’re getting incredible performance. If these were city buildings, our city would be “completely unplayable” in Ogre 1.9, while Ogre 2.0 achieves the “legendary 60 fps” all the time.
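For reference, the frame times in the table convert to frame rates as fps = 1000 / ms, and the speedup column is simply the 1.9 time divided by the 2.0 time. A tiny sketch of that arithmetic follows; the helper names are made up for illustration and are not part of Ogre.

```cpp
#include <cassert>

// Frames per second for a given frame time in milliseconds.
inline double framesPerSecond(double frameTimeMs) {
    assert(frameTimeMs > 0.0);
    return 1000.0 / frameTimeMs;
}

// How many times faster the new frame time is compared to the old one.
inline double speedup(double oldMs, double newMs) {
    assert(newMs > 0.0);
    return oldMs / newMs;
}
```

With the “All entities” row, `framesPerSecond(14.68)` comes out at roughly 68 fps for 2.0, while `framesPerSecond(54.05)` is roughly 18.5 fps for 1.9, which is where the “60 fps” vs “unplayable” remark comes from.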

Of course, in real world scenarios city buildings are textured, have more vertices, and need LOD (which still needs fixing in 2.0), so the instances-per-batch ratio may be lower (which impacts API performance). Don’t extrapolate these results to whatever game you have in mind.

And the last one, with a 15x improvement? Pufff!!! It’s hard to get a “No Entities” state in a real game though, due to batch fragmentation (when InstancedEntities start moving around and end up too far from each other, causing the batch to have huge AABB bounds). There is a function to defragment though. And fragmentation shouldn’t be an issue for SCENE_STATIC if objects are carefully created in order of proximity.
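To illustrate why fragmentation hurts batch-level culling, here is a small self-contained sketch. It is a toy 2D model with hypothetical names (`ToyPoint`, `ToyAabb`, `batchBounds`, `overlaps`), not Ogre’s actual culling code, and it treats instances as points and the camera’s visible area as a box for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 2D types; not Ogre classes.
struct ToyPoint { float x, z; };
struct ToyAabb { float minX, minZ, maxX, maxZ; };

// Smallest box enclosing every instance in the batch.
ToyAabb batchBounds(const std::vector<ToyPoint> &instances) {
    assert(!instances.empty());
    ToyAabb b = { instances[0].x, instances[0].z,
                  instances[0].x, instances[0].z };
    for (std::size_t i = 1; i < instances.size(); ++i) {
        b.minX = std::min(b.minX, instances[i].x);
        b.minZ = std::min(b.minZ, instances[i].z);
        b.maxX = std::max(b.maxX, instances[i].x);
        b.maxZ = std::max(b.maxZ, instances[i].z);
    }
    return b;
}

// True if the two boxes intersect. A batch whose bounds do not
// overlap the visible area can be skipped as a whole.
bool overlaps(const ToyAabb &a, const ToyAabb &b) {
    return a.minX <= b.maxX && a.maxX >= b.minX &&
           a.minZ <= b.maxZ && a.maxZ >= b.minZ;
}
```

Take a visible area spanning 40 to 60 on both axes. A batch whose instances all sit near the origin has tight bounds and is rejected in one test. A batch with one instance at (0, 0) and another at (100, 100) gets bounds covering the whole range, so it overlaps the visible area even though neither instance is in it. Creating objects in order of proximity keeps each batch in the first situation.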

But it’s very clear that we’re on the right track, considering that even the best case scenario had a terribly bad framerate in 1.9.

Another test:

| Test – HW Instancing, Rotating cubes & SCENE_DYNAMIC | Ogre 2.0 | Ogre 1.9 | Speedup |
|---|---|---|---|
| Few entities | 19.54ms | 107.3ms | 5.49x |
| Some entities | 24.48ms | 126.25ms | 5.16x |
| All entities | 33.16ms | 153.43ms | 4.63x |
| No entities | 19.17ms | 105.6ms | 5.51x |

This is exactly the same test as in my previous post with the rotating cubes, except we’re using HW instancing. This is waaay too good.

Other tests not shown

There are more variations not shown here. You can try SCENE_STATIC combined with regular Entities instead of HW instancing. It outperforms Ogre 1.9, which doesn’t have the notion of static for those objects, but was causing my previous “Not Animated test” to perform very similarly in 1.9 & 2.0.

Static mode is definitely very powerful and your engine should take advantage of it.

Code and reproducing results

Of course, what you need to reproduce my results:

Ok this is all for now. /Signing off

PS. Time to focus on shadow mapping

