For Ogre 2.0; I’m introducing the new class ‘Aabb’. For once, it has a shorter name ðŸ™‚

Aside from that, it’s exactly like an AxisAlignedBox, except it has different representation to store it’s data.Â AxisAlignedBox stores the bounds using the minimum and maximum corners of the box.

Aabb instead uses the ‘half size’ approach: Store the half of the dimension size and the center. Both methods take the same amount of RAM:

Aabb | AxisAlignedBox |

class Aabb { Vector3 m_center; Vector3 m_halfSize; }; |
class AxisAlignedBox { Vector3 mMinimum; Vector3 mMaximum; }; |

As it becomes obvious (and as I just said), they’re both need the same amount of RAM.

To get the Maximum one needs to do m_center + m_halfSize, and the Minimum is m_center – m_halfSize

The reverse conversion is also easy, m_center = (mMinimum + mMaximum) * 0.5f; m_halfSize =Â (mMinimum – mMaximum) * 0.5f;

## Why the change?

Arithmetically (instruction wise), the ‘half size’ is much more efficient than the ‘min-max’ approach.

- Frustum culling (aabb vs plane intersection) is much faster. In fact, Plane::getSide() converts theÂ AxisAlignedBox into a half size representation. This math operation is performed
frequently (once per object, every frame!).*very* - Transforming an aabb by a 4×4 Matrix is also much faster (i.e. the final transform). In fact,Â AxisAlignedBox::transformAffine also converts to a half size representation. Again this operation is very frequent.
- Most Intersect() overloads require less tests, which means less branches:

inline bool Aabb::contains( const Vector3 &v ) const { Vector3 distance( m_center - v ); return ( Math::Abs( distance.x ) <= m_halfSize.x ) & ( Math::Abs( distance.y ) <= m_halfSize.y ) & ( Math::Abs( distance.z ) <= m_halfSize.z ); }

bool AxisAlignedBox::contains(const Vector3& v) const { return mMinimum.x <= v.x && v.x <= mMaximum.x && mMinimum.y <= v.y && v.y <= mMaximum.y && mMinimum.z <= v.z && v.z <= mMaximum.z; }

3 subs +

3 abs (a trivial op, builtin into fpu or just one andnot for sse) +

3 comparison instructions +

3 and

= 12 pipelined instructions6 comparison instructions +

6 jumps

= 12 non-pipelined instructionsThe Aabb version needs around the same amount of instructions, but is more pipeline friendly due to fewer or lack of comparison and jump instructions.

The top usages of aabbs in Ogre are the following:

- Frustum Culling: Aabb
- Intersection tests: Aabb
- Merging: AxisAlignedBox is superior, though not by much (needs a few more arithmetic ops)

As a result, Ogre 2.0 will use Aabbs (and there is an ArrayAabb counterpart). Nonetheless there’s a lot of places using the old AxisAlignedBox (this includes Mesh files) implementation, so even though AxisAlignedBox will be deprecated, it’s transition will be slow. I will be using Aabbs for the biggest hotspot right now which is MovableObject’s bounding box in both local & world space; the other places rarely use bounding boxes even if they hold an instance, so the code will just convert to Aabbs on the fly.

Furthermore I didn’t port some math functionality of AxisAlignedBox that isn’t used by us (but may be useful for someone else) hence another reason to keep both classes for the time being.

## Additional changes

At the beginning of this post I didn’t write the full layout from AxisAlignedBox. There are two additional variables:

Extent mExtent; mutable Vector3* mCorners;

The first one is used to distinguish between null bounding boxes, normal ones, and boxes with infinite bounds (ie. enclose everything)

The second one is a handy pointer allocated on demand when calling getAllCorners() and may act as a cache. These of course increase the class 8 bytes in a 32-bit build.

Neither of them is really needed (**specially** mCorners, not to mention it’s a mutable member) so I decided to not include them Aabb.

As for handling Infinite boxes, the math in both ArrayAabb & Aabb gracefully handle the case where m_halfSize is 1.#INF relying on IEEE floating point behavior.

It’s a bit tricky since something as innocent as “left.m_halfSize – right.m_halfSize <= R” is wrong because it will produce Nans if both are Infinity (and hence always false). While “left.m_halfSize <= right.m_halfSize + R” will work as expected, returning true.

Also, multiply m_halfSize by 0 and bam! a Nan appears; which may happen in a plane vs aabb test since we do a dot product against the plane normal.

If it gets too hard to handle infinite Aabbs using arithmetic operations only, I may revert back to using an additional variable.

Although very few objects use infinite aabbs (ie. Directional Lights) Nans are undesirable because they may introduce bugs, slowdown computations, and make harder to search for bugs during development by raising exceptions on Nan.

## Where’s your center, sphere?

Another advantage from this representation is that we already have the center of the mesh (defined as the middle point of it’s aabb) and we can use it for spheres. A widespread problem in Ogre was that often the bounding radius of an object had its center at local origin.

This meant that meshes that were modeled far away from origin could greatly overestimate the bounding radius, as Ogre calculated a bounding sphere with the wrong center. This is exactly what Math::boundingRadiusFromAABB was doing: Take an Aabb, and calculate its radius from origin, rather from its center.

The result is that LOD calculations and some culling algorithms overestimate the size of the object. With Aabbs no longer have that problem.

Additionally, some MovableObjects implementation implemented a full Sphere representation, so if it had an AxisAlignedBounds, it needed to store a Sphere too (which requires 4 floats: 3 for center, 1 for radius). Now, we’re saving 3 floats as the center is already present in the Aabb, and just need the radius.

I’m surprised the sqrt implied with the ‘distance’ function is not killing performance with your new implementation. If you remove the sqrt, you might get better performance, which might be accomplished by testing against something like (2*halfSize)^2 – and cache that for better performance.

Also, I think the old implementation would better use the system cache if you have SoA.

I think preferring Abs over explicit logic tests would buy you no performance improvements.

How do you know the performance is better? Are these changes so critical? I would think an overhaul for 2.0 would benefit from large architectural changes, not the small optimizations like what you are talking about here.

What sqrt?

There are no sqrt in Aabb vs Aabb nor Aabb vs Vector3 tests, implicit nor explicit. If you look closely, note it says Vector3 distance( m_center – v ); not Real distance( sqrt( (m_center – v) dot (m_center – v) ) )

Aka SAT, The distance is measured on each axis independently.

Aabb::distance( Vector3 ) hasn’t it either but needs to be reviewed as it is suffering the same flaw as the original implementation

And yes, ArrayAabb::transformAffine is a critical hotspot. And Frustum Culling is also benefiting from this change, and has been optimized so much to the point frustum culling is no longer a hot spot.

And btw… I AM making large architectural changes, I’m just commenting on one of the tiny bits that could need explanations, since it’s like “a basic primitive”.

Strangely, Havok still uses the old min/max representation for AABB (hkAabb).

Is it a backward compatibility problem or they have some very specific optimizations for that?

Not really (and yes, backwards compatibility). Their hkAabb is separate from the ones they use in their algorithms.

The actual algorithm uses the data from the newer shapes hkpBoxShape/hkpConvexVerticesShape/hkpExtendedMeshShape; which uses a center/half_size representation instead of a min/max.

Sometimes they go even further, since hkpBoxShape assumes the center is at the local origin; thus it only contains the half extents; and the center is the entity’s; saving a bit of RAM.

If the AABB is not at the origin, you can wrap the box shape in the slightly more expensive hkpConvexTranslateShape

I decided not to go to that extreme since a mesh rarely has its center at the local origin. In the end turned out to be a good call, since some visibility testing (i.e. light vs entity, upper distance, etc) use the world aabb’s center instead of the SceneNode’s position; which is more consistent. Normally this inconsistency in Ogre 1.9 is rarely ever noticed, because the aabb often has a center close enough to be aabb (or the calculation just over estimate the size) and the glitches would still be hard to notice (i.e. lights A,B,C,D are sent to the shader instead of sending lights A, B, C, E)

With SSE I find the Min/Max approach much more effective :

inline bool Excludes(const __m128 &vertex) const

{

return (_mm_movemask_ps(_mm_or_ps(_mm_cmpgt_ps(vertex, max), _mm_cmplt_ps(vertex, min))) & 0x7) != 0;

}