On Mike Acton’s review of OgreNode.cpp 16


You may have seen on Mike Acton’s twitter account that he posted a review code on OgreNode.cpp.

Seems like some community members are waiting for my response, I made a few comments on bounceapp but due to my inexperience with that site I didn’t realize changes had to be saved. So here goes my reply:

Mike Acton is a respected programmer in the video game industry, and he’s right. In fact, if you were paying attention I listed his famous Typicall C++ Bullshit as reference in my Ogre 2.0 proposal.

OgreNode.cpp was written 13 years ago when OO programming was all the rave (still is?) everyone had a single core, caches didn’t matter and most efficient way to cull the world was to use an Octree or a BSP. The world believed that “if( dirty )” was a magical, no-cost expression that is immediately a performance improvement wherever used to avoid the execution of more than 3 instructions.

13 years later, Moore’s law kicked us in the butt and everyone is multicore. You probably know that story already.

Mike Acton reviewed the 1.9 version. Perhaps it would’ve been more interesting to see a review of the 2.0 file which has been refactored to better fit Data Oriented Design principles (and I’m sure there are things I wrote to criticize). Many of the things he criticizes of 1.9 have been fixed. Nevertheless there are things we can learn. Note that if he weren’t right, then it would be hard to explain why there was a 5x performance increase between 1.9 and 2.0.

We used to trash the data cache bad, very bad. There may be room for further improvements, but I’m now focusing on getting a decent release to package*. And I hope he doesn’t review OgreAnimation.cpp from 1.9; the whole file should be censored from the internet (a replacement is in the works).

*And besides, there are big hotspots waiting to be fixed like skeletal animation and the render queue.

What is fixed in 2.0

  • Most of the memory sensitive data is now taken care by a memory manager to ensure they’re friendly to the cache, and to allow several SIMD and multi threading approaches (you couldn’t even update the hierachy or set positions from multiple cores in 1.9!)
  • The constructor that accepts a string (the node’s name) has been removed. The name is empty by default (in 1.9 it had to be unique and that caused the name generator crap).
  • Update has no branches anymore but we still use the two “inherit orientation/scale” flags for features and flexibility. However this is branchless (though perhaps the compiled C version may contain a few branches; we now use SSE2/NEON when available and by default, to implement branchless selection).
  • The “queued for update” BS has been removed. There is no need for this anymore. My WTF meter exploded the first time I took a look at it.
  • All the “if( mCachedTransformOutOfDate )” code and variable BS has been removed (except in debug mode for a few assertions). We now update the hierarchy only once per frame (or on demand), and use the cached results (since THAT’s the general case). For convenience, users may use an alternate function that forces the recalculation of the derived transform for that single node (but the user has to know it is needed because it may be out of date; that’s why we placed asserts in debug mode); though this function is discouraged by the docs that it shouldn’t be used frequently on many nodes (the user should refactor your code so that it happens after the hierarchy has been updated, or perform another update on-demand).
  • The way children and parents are recalculated has been completely rewritten. Like Mike Acton says, how the transform will actually be computed was hidden because every function can be overridden. This math in the end was equal to all derived classes (including bones), and thus many virtual keywords have been removed, and the main hierarchy update function is now static. Can’t get clearer and more transparent than that.
  • The atrocities he mentions of the various getChild and removeChild overloads have been fixed (i.e. string lookup, O(N) iteration when the integer handle is provided, etc) and many redundant overloads have also been removed. We store all children in an plain old array, where it used to be stored in a std::map (omg!).
  • We’re in the process of removing all uses of getSquaredViewDepth calls and though it still remains for legacy reasons, it has been wiped from the major hotspots.
  • needUpdate has been removed.

What we can learn from 1.9 review and/or still applies

  • I agree that mInitialPosition/Orientation/Scale don’t belong to Node at all. They’re used by animated nodes to remember their starting position (i.e. when subclassed by bones, they contain the binding pose). Awful. We’re still working on animation, but these three variables will eventually go away.
  • At first I disagreed, but the more I look at setParent, the more I think he’s right about it. Note however, it’s a protected member function and the doc says it’s for internal use. This is the reason it’s not defined what happens when parenting to a node that already has a parent: This behavior is well defined in the public interfaces addChild/removeChild (throw an exception).
  • He’s got quite a point in that we provide createChild functionality, but no method to mass-create children. There’s also no method for deferred creation of nodes so that they can be made from multiple threads. I haven’t even thought of that (currently creation, destroying, attaching and detaching of nodes must be done from one thread)
  • I didn’t notice how inefficiently we set the initial position and orientation in createChild.
  • I agree on the DebugRenderable thing from nodes. It doesn’t belong there and only increases compilation time due to additional dependencies. I think it’s probably broken in 2.0 though I haven’t tried.
  • Some of listener’s bad practices stuff was removed, but it’s still around.

 

All in all, we thank Mike for taking the time to review our code; we welcome these harsh reviews. It’s been useful in identifying quite a bunch of issues.


16 thoughts on “On Mike Acton’s review of OgreNode.cpp

  • Lunkhound

    Nice write up!
    Will the new Bone class derive from Node as it does in 1.9? It just seems like the functionality of Node is a bit overkill for what is needed for Bone. What is the point of having the inheritScale/inheritOrientation flags on the bones? Isn’t that just going to slow down and complicate the updating of the bone transforms?

    • Matias Post author

      Actually, inherit scale and orientation flags are most used by artists and is tweakable through modelling editors. Probably, these features are most useful for skeleton animation.

      Turning on/off Inherit scale is very used for comic-style animations (think the Incredibles, Despicable Me, etc). Scaling a particular bone can exaggerate a localized expression (i.e. in the face, or an arm “muscle”), but without this scale also affecting the next bones in the chain.

      Turning on/off inherit orientation is more important for mechanical/non-organic animation (think of trains, gears, or movies like Transformers, Pacific Rim)

  • Lunkhound

    Granted, the flags are useful for an artist using an animation-authoring tool, but by the time the animation is exported, the effects of the inherit scale/rotation flags are simply baked into the animation right?

    • Matias Post author

      Not quite.

      Even though they can be baked, they can have terrible artifacts unless the sampling frequency is too high (plus slerp vs nlerp differences for rotation) or they may not play very nice with additive animation (i.e. you want to lower the hand’s weight from all other playing animations, so that you can add “hand wave” on top of all, but you can’t or at least manually have to take into account all the reverse scaling & orientation that this hand should have in order to place it where it really belongs).

      For the most basic of animation systems, yes it’s unnecessary. But when you try to do something more beyond the basics, you realize you need them.

      Besides, with branchless conditional move, the current code is more bandwidth bound than instruction bound; so someone would have to go great lengths to prove me that it is actually impacting performance.

  • Lunkhound

    Well one thing I’ve learned from working with animation in games is that you pretty much never want to use a boolean flag for controlling something like this. If the character is visible on screen when you toggle the state of one of these flags, the character’s pose will change instantly, creating an ugly-looking “pop” in the character’s motion. You always want to smooth out such transitions using some type of weighted blend. Given that, I see very very few use cases for these flags. And the effects of these flags can be easily achieved by manually adjusting the bones post-animation (same as for other procedural animations like head-tracking or IK).

    It’s good that the flags aren’t costing any performance on the systems you’ve tested. I hope that’s true for all platforms.

  • Matias Post author

    Yeah, the boolean flags aren’t meant to change on the fly (although you can without effort) due to those popups, but rather to stay as such on/off for the lifetime of that node/bone.

    You keep thinking of “character’s motion”, and like I said, these flags are most useful in comic style animation (which does include characters) and mechanical animation.
    If you keep thinking of realistic or semi-realistic characters, you will hardly find any use for these flags.

  • Lunkhound

    I have worked on a few games with a “cartoony” style (i.e. lots of squash and stretch in the animations), and there was no need for these types of flags.

    I’ve just realized something, though, and now I’m starting to see why these flags may have been added. Just to test things out, I tried adding a “breathing” animation to a creature I’ve got in Blender. The animation is just a simple scaling in the x and z axes of the chest bone. In Blender, the limbs are set to not inherit scaling from the chest. When I played the animation in Ogre, however, the limbs were picking up some weird scaling.

    It was quickly apparent what the problem was: Ogre handles nonuniform scaling differently than Blender (also differently from MAX and Maya, and from every engine I’ve used). In Ogre, scale is a Vector3 which only allows scaling along a bone’s local axes, but it would need to be a 3×3 matrix to handle the arbitrary scaling that Blender allows.

    I see 3 ways to fix this issue:
    1. use the bone flags to eliminate the scaling on the limbs (this is just a workaround though)
    2. modify the exporter to work better with Ogre’s limited scaling capabilities
    3. fix Ogre to support arbitrary nonuniform scaling (big job)

    Without fixing Ogre, the kinds of “squash and stretch” animations commonly used in cartoon-style games, just can’t work.

    • Matias Post author

      IIRC the skeleton file doesn’t save inherit orientation/scale settings (I may be wrong, but otherwise I think that’s a huge flaw IMO, but easy to fix) and it was somewhere mentioned on Blender’s 2.49 exporter docs. And yeah, “breathing” animations without messing up other parts are one big motivation behind these flags.

      Second, if the problem isn’t just an inheritance problem but rather a more advanced one, Point 2 is the right way to go.
      Point 3 implies saving a lot more information about the bones and could cripple real time playback of animations (unless it’s for a few models in scene of course); it’s not within our plans to support such method.

      I find your case a bit weird though, Blender’s internal representations tend to match Ogre’s very closely. Perhaps something else is going on.

  • Lunkhound

    The problem is really in Ogre’s animation system and in the way the Node represents scale. Ogre’s Node cannot represent an arbitrary 3 by 4 matrix transform. In particular, Ogre can’t do non-orthogonal axes, while Blender (and pretty much all industry-standard 3d authoring tools) can.

    In practice, this means that if an animator creates a blender animation with non-uniform scaling (with scaling inheritance), it won’t work right in Ogre.
    For example, say we’d like to apply a scale to the root bone to stretch a character to twice normal height. In Blender, the result will look like the character is made of rubber and has been uniformly stretched vertically. In Ogre, each bone will have the stretch applied in it’s local space, if the bones are rotated differently (they will be) the result will depend on how the character was rigged–but in any case the effect will be ruined because the bones aren’t getting stretched in the same (world space) direction. And the difference isn’t subtle!

    Here’s how Ogre 1.x does scale inheritance:
    // Scale own position by parent scale, NB just combine
    // as equivalent axes, no shearing
    mDerivedScale = parentScale * mScale;

    In order to fix it, it would need to do something like this (assume all are 3×3 matrices):
    mDerivedScaleMatrix = parentScaleMatrix * inverse(mOrientation) * mScaleMatrix;

    Basically one needs to rotate the parent scale into the local space of the bone before concatenating with the bone’s local scale. I may have messed up the matrix order, but it’s something like this.

    There’s an open JIRA ticket on this issue:
    https://ogre3d.atlassian.net/browse/OGRE-241

    I was able to fix the exporter to work around my “breathing” example. The exporter checks to see if a bone is set not to inherit scale in Blender and if so, computes the scale needed to counteract the scale Ogre will come up with. And you are correct–the skeleton file doesn’t export the state of those flags.

    I’m not sure why fixing this properly would “cripple” real time playback of animations. I’ve seen it done on PS2 engines that could handle crowds of characters. Perhaps individual animations could be flagged as to whether they have scale in them, and if no scale is being done, a different code path could be used to blend them together.

    • Matias Post author

      Oh, I see what you mean. I was thinking of something completely different. No, what you say would not cripple performance.
      In fact Ogre 2.0 has spare ALU / Bandwidth ratio to apply more math for non-uniform scales for free.

      In fact, I believe we may not even need need a 3×3 matrix to do the math.

      The following code changes (to 1.9) should do what you want:

      Apply this to Node::updateFromParentImpl

      // Update scale
      const Vector3& parentScale = mParent->_getDerivedOrientation() * mParent->_getDerivedScale() *
      							(mParent->_getDerivedOrientation() * Vector3::UNIT_SCALE);
      if (mInheritScale)
      {
      	const Vector3 correc = mDerivedOrientation * Vector3::UNIT_SCALE;
          // Scale own position by parent scale, NB just combine
          // as equivalent axes, no shearing
      	mDerivedScale = (mDerivedOrientation * parentScale) * correc * mScale;
      }
      else
      {
          // No inheritance
          mDerivedScale = mScale;
      }
      
      // Change position vector based on parent's orientation & scale
      mDerivedPosition = parentScale * (parentOrientation * mPosition);
      //mDerivedPosition = parentOrientation * (mParent->_getDerivedScale() * mPosition);
      
      // Add altered position vector to parents
      mDerivedPosition += mParent->_getDerivedPosition();

      And this to Bone::_getOffsetTransform:

      // Combine scale with binding pose inverse scale,
      // NB just combine as equivalent axes, no shearing
      Vector3 locScale = _getDerivedOrientation() * (_getDerivedScale() * mBindDerivedInverseScale) * (_getDerivedOrientation() * Vector3::UNIT_SCALE);
      
      // Combine orientation with binding pose inverse orientation
      Quaternion locRotate = _getDerivedOrientation() * mBindDerivedInverseOrientation;
      
      // Combine position with binding pose inverse position,
      // Note that translation is relative to scale & rotation,
      // so first reverse transform original derived position to
      // binding pose bone space, and then transform to current
      // derived bone space.
      Vector3 locTranslate = _getDerivedPosition() + locRotate * (locScale * mBindDerivedInversePosition);
      
      m.makeTransform(locTranslate, locScale, locRotate);

      I use “_getDerivedOrientation() * Vector3::UNIT_SCALE” which should avoid the need for 3×3 matrices while still handling negative scales (Haven’t tested thoroughly though). But I don’t know if that’s faster/slower.

  • Matias Post author

    By the way, I like the old behavior, but I definitely agree that if all major authoring tools work the other way, Ogre should follow the same conventions (I can change the behavior in 2.0; but the rest of the team and community has to be consulted if this is for 1.9/1.10)

  • Lunkhound

    Oh by the way, I posted about this yesterday in a forum thread that was started earlier this year:

    http://www.ogre3d.org/forums/viewtopic.php?f=4&t=78633

    I posted some screenshots to illustrate the problem with how nonuniform scaling works. Pictures are so much better than words for this!
    I’m really glad you are open to fixing this for 2.0. Really great news!

    I do think 3×3 matrices will be needed however. A single 3-vector just can’t describe the full range of shearing and skewing (as well as scaling) that results from concatenating a series of bone rotations and scales. And it’s pretty clear that even with your changes, _getOffsetTransform won’t ever produce a matrix with skewed (non-orthogonal) axes, and therefore it can’t properly reproduce the matrices from Blender.

    I shall try out your suggested changes, and report back. I’m curious to see if it is an improvement over the current state of affairs.

  • Lunkhound

    I tried those changes on 1.9 and not only did it not help with the scaling issue, it caused ALL bones to be scaled in weird ways. Even those that aren’t scaled at all! I think I can see why:

    in Bone::_getOffsetTransform, you compute locScale like this:
    Vector3 locScale = _getDerivedOrientation() * (_getDerivedScale() * mBindDerivedInverseScale) * (_getDerivedOrientation() * Vector3::UNIT_SCALE);

    Now for a bone that isn’t scaled, locScale should always come out to (1,1,1), but this calculation will cause locScale to depend on the orientation. I think it doesn’t really make sense mathematically to apply a rotation to a scale vector. At least, the effect won’t be the same as multiplying a rotation matrix with a scale matrix.

    I took a screenshot of the effect if you’d like to see it.

Comments are closed.