Dealing with persistent mapping, cross-platform


OpenGL’s persistent mapping comes in two flavours: Coherent and Incoherent.

Coherent mapping puts more burden on the developer, as you have to track whether the region you’ve written to has already been consumed by the GPU or is still in use.
Incoherent mapping takes that burden away from you: you only have to call glFlushMappedBufferRange or issue memory barriers. I suspect this is slower than coherent mapping, but it’s safer and easier to use.
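The tracking mentioned above is usually done by splitting the mapped buffer into one region per frame in flight and guarding each region with a fence. The following is a minimal sketch, assuming three frames in flight. Real GL code would create fences with glFenceSync and wait on them with glClientWaitSync; here a hypothetical per-region boolean (in the hypothetical `RingRegions` class) stands in for the fence so the example runs without a GL context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical ring of kFrames regions inside one persistently mapped buffer.
// The CPU writes one region per frame while the GPU may still be reading the
// regions written in previous frames.
template <std::size_t kFrames>
class RingRegions
{
public:
    static constexpr std::size_t kBusy = std::numeric_limits<std::size_t>::max();

    explicit RingRegions( std::size_t bytesPerRegion ) : mRegionSize( bytesPerRegion ) {}

    // Returns the byte offset the CPU may write to this frame, or kBusy if the
    // GPU has not consumed that region yet (real code would wait on the
    // region's fence with glClientWaitSync instead of returning).
    std::size_t beginFrame() const
    {
        if( mInFlight[mCurrent] )
            return kBusy; // writing now would be a write-after-read hazard
        return mCurrent * mRegionSize;
    }

    // Call after submitting the draws that read the current region.
    // Real code would call glFenceSync here and store the returned GLsync.
    void endFrame()
    {
        mInFlight[mCurrent] = true;
        mCurrent = ( mCurrent + 1 ) % kFrames;
    }

    // Simulates the fence for region `idx` being signalled by the GPU.
    void gpuFinished( std::size_t idx ) { mInFlight[idx] = false; }

private:
    std::size_t mRegionSize;
    std::size_t mCurrent = 0;
    std::array<bool, kFrames> mInFlight{}; // stand-in for one fence per region
};
```

With three regions the CPU can run up to two frames ahead of the GPU before it has to wait.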

And what about RenderSystems (like DX11) that don’t support persistent mapping? And what about not-so-old GL implementations that don’t support it either?

Well, OGRE approaches this problem in the following way:
Buffers have map and unmap functions, but their declarations are as follows:

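enum MappingState
{
    MAPPED,                 // regular map/unmap
    PERSISTENT_INCOHERENT,  // persistent, requires explicit flushes
    PERSISTENT_COHERENT     // persistent, coherent
};

class Buffer
{
public:
    void* map( MappingState mapMode, size_t offset, size_t bytes );
    // unmapPersistent == false keeps a PERSISTENT_* mapping alive
    void unmap( bool unmapPersistent );
};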

Now if we want to persistently map, we do this:

void *data = buffer->map( PERSISTENT_COHERENT, offset, size );
/* .. work with data ..*/
buffer->unmap( false );

Because we called unmap( false ), the buffer isn’t actually unmapped internally. The next call to map( PERSISTENT_COHERENT, offset, size ) does nothing and simply returns the pointer that is already mapped.
To truly unmap it, call buffer->unmap( true );
Calling map again while the buffer is persistently mapped, but with a different MappingState value, is an error. Calling it with an offset and/or size that falls outside the region defined by the first call to map is also an error.
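The contract above can be modelled on the CPU to show the state tracking involved. This is a minimal sketch, assuming a std::vector stands in for GPU memory. `EmulatedBuffer`, its counters and the exception types are hypothetical and are not Ogre’s actual implementation; comments mark where the real GL calls would go.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

enum MappingState
{
    MAPPED,
    PERSISTENT_INCOHERENT,
    PERSISTENT_COHERENT
};

// Hypothetical CPU-only model of the map/unmap contract described above.
class EmulatedBuffer
{
public:
    explicit EmulatedBuffer( std::size_t sizeBytes ) : mStorage( sizeBytes ) {}

    void* map( MappingState mapMode, std::size_t offset, std::size_t bytes )
    {
        if( offset > mStorage.size() || bytes > mStorage.size() - offset )
            throw std::out_of_range( "map range exceeds the buffer size" );

        if( mIsMapped )
        {
            // Only a persistent mapping may be "mapped" again, with the same
            // mode and inside the originally mapped range.
            if( mMode == MAPPED )
                throw std::logic_error( "buffer is already mapped" );
            if( mapMode != mMode )
                throw std::logic_error( "map() called with a different MappingState" );
            if( offset < mOffset || offset + bytes > mOffset + mBytes )
                throw std::out_of_range( "map() range is outside the persistent mapping" );
            return mStorage.data() + offset; // reuse the existing mapping
        }

        ++mRealMapCalls; // glMapBufferRange would be called here
        mIsMapped = true;
        mMode = mapMode;
        mOffset = offset;
        mBytes = bytes;
        return mStorage.data() + offset;
    }

    void unmap( bool unmapPersistent )
    {
        assert( mIsMapped );
        if( mMode == PERSISTENT_INCOHERENT )
            ++mFlushCalls; // glFlushMappedBufferRange would be called here

        // Regular maps are always released; persistent ones only on request.
        if( mMode == MAPPED || unmapPersistent )
            mIsMapped = false; // glUnmapBuffer would be called here
    }

    bool isMapped() const { return mIsMapped; }
    int realMapCalls() const { return mRealMapCalls; }
    int flushCalls() const { return mFlushCalls; }

private:
    std::vector<unsigned char> mStorage;
    bool mIsMapped = false;
    MappingState mMode = MAPPED;
    std::size_t mOffset = 0;
    std::size_t mBytes = 0;
    int mRealMapCalls = 0;
    int mFlushCalls = 0;
};
```

Calling map( PERSISTENT_COHERENT, 0, 128 ), then unmap( false ), then the same map() again returns the same pointer while realMapCalls() stays at 1.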

This pattern gives us a lot of advantages:

  • On GL4 with ARB_buffer_storage, it works as described.
  • On DX11/GL3, map/unmap is just a regular map. It will be slower, of course, but it is handled transparently with no porting effort. Ogre also tracks usage of the persistent flags, so it reports errors that would normally only apply to GL4 (i.e. calling map() again on an already mapped object with an offset and size outside the original mapped range).
  • When using PERSISTENT_INCOHERENT, whether you call unmap( false ) or unmap( true ), we perform a glFlushMappedBufferRange. Thanks to your unmap call, we get a hint on when to call it.
  • If you suspect a bug like a write-after-read hazard, just flip a switch that treats all PERSISTENT_COHERENT maps as PERSISTENT_INCOHERENT. If the bug is really bad, all PERSISTENT_* maps can be treated as regular maps to see whether the problem persists (pun intended).
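
The fallback and the debug switches from the list above boil down to choosing which mapping the backend actually performs. A minimal sketch, assuming a capability flag (true on GL4 with ARB_buffer_storage) and a three-level debug setting; `PersistentDebugLevel` and `resolveMappingState` are hypothetical names, not Ogre’s API.

```cpp
#include <cassert>

enum MappingState
{
    MAPPED,
    PERSISTENT_INCOHERENT,
    PERSISTENT_COHERENT
};

// Hypothetical debug switch levels mirroring the bullets above.
enum class PersistentDebugLevel
{
    None,                 // honour the requested mode
    CoherentAsIncoherent, // treat PERSISTENT_COHERENT as PERSISTENT_INCOHERENT
    AllAsRegular          // treat every PERSISTENT_* map as a regular map
};

// Decides which mapping the backend actually performs. supportsPersistent
// would be true on GL4 with ARB_buffer_storage and false on DX11 / GL3.
inline MappingState resolveMappingState( MappingState requested,
                                         bool supportsPersistent,
                                         PersistentDebugLevel debug )
{
    if( requested == MAPPED )
        return MAPPED;
    if( !supportsPersistent || debug == PersistentDebugLevel::AllAsRegular )
        return MAPPED; // emulate with a regular map; user code is unchanged
    if( requested == PERSISTENT_COHERENT &&
        debug == PersistentDebugLevel::CoherentAsIncoherent )
        return PERSISTENT_INCOHERENT;
    return requested;
}
```

In this sketch a backend would still validate map() calls against the *requested* mode rather than the resolved one, so the errors described in the second bullet are reported on every platform.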

This is a rather short post. I just wanted to post an update on the development of the Ogre 2.0 Final refactor.

Update: It has been pointed out to me that non-coherent mapping does NOT perform implicit synchronization. The difference between coherent and non-coherent mapping has to do with cache invalidation and memory models (e.g. UMA architectures), not with the form of synchronization. Thanks, Graham Sellers!


5 thoughts on “Dealing with persistent mapping, cross-platform”

  • Christopher Sosa

    Maybe this will be some off-topic. but some reason you’re working in the OpenGL AZDO (Approaching Zero Driver Overhead) for Ogre 2.0? if not, may when Ogre 2.0 will be released as CTP or Stable may write one based in the GL+3 renderer.

  • Matias Post author

    Umm… autocorrect problems? I didn’t really understand.

    So, I’ll try to make a broad reply.
    I’m slowly working on AZDO OpenGL for 2.0
    My original plan was to refactor RenderSystem::_render (as part of the RenderQueue refactor, which interacts with the HLMS). The original _render function is extremely slow (lots of redundant operations, rebinding of everything, etc., all of this done PER ENTITY).

    But my vague understanding of VAOs was wrong, and it turned out that refactoring _render required refactoring more parts of Ogre.
    More info is here: http://www.ogre3d.org/forums/viewtopic.php?f=25&t=81060

    The AZDO improvements can be split in two categories:
    1. Improvements that also apply to GLES (mobile) and GL3. For example, GL3 can read index data from VBOs, which makes the “one VAO per vertex format” approach appealing on GL3 hardware as well. You just need an up-to-date driver.
    GLES doesn’t have this, but it still benefits from having large VBO pools (far fewer bindings overall).

    2. Improvements that are GL4-specific. For example, persistent mapping. I’m also writing these, but the system decides at runtime whether to use the GL3 path or the GL4 path, which are almost identical anyway.
    This part needed a large refactor anyway, because the glMap* family of functions stalls very aggressively. It’s simply better to have a pool of buffers and do the tracking yourself. I’ve found that many drivers just ignore the discard flag (unlike D3D, where discard handles a lot for you behind your back). Sinbad stumbled upon glMap* problems a long, long time ago, but the solution was just “good enough” (http://www.stevestreeting.com/2007/03/16/glmapbuffer-how-i-mock-thee/ and http://www.stevestreeting.com/2007/03/17/glmapbuffer-vs-glbuffersubdata-the-return/)
    Having a pool of buffers works on GLES too 🙂 (**especially** on crappy drivers). The difference is that in GL4 we can use persistent mapping. And that’s huge. Really huge.

    These AZDO improvements won’t be in the CTP. I haven’t pushed the new changes to my HLMS fork yet. They will arrive together with the HLMS in Final.

    When AZDO changes are finished, performance improvements will be **massive**. I’m talking Battlefield 4-scale, large cities running in real time with no need to prepare Instance managers or cheats. Just create entities and place them (though lending a hand to the engine with SCENE_STATIC in 2.0 helps a lot).

    So, yeah, I’m really excited about it.

  • dishwasher

    Great to hear that the whole AZDO concept is coming to Ogre 2.0!
    However, I have a question: how does indirect rendering relate to instancing? Is “multi draw indirect” (in a sense) a new, better version of “draw instanced”?

    • Matias Post author

      I used to have the same question myself.
      “multi draw indirect” has two parts in its name: “multi draw” means drawing multiple times (i.e. foreach( count ) glDraw( data[i] ) ). In other words, instancing with per-draw parameters.
      “indirect” means the data is supplied from a buffer (which, by the way, can be generated on the GPU) instead of as function arguments.
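
      Those two halves can be modelled on the CPU. The struct below follows the field layout of DrawElementsIndirectCommand from the OpenGL 4.x specification; `emulateMultiDrawElementsIndirect` and `DrawFn` are hypothetical names, and the callback stands in for one instanced draw with a base vertex. It is a sketch of the semantics, not of what a driver does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Field layout of DrawElementsIndirectCommand as given in the OpenGL 4.x spec.
struct DrawElementsIndirectCommand
{
    std::uint32_t count;         // number of indices to draw
    std::uint32_t instanceCount; // instances for this draw (the instancing part)
    std::uint32_t firstIndex;    // offset, in indices, into the bound index buffer
    std::int32_t  baseVertex;    // added to every fetched index
    std::uint32_t baseInstance;  // first instance for per-instance attributes
};

// Hypothetical callback standing in for one instanced draw with base vertex.
using DrawFn = std::function<void( const DrawElementsIndirectCommand& )>;

// CPU model of glMultiDrawElementsIndirect: the parameters of every draw are
// read from a buffer (the "indirect" part) and issued in a loop ("multi draw").
inline void emulateMultiDrawElementsIndirect( const std::vector<unsigned char>& indirectBuffer,
                                              std::size_t drawCount, std::size_t stride,
                                              const DrawFn& draw )
{
    if( stride == 0 )
        stride = sizeof( DrawElementsIndirectCommand ); // 0 means tightly packed, as in GL
    for( std::size_t i = 0; i < drawCount; ++i )
    {
        assert( i * stride + sizeof( DrawElementsIndirectCommand ) <= indirectBuffer.size() );
        DrawElementsIndirectCommand cmd;
        std::memcpy( &cmd, indirectBuffer.data() + i * stride, sizeof( cmd ) );
        draw( cmd );
    }
}
```

      Because the commands are plain bytes in a buffer, nothing in this model requires the CPU to have written them; a compute shader could have filled the same buffer.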

      So far, that’s just a slightly more advanced form of instancing, with no apparent way to render different meshes. My initial disappointment was that it doesn’t allow selecting a different VBO to feed each draw (like NVIDIA’s bindless graphics does: send a 64-bit handle with the GPU address of the VBO to the shader, and feed vertex data from there. glBufferAddressRangeNV is used to do that).

      And here is where the AZDO way of thinking comes into play: put EVERYTHING in the same VBO (OK, if not everything, then as much as you can; keep the VBO count below 3), including the indices, and do the memory management entirely yourself (instead of relying on the driver).
      Multi draw indirect lets you send, per draw, offsets indicating where to start (within the currently bound VBO) along with the vertex and index counts. This way, as long as all meshes are in the same VBO, you can render all of them with a single call.
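
      To illustrate the “everything in one VBO” idea, here is a minimal sketch, assuming position-only meshes with 32-bit indices; `Mesh` and `SharedGeometryPool` are hypothetical names. Each mesh is appended to one shared vertex array and one shared index array, and its offsets are recorded in a DrawElementsIndirectCommand (field layout from the OpenGL 4.x specification).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Field layout of DrawElementsIndirectCommand as given in the OpenGL 4.x spec.
struct DrawElementsIndirectCommand
{
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// Hypothetical mesh: positions only, to keep the sketch short.
struct Mesh
{
    std::vector<float>         vertices; // 3 floats per vertex
    std::vector<std::uint32_t> indices;  // local to this mesh (start at 0)
};

// Every mesh lives in one vertex array and one index array; each mesh is
// addressed by offsets instead of by binding a different VBO.
struct SharedGeometryPool
{
    std::vector<float>                       vertexData;
    std::vector<std::uint32_t>               indexData;
    std::vector<DrawElementsIndirectCommand> commands;
    std::uint32_t                            totalInstances = 0;

    // Appends a mesh and records the draw command that will render it.
    void add( const Mesh& mesh, std::uint32_t instances )
    {
        DrawElementsIndirectCommand cmd;
        cmd.count         = static_cast<std::uint32_t>( mesh.indices.size() );
        cmd.instanceCount = instances;
        cmd.firstIndex    = static_cast<std::uint32_t>( indexData.size() );
        cmd.baseVertex    = static_cast<std::int32_t>( vertexData.size() / 3 );
        cmd.baseInstance  = totalInstances; // running total, one possible choice
        totalInstances   += instances;

        vertexData.insert( vertexData.end(), mesh.vertices.begin(), mesh.vertices.end() );
        indexData.insert( indexData.end(), mesh.indices.begin(), mesh.indices.end() );
        commands.push_back( cmd );
    }
};
```

      After uploading vertexData and indexData, and uploading commands to a GL_DRAW_INDIRECT_BUFFER, a single glMultiDrawElementsIndirect( GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, drawCount, 0 ) would render every mesh in the pool. Using a running total for baseInstance lets each draw’s instanced vertex attributes start at their own slice of a shared per-instance buffer.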

      The idea is more efficient and more stable than NVIDIA’s.
      More efficient, because fetching the VBO handle per draw means an extra indirection (which hurts the caches). The trade-off is that it requires the engine to refactor its pipeline to put everything in the same VBO (which, it turns out, is not as hard as it sounds; NVIDIA’s solution, on the other hand, “just works” with existing engine code, no refactor required).
      More stable, because sending a dangling GPU pointer to the GPU to fetch a just-deleted VBO can mean a complete system hang (depending on the hardware and drivers used).