A glimpse of what’s coming to Ogre 2.0 Final


So, the HLMS is not finished, but it will be the main method for using materials in Ogre 2.0 Final; and writing about how it works helps me stay on track, while documenting it for others.

HLMS stands for “High Level Material System”, because for the user, the HLMS means you just define the material and start looking at it (no need for coding or shader knowledge!). In retrospect though, tweaking the shader code for an HLMS is much lower level than the old Materials have ever been (and that is what makes it so powerful).

The fantastic trio

The Hlms boils down to 3 parts:

  1. Scripts. To set the material properties (i.e. type of Hlms to use: PBS, Toon shading, GUI; what textures, diffuse colour, roughness, etc). You can also do this from C++ obviously. Everybody will be using this part.
  2. Shader template. The Hlms takes a couple of hand-written glsl/hlsl files as templates and then adapts them to fit the needs on the fly (i.e. if the mesh doesn’t contain a skeleton, the bit of code pertaining to skeletal animation is stripped from the vertex shader). The Hlms provides a simple preprocessor to deal with this entirely from within the template, but you’re not forced to use it. Here’s a simple example of the preprocessor. I won’t be explaining the main keywords today. Advanced users will probably want to modify these files (or write some of their own) to fit their custom needs.
  3. C++ classes implementation. The C++ side takes care of picking the shader templates and manipulating them before compiling; and most importantly, it feeds the shaders with uniform/constant data and sets the textures that are in use. It is extremely flexible, powerful, efficient and scalable, but it’s harder to use than good ol’ Materials because those used to be data-driven: there are no AutoParamsSource here. Want the view matrix? You’d better grab it from the camera when the scene pass is about to start, and then pass it yourself to the shader (see the sketch right after this list). This is very powerful, because in D3D11/GL3+ you can set the uniform buffer with the view matrix just once for the entire frame, and thus have multiple uniform buffers sorted by update frequency. Very advanced users will be messing with this part.
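
To give an idea of what that last point means in practice, here is a minimal sketch of feeding the view matrix yourself when a scene pass is about to start. PassBuffer and fillScenePassBuffer are hypothetical names used only for illustration (they are not actual Ogre 2.0 API); Camera::getViewMatrix is the only existing call used here.

    #include <cstring>
    #include "OgreCamera.h"
    #include "OgreMatrix4.h"

    // Hypothetical constant-buffer wrapper; a stand-in for whatever buffer
    // abstraction the Hlms C++ implementation ends up using.
    struct PassBuffer
    {
        void* map();
        void  unmap();
    };

    // Grab the view matrix from the camera ourselves and upload it once for the
    // whole pass; every object drawn afterwards reuses this buffer instead of
    // re-sending the matrix per draw call.
    void fillScenePassBuffer( const Ogre::Camera *camera, PassBuffer *passBuffer )
    {
        const Ogre::Matrix4 viewMatrix = camera->getViewMatrix( true );

        void *dst = passBuffer->map();
        std::memcpy( dst, &viewMatrix, sizeof( Ogre::Matrix4 ) );
        passBuffer->unmap();
    }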

Based on your skillset and needs, you can pick which parts you want to mess with. Most users will just use the scripts to define materials, advanced users will change the template, and very advanced users who need something entirely different will change all three.

For example, the PBS (Physically Based Shading) type has its own C++ implementation and its own set of shader templates. The Toon Shading type has its own C++ implementation and set of shaders. There is also a GUI implementation, specifically meant to deal with GUI (it ignores normals & lighting, manages multiple UVs, can mix multiple textures with Photoshop-like blend modes, can animate the UVs, etc.).

It is theoretically possible to implement both Toon & PBS in the same C++ module, but that would be crazy, hard to maintain and not very modular. You get the idea.

Blocks, blocks, blocks

We’re introducing the concept of blocks, most of which are immutable. So far there are three:

  1. Datablock: In other words, a “material” from the user’s perspective. It holds data (i.e. material properties) that will be passed directly to the shaders, and also holds which Macroblock & Blendblock are assigned to it. This is the only block that is not immutable so far.
  2. Macroblocks: Named like that because they rarely change. Except for transparents, we sort by macroblock first. These contain information like depth check & depth write, culling mode, and polygon mode (point, wireframe, solid). They’re quite analogous to D3D11_RASTERIZER_DESC, and not without reason: under the hood Macroblocks hold an ID3D11RasterizerState, and thanks to the render queue’s sorting, we change them as little as possible. In other words, we reduce API overhead. On GL backends, we just change the individual states on each block change. Macroblocks can be shared by many Datablocks.
  3. Blendblocks: Blendblocks are like Macroblocks, but they hold alpha blending operation information (blend factors: One, One_Minus_Src_Alpha; blending modes: add, subtract, min, max, etc.). They’re analogous to D3D11_BLEND_DESC. We also sort by blendblocks to reduce state changes.

Being immutable means you can’t change the Macro- & Blendblocks after they have been created. If you want to make a change, you have to create a new block and assign the new one. The previous one won’t be destroyed until you ask for it explicitly.
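
For illustration, a minimal sketch of that workflow follows. It assumes block-handling names along the lines of HlmsMacroblock, HlmsManager::getMacroblock and HlmsDatablock::setMacroblock; the final API may well differ.

    // Names and signatures here are assumptions, made up for the sake of the sketch.
    void disableDepthWrite( Ogre::HlmsManager *hlmsManager,
                            Ogre::HlmsDatablock *datablock )
    {
        Ogre::HlmsMacroblock macroblockDesc;    // plain description struct
        macroblockDesc.mDepthWrite = false;     // e.g. turn off depth writes

        // The manager deduplicates: if an identical immutable block already
        // exists, the same shared pointer is returned; otherwise a new one is
        // created. The previously assigned block stays alive until explicitly
        // destroyed.
        const Ogre::HlmsMacroblock *block = hlmsManager->getMacroblock( macroblockDesc );
        datablock->setMacroblock( block );
    }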

Technically, on OpenGL render systems (GL3+, GL ES2) you can const_cast the pointers, change the block’s parameters (mind you, the pointer is shared by other datablocks, so you will be changing them as well as a side effect) and it will probably work. But it will fail on the D3D11 render system.

Why Macroblocks & Blendblocks?

You could be thinking the reason I came up with these two is to fit D3D11’s grand scheme of things while staying compatible with OpenGL. But that’s only a half truth, and an awesome side effect. I’ve been developing the Hlms using OpenGL this whole time.

An OpenGL fanboy will tell you that grouping these together in a single call like D3D11 did barely reduces API overhead in practice (as long as you keep sorting by state), and they’re right about that.

However, I still think Microsoft really nailed it here, for two reasons:

  1. Many materials in practice share the same Macro- & Blendblock parameters. In an age where we want many 3D primitives with the same shader but slightly different parameters like texture, colour, or roughness (which equals a different material), having these settings repeated per material wastes a lot of memory… and a lot of bandwidth (and wastes cache space). Ogre 2.0 is bandwidth bound, so having all materials share the same pointer to the same Macroblock can potentially save a lot of bandwidth, and be friendlier to the cache at the same time. This stays true whether we use D3D11, D3D12, OpenGL, GL ES 2, or Mantle.
  2. Sorting by Macroblock is a lot easier (and faster) than sorting by its individual parameters: when preparing the hash used for sorting, it’s much easier to just do (every frame, per object) hash |= (macroblock->getId() << bits) & mask than to do hash |= m->depth_check | m->depthWrite << 1 | m->depthBias << 2 | m->depth_slope_bias << 3 | m->cullMode << 18 | …; (see the sketch below). We would also need a lot more bits than we can afford. Ogre 2.0 imposes a limit on the number of live Macroblocks you can have at the same time, as we run out of hashing space (by the way, D3D11 has its own limit). It operates around the idea that most setting combinations won’t be used in practice.
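
To make the comparison concrete, here is a sketch of the block-ID version of that key composition. The bit positions, the mask and the function name are made up for illustration; only the general idea (one shift & mask per object) comes from the post.

    #include "OgrePrerequisites.h"

    // Illustrative bit layout only; the engine's real sort-key format is not
    // described in the post, so these shift/mask values are made up.
    static const Ogre::uint64 kMacroblockShift = 52;
    static const Ogre::uint64 kMacroblockMask  = Ogre::uint64( 0x3FF ) << kMacroblockShift;

    Ogre::uint64 addMacroblockToSortHash( Ogre::uint64 hash, Ogre::uint16 macroblockId )
    {
        // One shift & mask per object, using the small ID handed out when the
        // macroblock was created...
        hash |= ( Ogre::uint64( macroblockId ) << kMacroblockShift ) & kMacroblockMask;

        // ...instead of packing depth check, depth write, depth bias, slope
        // scaled bias, cull mode, polygon mode, etc. individually, which would
        // need far more bits than the sort key can spare.
        return hash;
    }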

Of course it’s not perfect; it can’t fit every use case, and we inherit the same problems D3D11 has. If a particular rendering technique relies on regularly changing a property that lives in a Macroblock (i.e. alternating the depth comparison function between less & greater with every draw call, or gradually incrementing the depth bias on each draw call), you’ll end up redundantly changing a lot of other states (culling mode, polygon mode, depth check & write flags, depth bias) alongside it. This is rare, though. We’re aiming at the general use case.

These problems make me wonder whether D3D11 made the right choice in using blocks from an API perspective, since I’m not versed in driver development. From an engine perspective, however, blocks make sense.

No more initializeCompositor

I’ve been told that having to manually call Root::initializeCompositor is confusing, and that people don’t know when or where it should be called, or why the engine can’t do it by itself.

You’ll be happy to hear this has been removed. The engine is now taking care of it.

Materials are still alive

Let me be clear: you should be using the HLMS. The usual “Materials” are slow. Very slow. They’re inefficient and not suitable for rendering most of your models.

However, materials are still useful for:

  • Quick iteration. You need to write a shader? Just define the material and start coding. Why would you deal with the template’s syntax or a C++ module when you can just write a script and start coding? The HLMS does come with a command line tool to see how your template translates into a final shader (which is very handy for iteration, it’s fast, and it will check for syntax errors!), but it’s most useful when you want to write your own C++ module or change the template, not when you just want to experiment. Besides, old timers are used to writing materials.
  • Postprocessing effects. Materials are much better suited for this. Materials are data driven and easy to write. Postprocessing FXs don’t need an awful lot of permutations (i.e. they don’t have to deal with shadow mapping, instancing, skeleton animation, facial animation). And they’re at no performance disadvantage compared to the HLMS: each FX is a fullscreen pass that needs different shaders, different textures, and its own uniforms. Basically, API overhead we can’t optimize. But it doesn’t matter much either, because it’s not like there are 100 fullscreen passes. Usually there are fewer than 10.

Under the hood there is an HLMS C++ implementation (HLMS_LOW_LEVEL) that acts just as a proxy to the material. I know what’s on your mind now: yes, the HLMS is an integral part of Ogre 2.0, not just a fancy add-in.

Materials have been refactored, and thus your old code may need a few changes. Most notably, Macroblocks & Blendblocks have been added to Materials, thus functions like Pass::setDepthCheck & Co. have been replaced by two calls: Pass::setMacroblock & Pass::setBlendblock.
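
As a rough porting sketch, the snippet below shows what that change might look like. It assumes the new calls take description structs named HlmsMacroblock / HlmsBlendblock with the members shown; the exact signatures may differ in the final API.

    #include "OgrePass.h"

    void portOldPassSettings( Ogre::Pass *pass )
    {
        // Was: pass->setDepthCheckEnabled( false );
        Ogre::HlmsMacroblock macroblock;        // assumed description struct
        macroblock.mDepthCheck = false;
        pass->setMacroblock( macroblock );

        // Was: pass->setSceneBlending( Ogre::SBT_TRANSPARENT_ALPHA );
        Ogre::HlmsBlendblock blendblock;        // assumed description struct
        blendblock.setBlendType( Ogre::SBT_TRANSPARENT_ALPHA );
        pass->setBlendblock( blendblock );
    }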

Fixed Function support has been removed, and with it the multitexturing and pass splitting functionality. The HLMS default systems handle these.

Hlms Texture Manager and Texture Packs

The HLMS grabs its texture data from a new texture manager (which uses the usual TextureManager behind the scenes).

This HlmsTextureManager has two purposes:

  • Provide a dummy texture when a texture hasn’t been found (meeeh…)
  • Manage Texture Packs and create UV atlases on the fly, automatically (Yess!!!!)

What is a Texture Pack? Suppose you have a collection of textures (of the same resolution and similar pixel format) that you know will be used together. The HLMS Texture Manager gets a list of these textures from you and tries to pack everything together.

On D3D11/GL3+ it will create texture arrays. On GL ES 2, it will create a texture atlas. If no list is provided, the HLMS will automatically start packing based on default settings (i.e. maximum number of textures per pack, default pixel format, etc.) in order of request. This has the pitfall that textures that are rarely used together may end up in the same pack.
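
To make the idea concrete, here is a purely illustrative sketch. The struct and the manager interface below are hypothetical names, not the actual HlmsTextureManager API; the point is simply that a request returns both the packed texture and where the image landed inside it.

    #include "OgreTexture.h"
    #include "OgreVector4.h"

    // Hypothetical result type: where a requested image ended up after packing.
    struct PackedTextureLocation
    {
        Ogre::TexturePtr texture;        // the texture array / atlas actually created
        Ogre::uint16     arrayIndex;     // slice index (texture arrays, D3D11/GL3+)
        Ogre::Vector4    uvOffsetScale;  // xy = UV offset, zw = UV scale (atlas, GL ES 2)
    };

    // Hypothetical manager interface: a request either lands in an existing pack
    // or starts a new one using the category's defaults (pixel format, maximum
    // textures per pack, whether to pack at all).
    class PackingTextureManager
    {
    public:
        virtual ~PackingTextureManager() {}
        virtual PackedTextureLocation getOrCreatePacked( const Ogre::String &name,
                                                         int category ) = 0;
    };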

Textures are divided into categories: diffuse, normal mapping, specular mapping, detail maps, and the detail maps’ normal maps. Each category can have different defaults. For example, normal maps default to BC5 compression when available. Detail maps default to no packing when a UV atlas is the only choice (i.e. GL ES2), because detail maps are usually meant to be tileable.

Some formats can only be packed offline due to limitations in the format (i.e. PVRTC & ETC1). This is not yet implemented.

A work in progress

All of this is an exciting work in progress. The repo fork isn’t compiling on all platforms yet, and some of the functionality mentioned here may be crashing.

GL3PlusRenderSystem::_render is still incredibly slow: it redundantly binds the vertex buffer on every call. I don’t have performance numbers to give yet.

The HLMS is supposed to give us automatic instancing and even multi-draw indirect, but this isn’t implemented yet as we’re focusing on a more compatible approach first (starting by getting GL ES 2 & OpenGL 3+ working).


6 thoughts on “A glimpse of what’s coming to Ogre 2.0 Final”

  • al2950

    All looks very promising, a couple of questions/queries:
    1) When do you think it will be in a state for developers to start experimenting with?
    2) Do you want any help finishing it off!?
    3) I don’t entirely agree with your point about separating toon & PBS. From my point of view, the main/default shader generator (currently called PBS I guess) should be able to emulate everything in FFP. Then there should be a material property called “lighting_method”, which may need to be implemented as a different pre-processor keyword e.g. “@option”; anyway, this should take options like the following: “vertex blinn phong”, “per pixel blinn phong”, “PBS”, “Toon”. Does this make any sense!? I don’t see maintaining it being that difficult, especially when using the @piece & @insertpiece keywords. What are the arguments for not doing this, or have I completely misunderstood something!?

    Keep up the awesome work!

    • Matias Post author

      Hi!

      To address your questions:

      3) Let’s split this question in two parts:
      a. Toon & PBS shading are extremely different. The equation formulas, the parameters needed. Maintaining both in the same templates would be a complete mess. Remember here that feature combinations have 2^N complexity. It’s hard to keep track of all of them.

      Also the beauty of keeping them separate is that we can perform incredible optimizations. Passing parameters from the Datablock to the pixel shader’s uniform block is literally a couple of memcpys. If we were to mix toon & pbs, we would have to start adding if( toon ) else …, which is an extra branch CPU side and more cache misses per entity. With this approach, the only possible cache miss will be when selecting the HLMS generator type per entity, which we already have anyway (and it should be in the cache, and since the render queue is sorted, it should be very predictable).
      We also lose space (i.e. the roughness parameter from PBS is not used in Toon).

      b. The Hlms does NOT aim to emulate the FFP. The RTSS component went down that road. No thanks. Do we want to look next gen by emulating 90’s techniques? Furthermore, the FFP is a monster of a state machine.
      The HLMS approaches this from another angle: it aims to solve the user’s problems (defining cool material properties, making things look good with little effort, while not having to deal with shadow mapping, hw skeleton animation, instancing, telling the program whether vertex texture fetch will be used, etc.)

      For example, the PBS generator won’t be emulating fog. Fog will be provided as a postprocessing effect by reading the depth buffer. It is much more powerful (not just distance-based fog), looks much better, and is faster (each pixel is only evaluated once).
      Also something I didn’t mention is the ability to embed user-defined snippets into the vertex & pixel shaders, which is a much more customized approach than offering FFP settings.

      The FFP offers “cubemapping” uvw for faking metalized materials, but that’s quite a joke in PBS.
      In PBS we get real cubemapping (with more advanced formulas to get correct projection) that interacts with the BRDF and the material parameters, and the “metalized” look can be easily achieved.

      Of course, no one prevents someone from writing an Hlms generator that does indeed emulate the FFP.

      1) I expect it to be quite usable by the end of this month. In its current state it has bugs, but just yesterday it started to compile & work on Android too with GLES2 (yay!).
      On Windows, it’s compiling again (save for the D3D11 rendersystem; D3D9 isn’t compiling either, but it will be removed anyway).
      Note that the current generators are the mobile ones (which work on desktop GL & mobile). I have big plans for the Desktop version.
      Also note that _render is not yet optimized and I have to write a mesh hash for sorting by mesh (currently, using an uninitialized value…)

      2) Reporting bugs is cool; also telling me about your experience when trying to integrate it. I actually need to write a manual on how it works and how to extend it.
      Until it’s a bit more mature, I can’t really ask for much help.

  • Christopher Sosa

    ALL ABOARD THE HYPE TRAIN TO OGRE 2.0

    Keep up the good work! I can’t wait for Ogre 2.0 Final!

  • John

    It’s great to hear the work is progressing and the features are very useful and much appreciated. Unfortunately the fact that you are working in isolation is a problem for me.

    I am an experienced Ogre developer and stakeholder, with several commercial products leveraging Ogre, but my attempts to establish communication with the core members have failed. My bug reports and patch submissions go unused, and recommendations fall on deaf ears. I still find many critical bugs and much needed feature additions. Lately I have been making local modifications, and I find it difficult to justify the time to submit patches.

    I understand Ogre is going through a lot of changes, but the community should be involved. I feel I am part of the community but not recognized. Ogre is not evolving the way I need which is forcing me to investigate alternatives.

    • Matias Post author

      Hi John.

      I’m sad to hear that. Do you have links to the submitted bug reports, patch submissions, and recommendations so I can take a look at them?

      Thanks.
