So, the HLMS is not finished yet, but it will be the main method for using materials in Ogre 2.0 Final; and writing about how it works helps me stay on track while documenting it for others.
HLMS stands for “High Level Material System”, because for the user, the HLMS just means defining the material and looking at it (no need for coding or shader knowledge!). But in retrospect, tweaking the shader code of an HLMS is much lower level than the old Materials ever were (and that makes them very powerful).
The fantastic trio
The Hlms boils down to 3 parts:
- Scripts. To set the material properties (e.g. which type of Hlms to use: PBS, Toon shading, GUI; which textures, diffuse colour, roughness, etc.). You can also do this from C++, obviously. Everybody will be using this part.
- Shader template. The Hlms takes a couple of hand-written glsl/hlsl files as templates and then adapts them to fit the needs on the fly (i.e. if the mesh doesn’t contain a skeleton, the bit of code pertaining to skeletal animation is stripped from the vertex shader). The Hlms provides a simple preprocessor to deal with this entirely from within the template, but you’re not forced to use it. Here’s a simple example of the preprocessor. I won’t be explaining the main keywords today. Advanced users will probably want to modify these files (or write some of their own) to fit their custom needs.
- C++ classes implementation. The C++ side takes care of picking the shader templates and manipulating them before compiling; and most importantly, it feeds the shaders with uniform/constant data and sets the textures in use. It is extremely flexible, powerful, efficient and scalable, but it’s harder to use than good ol’ Materials, because those used to be data-driven: there is no AutoParamDataSource here. Want the view matrix? You’d better grab it from the camera when the scene pass is about to start, and then pass it yourself to the shader. This is very powerful, because in D3D11/GL3+ you can just set the uniform buffer with the view matrix once for the entire frame, and thus have multiple uniform buffers sorted by update frequency. Very advanced users will be messing with this part.
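To give a flavour of the template preprocessor mentioned above, here’s a rough sketch of what a conditional snippet can look like. The @property/@end conditional style is real, but the property name and the code inside are made up for illustration, not a verbatim excerpt from the actual templates:

```
@property( hlms_skeleton )
    // Only emitted when the mesh actually has a skeleton:
    worldPos = blendBonePositions( inputPos, blendWeights, blendIndices );
@end

@property( !hlms_skeleton )
    // Plain path for meshes without skeletal animation:
    worldPos = worldMatrix * inputPos;
@end
```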
Based on your skillset and needs, you can pick which parts you want to mess with. Most users will just use the scripts to define materials, advanced users will change the template, and very advanced users who need something entirely different will change all three.
For example, the PBS (Physically Based Shading) type has its own C++ implementation and its own set of shader templates. The Toon Shading type has its own C++ implementation and set of shaders. There is also a GUI implementation, specifically meant to deal with GUI (it ignores normals & lighting, manages multiple UVs, can mix multiple textures with photoshop-like blend modes, can animate the UVs, etc.).
It is theoretically possible to implement both Toon & PBS in the same C++ module, but that would be crazy, hard to maintain and not very modular. You get the idea.
Blocks, blocks, blocks
We’re introducing the concept of blocks; most of them are immutable. So far there are three:
- Datablock: In other words, a “material” from the user’s perspective. It holds data (i.e. material properties) that will be passed directly to the shaders, and also holds which Macroblock & Blendblock are assigned to it. This is the only block that is not immutable so far.
- Macroblocks: Named like that because they rarely change. Except for transparents, we sort by macroblock first. These contain information like depth check & depth write, culling mode, and polygon mode (point, wireframe, solid). They’re quite analogous to D3D11_RASTERIZER_DESC. And not without reason: under the hood Macroblocks hold an ID3D11RasterizerState, and thanks to the render queue’s sorting, we change them as little as possible. In other words, we reduce API overhead. On GL backends, we just change the individual states on each block change. Macroblocks can be shared by many Datablocks.
- Blendblocks: Blendblocks are like Macroblocks, but they hold alpha blending operation information (blend factors: One, One_Minus_Src_Alpha; blending modes: add, subtract, min, max, etc.). They’re analogous to D3D11_BLEND_DESC. We also sort by blendblocks to reduce state changes.
Being immutable means you can’t change the Macro- & Blendblocks after they’ve been created. If you want to make a change, you have to create a new block and assign it instead. The previous one won’t be destroyed until explicitly asked.
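To make the sharing idea concrete, here’s a minimal, self-contained C++ sketch of the create-or-reuse pattern behind immutable blocks. The names (BlockManager, MacroParams, etc.) are hypothetical, not Ogre’s actual API; the point is that identical parameter sets resolve to the same immutable block, so many datablocks end up sharing one pointer:

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical parameter set for a macroblock-like state object.
struct MacroParams
{
    bool depthCheck = true;
    bool depthWrite = true;
    int  cullMode   = 1;   // e.g. 0 = none, 1 = clockwise, 2 = anticlockwise

    bool operator<( const MacroParams &o ) const
    {
        return std::tie( depthCheck, depthWrite, cullMode ) <
               std::tie( o.depthCheck, o.depthWrite, o.cullMode );
    }
};

struct Macroblock
{
    MacroParams params;
    uint16_t    id;        // compact id, handy for sort-key hashing later
    int         refCount;  // destroyed only when explicitly released
};

class BlockManager
{
    std::map<MacroParams, Macroblock> mBlocks;  // node-based: stable pointers
    uint16_t mNextId = 0;

public:
    // Returns an existing block when the parameters match; a new one is
    // created only for a parameter combination we haven't seen before.
    const Macroblock* getMacroblock( const MacroParams &p )
    {
        auto it = mBlocks.find( p );
        if( it == mBlocks.end() )
            it = mBlocks.emplace( p, Macroblock{ p, mNextId++, 0 } ).first;
        ++it->second.refCount;
        return &it->second;
    }
};
```

Because all datablocks requesting the same settings get the same pointer back, the backend can keep exactly one ID3D11RasterizerState per live block, and comparing two blocks is a pointer comparison.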
Technically, on OpenGL render systems (GL3+, GL ES2) you can const_cast the pointers and change the block’s parameters (mind you, the pointer is shared by other datablocks, so you will be changing them as well as a side effect), and it will probably work. But it will fail on the D3D11 render system.
Why Macroblocks & Blendblocks?
You might be thinking that the reason I came up with these two is to fit with D3D11’s grand scheme of things while staying compatible with OpenGL. But that’s only a half truth, and an awesome side effect. I’ve been developing the Hlms using OpenGL this whole time.
An OpenGL fanboy will tell you that grouping these states together in a single call like D3D11 did barely reduces API overhead in practice (as long as you keep sorting by state), and they’re right about that.
However, I still think Microsoft really nailed this one, for two reasons:
- Many materials in practice share the same Macro- & Blendblock parameters. In an age where we want many 3D primitives with the same shader but slightly different parameters like texture, colour, or roughness (which equals a different material), having these settings repeated per material wastes a lot of memory… and a lot of bandwidth (and wastes cache space). Ogre 2.0 is bandwidth bound, so having all materials share the same pointer to the same Macroblock can potentially save a lot of bandwidth, and be friendlier to the cache at the same time. This stays true whether we use D3D11, D3D12, OpenGL, GL ES 2, or Mantle.
- Sorting by Macroblock is a lot easier (and faster) than sorting by its individual parameters: when preparing the hash used for sorting, it’s much easier to just do (every frame, per object) hash |= (macroblock->getId() << bits) & mask than to do hash |= m->depth_check | m->depthWrite << 1 | m->depthBias << 2 | m->depth_slope_bias << 3 | m->cullMode << 18 | …; — which would also need far more bits than we can afford. Ogre 2.0 imposes a limit on the number of live Macroblocks you can have at the same time, as we would otherwise run out of hashing space (by the way, D3D11 has its own limit). It operates around the idea that most setting combinations won’t be used in practice.
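Here’s a self-contained C++ sketch of that idea. The bit layout, field widths, and names are illustrative (not Ogre’s actual code); the point is that the whole Macroblock collapses into a few bits of the per-object sort key, and the limited bit budget is exactly why the number of live blocks must be capped:

```cpp
#include <cstdint>

// Stand-ins for the real blocks: only the compact id matters for sorting.
struct Macroblock { uint16_t id; };
struct Blendblock { uint16_t id; };

// Build a 64-bit sort key. Objects sharing a macroblock land adjacently in
// the render queue, so the rasterizer state changes as rarely as possible.
// Hypothetical layout: 10 bits macroblock | 10 bits blendblock |
// 20 bits shader | 24 bits mesh.
inline uint64_t makeSortHash( const Macroblock &macro, const Blendblock &blend,
                              uint32_t shaderId, uint32_t meshId )
{
    uint64_t hash = 0;
    hash |= uint64_t( macro.id  & 0x3FF )    << 54;  // highest priority
    hash |= uint64_t( blend.id  & 0x3FF )    << 44;
    hash |= uint64_t( shaderId  & 0xFFFFF )  << 24;
    hash |= uint64_t( meshId    & 0xFFFFFF );        // lowest priority
    return hash;
}
```

With only 10 bits in this sketch, at most 1024 macroblocks can be alive at once; spending a full bit per individual state (depth check, depth write, bias, cull mode, …) would blow that budget immediately.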
Of course it’s not perfect; it can’t fit every use case. We inherit the same problems D3D11 has. If a particular rendering technique relies on regularly changing a property that lives in a Macroblock (e.g. alternating the depth comparison function between less & greater with every draw call, or gradually incrementing the depth bias on each draw call), you’ll end up redundantly changing a lot of other states (culling mode, polygon mode, depth check & write flags, depth bias) alongside it. This is rare. We’re aiming at the general use case.
These problems make me wonder whether D3D11 made the right choice in using blocks from an API perspective (I’m not versed in driver development). From an engine perspective, however, blocks make sense.
No more initializeCompositor
I’ve heard complaints that having to manually call Root::initializeCompositor is confusing, and that people don’t know when or where it should be called, or why the engine can’t do it by itself.
You’ll be happy to hear this has been removed. The engine is now taking care of it.
Materials are still alive
Let me be clear: you should be using the HLMS. The usual “Materials” are slow. Very slow. They’re inefficient and not suitable for rendering most of your models.
However, materials are still useful for:
- Quick iteration. If you need to write a shader, just define the material and start coding. Why would you deal with the template’s syntax or a C++ module when you can just write a script and start coding? The HLMS does come with a command line tool to see how your template translates into a final shader (which is very handy for iteration: it’s fast, and it checks for syntax errors!), but it’s most useful when you want to write your own C++ module or change the template, not when you just want to experiment. Besides, old timers are used to writing materials.
- Postprocessing effects. Materials are much better suited for these. Materials are data driven and easy to write. Postprocessing FXs don’t need an awful lot of permutations (i.e. they don’t have to deal with shadow mapping, instancing, skeleton animation, facial animation). And they’re at no performance disadvantage compared to the HLMS: each FX is a fullscreen pass that needs different shaders, different textures, and its own uniforms. Basically, API overhead we can’t optimize. But it doesn’t matter much either, because it’s not like there are 100 fullscreen passes. Usually there are fewer than 10.
Under the hood there is an HLMS C++ implementation (HLMS_LOW_LEVEL) that acts just as a proxy to the material. I know what you’re thinking now: yes, the HLMS is an integral part of Ogre 2.0, not just a fancy add-in.
Materials have been refactored, and thus your old code may need a few changes. Most notably, Macroblocks & Blendblocks have been added to Materials, so functions like Pass::setDepthCheck & Co. have been replaced by two calls: Pass::setMacroblock & Pass::setBlendblock.
Fixed Function support has been removed, and with it the multitexturing and pass-splitting functionality. The HLMS default systems handle these now.
Hlms Texture Manager and Texture Packs
The HLMS implementations grab their texture data from a new texture manager (which uses the usual TextureManager behind it).
This HlmsTextureManager has two purposes:
- Provide a dummy texture when a texture hasn’t been found (meeeh…)
- Managing Texture Packs and create UV atlas on the fly automatically (Yess!!!!)
What is a Texture Pack? Suppose you have a collection of textures (of the same resolution and similar pixel format) that you know will be used together. The HLMS Texture Manager gets a list of these textures from you and tries to pack them together.
On D3D11/GL3+ it will create texture arrays. On GL ES 2, it will create a texture atlas. If no list is provided, the HLMS will automatically start packing based on default settings (i.e. maximum number of textures per pack, default pixel format, etc.) by order of request. This has the pitfall that textures that will rarely be used together may end up in the same pack.
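For the atlas fallback, the essential bookkeeping is just a UV remap per packed texture. Here’s a hedged, self-contained C++ sketch (names and grid layout are illustrative, not the actual HlmsTextureManager code): given a slot index in a square grid atlas, compute the offset/scale the shader needs to remap a mesh’s 0..1 UVs into its tile:

```cpp
// Per-texture remap data handed to the shader when a UV atlas is in use.
struct UvAtlasParams
{
    float uOffset, vOffset; // top-left corner of the tile inside the atlas
    float uScale,  vScale;  // tile size relative to the whole atlas
};

// Assumes a simple square grid of equally sized tiles, filled row by row.
inline UvAtlasParams getAtlasParams( int slot, int tilesPerRow )
{
    UvAtlasParams p;
    p.uScale  = 1.0f / tilesPerRow;
    p.vScale  = 1.0f / tilesPerRow;
    p.uOffset = ( slot % tilesPerRow ) * p.uScale;  // column
    p.vOffset = ( slot / tilesPerRow ) * p.vScale;  // row
    return p;
}

// In the shader, the final UV would then be: uv * scale + offset.
```

This also hints at why tileable detail maps default to no packing on GL ES 2: once UVs are remapped into a tile, wrapping past 1.0 bleeds into neighbouring textures instead of repeating.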
Textures are divided into categories: diffuse, normal mapping, specular mapping, detail maps, and the detail maps’ normal maps. Each category can have different defaults. For example, normal mapping defaults to BC5 compression when available. Detail maps default to no packing when a UV atlas is the only choice (i.e. GL ES 2), because detail maps are usually meant to be tileable.
Some formats can only be packed offline due to limitations in the format (i.e. PVRTC & ETC1). This is not yet implemented.
A work in progress
All of this is an exciting work in progress. The repo fork isn’t compiling on all platforms yet, and some of the functionality mentioned here may be crashing.
GL3PlusRenderSystem::_render is still incredibly slow: it redundantly binds the vertex buffer on every call. I don’t have performance numbers to give yet.
The HLMS is supposed to give us automatic instancing and even multi-draw indirect, but this isn’t implemented yet, as we’re focusing on a more compatible approach first (starting by getting GL ES 2 & OpenGL 3+ working).