It’s often believed that in order to achieve a linear depth buffer, one needs to counter the W division by multiplying the Z component of the position by W before returning from the vertex shader:
outPos.z = (outPos.z * outPos.w - nearPlane) / (farPlane - nearPlane)
At least that’s what I thought too for a very long time, ever since I read Robert Dunlop’s article on linearized depth. The author goes further by tweaking the projection matrix on the CPU side so that all you have to do in the vertex shader is:
outPos.z = outPos.z * outPos.w
Which is a nice trick. The thing is, the results are not really linear.
What is linear?
Before I continue, we first need to understand what we mean by “linear”. In 3D jargon, when people speak about linear depth buffers, they usually mean linear in the view space domain.
As Emil Persson explains in his blog post A couple of notes about Z, the usual “z / w” projection is linear too, but it’s linear in the screen space domain. The problem with z values being linear in screen space is that precision becomes awful far away from the viewer, and it gets worse the smaller the near plane value is; whereas depth values linear in view space have a much more uniform precision distribution.
So whenever we talk about “linear depth”, we’ll be referring to depth values being linear in view space.
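To put rough numbers on the precision claim above, here is a minimal Python sketch. It assumes a conventional D3D-style projection that maps view-space depth to [0, 1]; the function names and the near/far values are hypothetical and chosen only for illustration.

```python
def hyperbolic_depth(view_z, near, far):
    """Depth produced by the usual z / w projection (assumed D3D-style, range [0, 1])."""
    return far / (far - near) * (1.0 - near / view_z)


def linear_depth(view_z, near, far):
    """Depth that is linear in view space, range [0, 1]."""
    return (view_z - near) / (far - near)


def view_z_for_hyperbolic_depth(depth, near, far):
    """Inverse of hyperbolic_depth: view-space distance that stores the given depth."""
    return near / (1.0 - depth * (far - near) / far)


# Hypothetical planes, picked only to make the effect obvious.
near, far = 0.1, 1000.0

# View-space distance at which each scheme has used up half of the [0, 1] range.
half_hyperbolic = view_z_for_hyperbolic_depth(0.5, near, far)
half_linear = near + 0.5 * (far - near)

print("z / w reaches 0.5 at view z =", half_hyperbolic)
print("linear reaches 0.5 at view z =", half_linear)
```

With these values, z / w spends half of its range on roughly the first 0.2 units in front of the camera, while the view-space-linear mapping reaches 0.5 halfway between the planes.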
Some math behind the trick
The idea behind the trick in question is that the GPU will later divide by W. So in theory Z / W * W = Z.
But we’re forgetting something: values are interpolated during rasterization. Normally, this operation happens in the GPU (simplified, as 3 vertices are actually being interpolated, not two):
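Written out for two vertices, with Z1, W1 and Z2, W2 being the clip-space values output by the vertex shader and t the interpolation weight, the operation can be sketched as follows (a reconstruction from the surrounding text, so the exact notation is an assumption):

```latex
\[
\mathrm{depth}(t) = \frac{Z_1\,(1 - t) + Z_2\,t}{W_1\,(1 - t) + W_2\,t}
\]
```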
In order to achieve a truly linear depth value, the multiplication and cancellation should be as follows:
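Using the same two-vertex simplification, with Z1 and Z2 now standing for the view-space-linear depth at each vertex and t for the interpolation weight, the desired cancellation can be sketched as (again a reconstruction, with the notation being an assumption):

```latex
\[
\frac{\bigl(Z_1\,(1 - t) + Z_2\,t\bigr)\,\bigl(W_1\,(1 - t) + W_2\,t\bigr)}{W_1\,(1 - t) + W_2\,t}
= Z_1\,(1 - t) + Z_2\,t
\]
```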
Which is impossible to achieve from the vertex shader because we don’t have knowledge of the other 2 vertices, and we are not in control of the t variable (which in this equation stands for the interpolation weight) unless someone finds out how to cancel it in that equation from the VS.
Instead, with the proposed “trick”, this is what we end up doing:
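In the same two-vertex notation (Z1, Z2 being the linear depth at each vertex, W1, W2 the clip-space W, and t the interpolation weight), multiplying by W per vertex gives the following, reconstructed here under those assumptions:

```latex
\[
\mathrm{depth}(t) = \frac{Z_1 W_1\,(1 - t) + Z_2 W_2\,t}{W_1\,(1 - t) + W_2\,t}
\]
```

Each W is multiplied in before the interpolation, so the interpolated W in the denominator has nothing to cancel against.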
Now it’s clear there’s no real cancellation unless W1 = W2, t = 0 or t = 1; the value is not truly linear in view space. However, it does differ from the regular version, which can explain why precision starts to improve as t approaches 0 or 1, that is, as we get close to the triangle’s edges.
The only way to write true linearized depth is to use the depth out semantic in the pixel shader, which turns off Z compression, Early Z, Hi-Z and other optimization algorithms employed by the GPU. And by the way, if you’re going to have all those disadvantages, you’re better off with a logarithmic z-buffer, which has insanely high precision.
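The cancellation conditions can be checked numerically. The sketch below assumes the two-vertex simplification used in this section, with the pseudo-linear value modelled as lerp(Z1 * W1, Z2 * W2, t) / lerp(W1, W2, t); the function names and sample values are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b with weight t."""
    return a + (b - a) * t


def true_linear(z1, z2, t):
    """What a truly view-space-linear depth would interpolate to."""
    return lerp(z1, z2, t)


def pseudo_linear(z1, w1, z2, w2, t):
    """The trick: W is multiplied per vertex, then the GPU divides by the interpolated W."""
    return lerp(z1 * w1, z2 * w2, t) / lerp(w1, w2, t)


# Hypothetical vertices: linear depths 0.1 and 0.9, clip-space W of 1 and 9.
z1, w1, z2, w2 = 0.1, 1.0, 0.9, 9.0

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, true_linear(z1, z2, t), pseudo_linear(z1, w1, z2, w2, t))

# Same depths, but both vertices share the same W: the two versions agree.
print(pseudo_linear(z1, 5.0, z2, 5.0, 0.5), true_linear(z1, z2, 0.5))
```

Under this model the two columns only match at t = 0 and t = 1, and the last line shows them matching in between once W1 = W2.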
To further verify this mathematical analysis, I made a special shader that shows ddx in the R channel and ddy in the G channel (scaled to a visible range):
The regular version looked just as expected: A solid colour across all triangles aligned to the same plane. This means the derivative is constant and all rasterization assumptions are met (for compression, Hi Z, etc)
The true linear version also looks as expected: a gradient. (By the way, I wonder whether the 2nd order derivative is constant. If it is, linear depth buffers could theoretically be compressed and used in Hi Z with just a bit of additional overhead. I did not research that, though; besides, the linear depth method would have to be provided by the HW.)
The “Pseudo linear” version (aka the subject of this post) did NOT look as I expected: I was expecting the same look as the regular Z/W version, just with different colours (the constant is different). But instead, visible triangulation appeared. At first, I thought it was due to the model being exported using flat normals, which leads to duplicated vertices. But after trying again with the same model using smooth normals, the triangulation persisted.
My statement still stands, though: the stored depth is still non-linear and compression friendly. However, it is not as friendly as I hoped: the triangle pattern breaks a lot of blocks that would otherwise be compressed.
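The regular and true linear results can be imitated on the CPU with finite differences along a single scanline. This is only a sketch: it assumes a pinhole projection with focal length 1, a D3D-style [0, 1] depth range, and a hypothetical plane z = a + b * x in view space; none of these values come from the actual test scene.

```python
def view_z_on_plane(screen_x, a, b):
    """View-space depth of the plane z = a + b * x seen at screen_x (screen_x = x / z)."""
    return a / (1.0 - b * screen_x)


def hyperbolic_depth(view_z, near, far):
    """Regular z / w depth (assumed D3D-style, range [0, 1])."""
    return far / (far - near) * (1.0 - near / view_z)


def linear_depth(view_z, near, far):
    """Depth that is linear in view space."""
    return (view_z - near) / (far - near)


def finite_differences(values):
    """Difference between neighbouring samples, a stand-in for ddx."""
    return [nxt - cur for cur, nxt in zip(values, values[1:])]


near, far = 0.1, 100.0   # hypothetical clip planes
a, b = 2.0, 1.5          # hypothetical plane coefficients
screen_xs = [i / 10 for i in range(6)]  # evenly spaced "pixels" from 0.0 to 0.5

view_zs = [view_z_on_plane(x, a, b) for x in screen_xs]
ddx_zw = finite_differences([hyperbolic_depth(z, near, far) for z in view_zs])
ddx_linear = finite_differences([linear_depth(z, near, far) for z in view_zs])

print("ddx of z / w:  ", ddx_zw)
print("ddx of linear: ", ddx_linear)
```

In this setup the z / w differences come out equal across the scanline (the solid colour), while the linear-depth differences keep growing (the gradient).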
What causes this triangle pattern? I’m not sure, but my guess is that since the pseudo linear trick is just the equivalent of a Z-scale skewing/shearing that depends on the W value of each vertex, the placement of the 3rd vertex (the one not shared by two adjacent triangles) will skew each triangle by a different factor, throwing it out of its original plane.
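A small 1D experiment is at least consistent with this guess. It relies on the fact that the rasterizer interpolates the post-divide depth linearly in screen space: with the trick, each vertex stores its linear depth after the divide, so every primitive gets its own constant screen-space slope. The sketch below is an assumption-laden analogue, not a reproduction of the test above: it uses two collinear segments instead of coplanar triangles, a pinhole projection with focal length 1, and hypothetical plane and clip values.

```python
def hyperbolic_depth(view_z, near, far):
    """Regular z / w depth (assumed D3D-style, range [0, 1])."""
    return far / (far - near) * (1.0 - near / view_z)


def linear_depth(view_z, near, far):
    """Per-vertex value the pseudo-linear trick ends up storing after the W divide."""
    return (view_z - near) / (far - near)


def screen_space_slope(depth_a, depth_b, screen_a, screen_b):
    """Constant depth derivative the rasterizer uses across one primitive."""
    return (depth_b - depth_a) / (screen_b - screen_a)


near, far = 0.1, 100.0   # hypothetical clip planes
a, b = 2.0, 1.5          # hypothetical plane z = a + b * x in view space

# Three collinear vertices on that plane; segments (0, 1) and (1, 2) share vertex 1.
view_xs = [0.0, 1.0, 2.0]
view_zs = [a + b * x for x in view_xs]
screen_xs = [x / z for x, z in zip(view_xs, view_zs)]

zw = [hyperbolic_depth(z, near, far) for z in view_zs]
pseudo = [linear_depth(z, near, far) for z in view_zs]

slopes_zw = [screen_space_slope(zw[i], zw[i + 1], screen_xs[i], screen_xs[i + 1]) for i in range(2)]
slopes_pseudo = [screen_space_slope(pseudo[i], pseudo[i + 1], screen_xs[i], screen_xs[i + 1]) for i in range(2)]

print("z / w slopes per segment: ", slopes_zw)
print("pseudo slopes per segment:", slopes_pseudo)
```

Both segments share one slope with z / w, but get different slopes with the pseudo-linear values, even though they lie on the same plane. That would show up as a per-primitive change of colour in a derivative visualization.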
Best or worst of both worlds?
So, we have Z/W at one end, which is very optimization friendly but highly inaccurate, and true linear Z at the other (and Log(Z) at the very far extreme), which can be very accurate but doesn’t optimize at all.
And in the middle we have the pseudo-linear version, which offers midground accuracy and may take advantage of all optimizations in some scenarios, or none of them in others (it depends on triangle density, how many triangles lie on the same plane, and the covered area in pixels).
Clearly if common scenarios prevent most depth optimizations, devs will just prefer to use the pixel shader’s depth out semantic to get more accuracy.
But if common scenarios often still take advantage of depth optimizations, then the pseudo linear trick improves accuracy almost for free.
What is the common scenario? I’m afraid profiling and analyzing real world data seem to be the only answer.