Vulkan: Why do some GPUs expose a dozen identical memory types?


If you browse the capabilities of the GeForce GTX 1080, you’ll notice it exposes 8 memory types on heap 1.

If you browse an even older GeForce GTX 680, you’ll notice it exposes 2 memory types on heap 0 and 8 memory types on heap 1, often with propertyFlags = 0, which is valid and means “none of the above” (i.e. it’s not device local, but it’s also not visible to the CPU).

If you browse the Adreno 430 report, you’ll notice that memory type pairs 0 + 3 and 1 + 4 are identical.

So… what gives?

To understand why there are identical memory types, we need to understand two things:

1. Order Matters

The Vulkan spec says that if two memory types have equal propertyFlags, drivers must list them in order of performance, fastest first.

That is, memProperties.memoryTypes[0] is faster than memProperties.memoryTypes[1] (as long as they have the same propertyFlags).

2. Resource compatibility with memory type

You must check that the resource you’re creating is compatible with the heap and type. It’s very easy to miss this step! With AMD GPUs (and most Intel, and most recent NVIDIA ones) you just pick a memory type with the flags you’re looking for and allocate resources on it. After all, memory is just space to be consumed, right?

But that’s not the case for all GPUs. To be strictly compliant, you must create a (dummy or final) buffer with the usage flags you want to use and check the memory requirements:

uint32 VulkanVaoManager::determineSupportedMemoryTypes( VkBufferUsageFlags usageFlags ) const
{
    // Note: usageFlags is unused in this excerpt; the usage mask below is
    // hard-coded to every buffer usage we need.
    VkBuffer tmpBuffer;
    VkBufferCreateInfo bufferCi;
    makeVkStruct( bufferCi, VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO );
    bufferCi.size = 64;  // Dummy size; we only care about the memory requirements
    bufferCi.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT |
                     VK_BUFFER_USAGE_UNIFORM_TEXEL_BUFFER_BIT | VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT |
                     VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_INDEX_BUFFER_BIT |
                     VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT;
    VkResult result = vkCreateBuffer( mDevice->mDevice, &bufferCi, nullptr, &tmpBuffer );
    checkVkResult( result, "vkCreateBuffer" );

    // Ask the driver which memory types can back a buffer with these usages,
    // then throw the temporary buffer away
    VkMemoryRequirements memRequirements;
    vkGetBufferMemoryRequirements( mDevice->mDevice, tmpBuffer, &memRequirements );
    vkDestroyBuffer( mDevice->mDevice, tmpBuffer, nullptr );

    // Bit i is set if memoryTypes[i] supports this buffer
    return memRequirements.memoryTypeBits;
}

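In this case we’re lazy: we create a 64-byte VkBuffer with all the VK_BUFFER_USAGE_* flags we’re gonna need, call vkGetBufferMemoryRequirements, and look at memoryTypeBits, a bitmask in which bit i is set if memory type i can back the buffer. From that we see which memory type matches what we need.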

Though notice that:

  1. The result might not include any memory type with the properties we want (i.e. no suitable memory type supports all those usage flags at the same time). That has never happened to us with those flags, so we’re safe.
  2. If you want optimum performance, you’d create a VkBuffer that is as restrictive as possible, i.e. only set the VERTEX_BUFFER bit if it’s only going to be used as a vertex buffer; but that kinda defeats the point of bindless-everything, doesn’t it?

Note the same happens with textures. You must call vkGetImageMemoryRequirements to see which memory types can hold those textures. Once you know which ones can, you should select the one with the flags you want the most and, among equals, the lowest index.
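The selection step can be sketched roughly like this. It’s a minimal illustration, not Ogre’s actual code: MemoryTypeInfo, the k* constants and pickMemoryType are hypothetical stand-ins (mirroring VkMemoryType and the VK_MEMORY_PROPERTY_* bits) so the snippet compiles without the Vulkan SDK. In real code you’d feed it VkPhysicalDeviceMemoryProperties::memoryTypes and the memoryTypeBits returned by vkGetBufferMemoryRequirements or vkGetImageMemoryRequirements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VkMemoryType, so the sketch builds without Vulkan headers
struct MemoryTypeInfo
{
    uint32_t propertyFlags;
    uint32_t heapIndex;
};

// Same numeric values as the corresponding VK_MEMORY_PROPERTY_* bits
constexpr uint32_t kDeviceLocal = 0x1;   // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible = 0x2;   // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

constexpr uint32_t kInvalidMemoryType = UINT32_MAX;

// Returns the lowest-indexed memory type that the resource supports
// (its bit is set in memoryTypeBits) and that has every bit of requiredFlags.
// Scanning from index 0 means that, among types with identical propertyFlags,
// the one the driver listed first wins.
uint32_t pickMemoryType( const std::vector<MemoryTypeInfo> &memoryTypes,
                         uint32_t memoryTypeBits, uint32_t requiredFlags )
{
    assert( memoryTypes.size() <= 32u );  // Vulkan exposes at most 32 memory types
    for( uint32_t i = 0u; i < memoryTypes.size(); ++i )
    {
        const bool supported = ( memoryTypeBits & ( 1u << i ) ) != 0u;
        const bool hasFlags =
            ( memoryTypes[i].propertyFlags & requiredFlags ) == requiredFlags;
        if( supported && hasFlags )
            return i;
    }
    return kInvalidMemoryType;  // No compatible type with those flags
}
```

With two “identical” device-local types at indices 0 and 1, a resource whose memoryTypeBits is 0x2 ends up on type 1 even though type 0 looks the same on paper, which is exactly why the duplicates exist.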

So that’s it. That’s the reason you’ll find “duplicate” memory types: you have to check the memory types for each resource before deciding where to place it. This is quite counter-intuitive, since it’s easier to design your engine as “first I select the memory types, then I create the resources” instead of “first I create the resources, then I select the memory types”.

In the case of Ogre3D we made a compromise: we decide the buffer memory first (creating a temporary dummy 64-byte buffer and crossing our fingers that the driver won’t reject a bigger buffer with the same flags), while for textures we build a list of candidate pool types at init time and then, when we create the texture, select among those candidates.
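One way to picture the texture side is as a two-phase bitmask scheme. This is only a sketch of the idea, not the actual Ogre3D implementation: buildCandidateMask, selectCandidate and kNoMemoryType are hypothetical names, and the per-type flags are passed as plain integers so the snippet doesn’t need the Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kNoMemoryType = UINT32_MAX;

// Init time: build a bitmask of the memory types whose propertyFlags contain
// every bit of wantedFlags. Bit i corresponds to memory type i.
uint32_t buildCandidateMask( const std::vector<uint32_t> &propertyFlagsPerType,
                             uint32_t wantedFlags )
{
    assert( propertyFlagsPerType.size() <= 32u );  // At most 32 memory types
    uint32_t mask = 0u;
    for( uint32_t i = 0u; i < propertyFlagsPerType.size(); ++i )
    {
        if( ( propertyFlagsPerType[i] & wantedFlags ) == wantedFlags )
            mask |= 1u << i;
    }
    return mask;
}

// Texture creation time: intersect the candidates with the memoryTypeBits
// reported for this particular image and take the lowest set bit.
uint32_t selectCandidate( uint32_t candidateMask, uint32_t imageMemoryTypeBits )
{
    const uint32_t usable = candidateMask & imageMemoryTypeBits;
    if( usable == 0u )
        return kNoMemoryType;  // None of our candidates can hold this image
    uint32_t index = 0u;
    while( ( usable & ( 1u << index ) ) == 0u )
        ++index;
    return index;
}
```

The candidate mask only depends on the device, so it’s computed once; the per-texture work is a single AND plus a bit scan.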

Btw regarding textures, please beware that when bufferImageGranularity != 1u, you must reserve bufferImageGranularity bytes of alignment between linear resources (e.g. buffers, linear textures) and regular tiled/optimal textures; otherwise these resources are considered to alias.
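As a rough sketch of the arithmetic involved when you do share one allocation: alignUp and nextOffset below are hypothetical helpers, and they assume both the resource alignment (VkMemoryRequirements::alignment) and bufferImageGranularity are powers of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Rounds offset up to the next multiple of alignment.
// Assumes alignment is a non-zero power of two.
uint64_t alignUp( uint64_t offset, uint64_t alignment )
{
    assert( alignment != 0u && ( alignment & ( alignment - 1u ) ) == 0u );
    return ( offset + alignment - 1u ) & ~( alignment - 1u );
}

// Computes where the next resource may start inside a shared allocation.
// prevEnd is one past the last byte of the previous resource.
// linearityChanges is true when one resource is linear (buffer, linear
// texture) and the other is tiled/optimal; in that case the start is also
// aligned to bufferImageGranularity so both never share a granularity "page".
uint64_t nextOffset( uint64_t prevEnd, uint64_t resourceAlignment,
                     uint64_t bufferImageGranularity, bool linearityChanges )
{
    const uint64_t alignment =
        linearityChanges ? std::max( resourceAlignment, bufferImageGranularity )
                         : resourceAlignment;
    return alignUp( prevEnd, alignment );
}
```

For example, with a buffer ending at byte 300, a tiled texture that needs 256-byte alignment and a granularity of 1024, the texture has to start at 1024 rather than 512. The same check applies again when the next linear resource follows the texture.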

The easiest solution when bufferImageGranularity != 1u is to put all textures together in their own memory pool, rather than sharing space with buffers within the same vkAllocateMemory allocation. Mixing linear and tiled resources when bufferImageGranularity != 1u is extremely hard, so it’s best to avoid it entirely.