The Sorted Vector pattern Part I

This is Part I, you can find Part II is here

After talking with a few fellow Ogre team members, other colleagues, and even with community, I’ve realized that many are not familiar with the sorted vector pattern as a replacement for an std::map.

Normally, C++’s std map (a Dictionary in Python terms) would be used to store data organized by “keys”; and maps are supposed to have fast amortized lookups for large numbers, normally O( log N ) in the following way:

std::map<int, std::string> fooMap;
std::map<std::string, std::string> barMap;
fooMap[7] = "This is number 7";
barMap["name"] = "John";
auto fooIt = fooMap.find( 7 );
auto barIt = barMap.find( "name" );
if( fooIt != fooMap.end() )
   printf( fooIt->second ); //Prints "This is number 7"
if( barIt != barMap.end() )
   printf( barIt->second ); //Prints "John"

(note: I personally discourage the use of auto and prefer typedefs, but for the sake of the example, this is cleaner)

Now, let’s take a look at map’s weaknesses and strengths:

std::map

O( log N ) lookups
O( log N ) insertion
O( log N ) removal
Iteration trashes the cache badly (memory is not contiguous)
If the item to store is small, the overhead is huge.

The problem is, game engines rarely care about insertions and removals since typically everything is loaded/unloaded in bulk at certain points (i.e. loading screen, in a background thread while walking a corridor, on exit, etc); but they care about lookups and iteration (every frame!)

If each element to store is ~8 bytes (i.e. 4 bytes for key, 4 bytes for element); then each element will have a 150% overhead! 4 bytes for the parent pointer, 2×4 bytes one for each leaf. If this a 64-bit build, the overhead ratio per element grows to a whooping 300%!

Game engines, on the other hand have the following needs:

Sorted Vector

O( log N ) lookups
Fast iteration
Contiguous memory
Low overhead

Of course, we could roll our own STL container; but luckily we can already achieve this with a simple vector and a few stl algorithm calls. All we’re gonna need is:

std::vector
std::sort
std::lower_bound
A comparision function in advanced cases

The idea is simple: lower_bound performs a binary search, which means O( log N ). But lower_bound requires the vector’s elements to be already sorted. std::sort provides that.

Since it is a vector, iteration is not a problem. Unlike map, there’s very little being done behind your back. Removing an element while maintaining order of elements is O(N).

As for insertions, you may have to choose whether you push all elements and then call std::sort, or insert every element in order.

Implementing a sorted vector (simple)

The following snippet explains how to implement a sorted vector with O( log N ) lookups:

vector<int> myInts;
myInts.push_back( 0 );
myInts.push_back( 7 );
myInts.push_back( 3 );
myInts.push_back( 8 );
std::sort( myInts.begin(), myInts.end() ); //myInt = { 0, 3, 7, 8 }
vector<int>::const_iterator it = std::lower_bound( myInts.begin(), myInts.end(), 7 );
if( it != myInts.end() && it == 7 )
{
    printf( "%i", *it ); //Prints 7
}
it = std::lower_bound( myInts.begin(), myInts.end(), 5 );
if( it != myInts.end() && it == 5 )
{
    printf( "%i", *it ); //Will never enter here (5 is not in the array).
}

Note something very important: We don’t just check that “it != myInts.end()”; we also check that the returned iterator is the one we’re looking for. Normally, map.find returns map.end if the element wasn’t found. But lower_bound will return us the first element matching the criteria ‘x < itor’.

This may look troublesome, but it has a very convenient property: It tells us where to insert a new element!

Following the previous example, we now will try to enter the element ‘5’ while maintaining order:

vector<int> myInts; //myInt = { 0, 3, 7, 8 }
it = std::lower_bound( myInts.begin(), myInts.end(), 5 );
if( it != myInts.end() && it != 5 )
{
    myInts.insert( it, 5 ); //myInt = { 0, 3, 5, 7, 8 }
}

Remember that this is a vector, so insertions can take anywhere from O(1) to O(N); plus the lookup O( log N ) we just performed with lower_bound.

However, it’s still convenient if you need to add a few more elements instead of calling sort again.

Don’t use it for everything

It’s often tempting to evangelize and start using this pattern everywhere instead of a map. Be very responsible in analyzing your needs: Vectors have the overheads of resizing, having to call sort(), O(N) insertions and removals, and asserting in debug builds that the input vector is indeed sorted. Also a map is easier to use i.e. myMap[“key”] = 1; is clearer than std::lower_bound( myVec.begin(), myVec.end(), “key” ) then checking that the key was found.

On the other side, if you know beforehand the amount of elements that will be inserted, a sorted vector allows you to keep the memory budget tightly controlled and minimal overhead.

Summarizing

The sorted vector pattern is very useful for fast lookups and fast iteration when compared to maps (and insertions/removals are rare or don’t matter), which is often what’s needed in a game engine; and you will see this often in Ogre 2.0 engine.
In Part II we’re going to see more complex examples.

9 thoughts on “The Sorted Vector pattern Part I”

zack_ March 17, 2015 at 15:44

What about using an std::unordered_map ? Has constant look up.

Matias Post author March 17, 2015 at 17:33

unordered_map is an entire different beast. It relies on hashing (and thus risk for collisions; and also spend CPU cycles on calculating a hash we don’t need); and doesn’t have the lookup properties that the sorted vector nor map have. Due to its internals, unordered_map uses more memory for bookkeeping, which again, we don’t need.
Sorted Vector is as good as a Map for lookups but much better for iteration; but we trade off creation and removals, specially for large datasets (which as I state in the article, it isn’t common to create or remove frequently in games)

zack_ March 17, 2015 at 19:08

I don’t see how your sorted vector compare to a std::map, its more a comparison of std:.set and a sorted vector. You miss completely the “values” of the map.

And this article is also missing something really important, numbers or a diagram. A benchmark to prove it.

So i guess, it also depends on the number of elements you have in the container.

Matias Post author March 17, 2015 at 19:37

Part II (https://www.yosoygames.com.ar/wp/2014/11/the-sorted-vector-pattern-part-ii/) explores how to use this method as a map.
Indeed the way it is presented in Part I is analogous to a set.

As for the numbers: indeed the number of elements is important as the biggest tradeoff is in insertion and removals; not really in iteration or lookup. The lookup algorithm used in the vector is the same as the one used in the map or set. The difference lies in the data layout. A contiguous array is just much more cache friendly.

We’re talking about 0-50 of elements. Ocassionally 1000-10.000; but usually in the context of game development such container would be generated once during load time and not at run time, per frame (hence eliminating the higher creation/removal cost).

I would certainly not recommend a sorted vector if your element count is in the order of 10.000.000-100.000.000 elements.

I won’t be providing the “proof” because it is extremely easy to compare a sorted vector against a standard map. If you want numbers, proof, etc I suggest start with:
http://www.pvk.ca/Blog/2012/07/03/binary-search-star-eliminates-star-branch-mispredictions/
http://www.pvk.ca/Blog/2012/07/30/binary-search-is-a-pathological-case-for-caches/

zack_ March 18, 2015 at 15:10

I did a quick benchmark and in my code the std::unordered_map beats the sorted vector by magnitude
for find operation.

http://rextester.com/KSE3950

But maybe there is also something wrong in my code, I will have to take a look closer, to confirm the result.

Matias Post author March 18, 2015 at 17:35

Some things that may seem extremely obvious to me, are apparently not obvious to other people.

As to your benchmark:

1. You generate an array of random strings with 10.000 elements and an average of 18 characters each. With keys this size; it’s no wonder a hash of usually 4 or 8 bytes (between 40.000 and 80.000 bytes total) beats a raw data comparison of 180.000 bytes. The amount of L1 data cache misses skyrockets. Specially after a 100000 iterations.
Your unordered_map is expected to have a higher build time to calculate the hashes, which you’re not measuring (though it is perfectly valid if, for example, the unordered_map is built once at loading time).
As I said in part II; Strings should be avoided for keys, and use hashed strings instead. Furthermore std::string in particular sucks at performance.
If you really need to use strings, knowing your data is far more important than the function used.

2. Your OrderString function hashes every time it is called. You *shouldn’t* be doing that. Defining it as simply as “return (_left) < (_right);" halves the time to complete. 3. You're calling chrono::high_resolution_clock::now on every iteration. This adds a lot of measuring noise. 4. std::uniform_int_distribution<> dist(0, sizeof(alphanumeric) – 1); should be std::uniform_int_distribution<> dist(0, sizeof(alphanumeric) – 2); to avoid the null terminator.

5. To benchmark properly, you shouldn’t be looking for the same key every time. Since one is hashed and the other is not, the number of “hops” until the destination is reached can vary greatly. To negate this effect you would have to search for many different strings.

6. When you generate the vector, you allow for duplicated entries to happen (thus converting the sorted vector pattern into replacement for multi_map) whereas the stringToMap eliminates the duplicated entires, shrinking as a result.

7. unordered map uses more memory. This is another another requirement in the specs: Low memory footprint and fast iteration (both are related). What’s worse, unordered map leaves the overhead to the stl implementation; which means it tries to escape our control. Something that in game development can wreak havoc.
- zack_ March 19, 2015 at 18:12
  
  Sorry, that I didn’t see the “extremely obvious” facts…
  But that brought me here in the first place and reading this article.
  
  I fixed some issues you mentioned in the benchmark:
  https://ideone.com/Y7jIly
  
  Now I get your result. The sorted vector pattern is not bad. Now I have it also in my tool box, thanks to you.
  
  Do you have a class for this in Ogre or do you use it like described here?
  
  PS: In your part 2 you have some compile errors in the class: OrderFooByName
  Replace the “bool” with const after the operator declaration and you forgot to declare the right to left comparison.
Redchards April 7, 2015 at 06:32

@zack_, I simply wanted to say thet to get true result from your last benchmark, you’ll need to remove the chrono::high_resolution_clok::now() you forgot in the undordered_map. Without it, at least on my machine, both perform the same (with undordered_map using a bit more memory of course)

@Matias, great article, thanks !
- Redchards April 7, 2015 at 06:34
  
  @zack_, sorry, don’t know how to edit message. By “the chrono you forgot in the unordered_map”, I meant “the chrono you forgot in the unordered_map benchmarking loop” 🙂

Comments are closed.

Yosoygames

Think out of the box. A site about software, video games, graphics, music and media in general