C++ iterable enums 3


This is a rant. C++17 is at our doorstep and we’re no closer to iterable enums. And what annoys me is that I coded a solution in a single day as proof of concept that should be implemented at compiler level, yet I refuse to believe I’m 100 times smarter than everyone else to thought this up.

But since we’re still stuck, I am now speaking up.

 

Problems:

Enums in C++ (and also C) have the following challenges:

  • Retrieve the minimum value an enum can achieve.
  • Retrieve the maximum value an enum can achieve.
  • Retrieve the number of possible values an enum can have.
  • Find out whether an arbitrary integer maps to a valid enum value.
  • Iterate through all variables in an enum.
  • Convert an enum to string.
  • Convert a string to an enum.
  • IT MUST BE SIMPLE. This is something the C++ committee seems to have long forgotten.

 

Interface:

Don’t give me template garbage bs. It’s not necessary. If my enum looks like this:

enum ExampleContiguous
{
    Five = 5,
    Six,
    Seven,
};

Then retrieving the number of elements should be as simple as this:

size_t numElements = ExampleContiguous::count(); //numElements = 3
size_t minValue = ExampleContiguous::minValue(); //maxValue = 5
size_t maxValue = ExampleContiguous::maxValue(); //maxValue = 7

 

Retrieving: O(1) vs O(N)

Enums can basically be classified in two: Contiguous and non-contiguous. Contiguous are those who are in range [minValue; maxValue] with no gaps. Non-contiguous have gaps in them. This is an important distinction because contiguous enums can perform operations in constant time, while non-contiguous requires searching (i.e. binary search). Therefore the interface should reflect this:

/// Only available if the enum is contiguous. Otherwise it's a compiler error
/// It's O(1)
/// Returns empty string if value is not a valid enum value
static const std::string& get( ExampleContiguous value );

In this example ExampleContiguous::get will return the string version of value. So for instance ExampleContiguous::get( Six ); will return “Six” in O(1) time.

If the enum were to be non-contiguous, then ExampleContiguous::get would return a compiler error. So how do we retrieve the string name of ExampleContiguous? Easy: find():

/// Always available. O(1) if enum is contiguous, O(log N) otherwise
/// Returns empty string if value is not a valid enum value
static const std::string& find( ExampleContiguous value );

So calling ExampleContiguous::find( Six ); will return “Six”, it is guaranteed to compile. But it’s complexity depends on how the enum was declared. If it’s contiguous, find and get are equivalent. If it’s not contiguous, find will perform a binary search.

 

Iterating

This is extremely easy. ExampleContiguous::begin and ExampleContiguous::end:

static const EnumData* begin() { return &mSortedByEnum[0]; }
static const EnumData* end() { return &mSortedByEnum[3]; }

const EnumData *itor = ExampleContiguous::begin();
const EnumData *end  = ExampleContiguous::end();

while( itor != end )
{
    printf( "%s\n", ExampleContiguous::get( *itor ) );
    ++itor;
}

//Executing the code outputs:
Five
Six
Seven

Modifying the const iterators (i.e. through const_cast) is undefined behavior.

Validating ints

We want to know if an int maps to an enum or not. No problem:

/// Returns true if integer value can be represented by the enum.
/// O(1) if enum is contiguous, O(log N) otherwise
static bool isValidValue( int value );

 

It’s not possible! It’s not that easy.

Yes, it fucking is. Months ago I wrote a Python script during an afternoon that would parse C++ headers in the search of enums and generate actual valid C++ usable code with this interface. Unfortunately the python script cannot generate code that can deal with privately declared enums unless I use the friend keyword in the class. But doing this at compiler level should be possible.

The Python script will enumerate all *.h & *.hpp files in a given folder and parse them, generating one .h file per enum, and one cpp file called AmalgamationEnum.cpp. It will first check if the files to be parsed have changed via timestamp, then check if the generated headers have changed via md5 to prevent regenerating the source and include files every time, thus triggering unnecessary recompilation every time you change something unrelated to the enums.

Additionally, I based the Python code on CppHeaderParse which may not be capable of parsing all C++ syntax shenanigans. I should’ve used some Clang Python bindings, but I wasn’t willing to put much effort into it. It was a proof of concept than something stable to use in production. Why shout out now? I had forgotten about it, but now the impending release of C++17 remind me we’re no closer to it. It’s not my job to fix the language, it’s somebody else’s. But apparently I have to get shit done myself.

 

Proof of concept download

I’ve uploaded the code to https://bitbucket.org/dark_sylinc/enumiterator

See EnumIterator.h and EnumIterator.cpp in practice.

Due to current C++ rules, if your enum is called “MyFoo”, then the implementation class will be named “MyFooEnum” where you can retrieve all the information (iterators, converting to string); as I cannot name a class “MyFoo” directly.

 

Forgive me for the usage of foul language in this post. I’m simply fed up. This should’ve been addressed 20 years ago (and counting).


3 thoughts on “C++ iterable enums

  • Manuel Bergler

    Unfortunately, C++ enums aren’t as simple as you present them to be.

    First of, regarding `minValue`, `maxValue` and `count`:
    According to http://eel.is/c++draft/dcl.enum#8 the range of values for an enum is the range of the underlying type in case of an enum with fixed underlying type, and otherwise there is a formula to determine the valid range. In particular, for your `ExampleContiguous` enum above the range is [0, 7]. It is valid to assign `ExampleContiugous val = 0;`! Note that the reason for this is to allow C-style bitsets:

    enum options { BORDERS = 1, SCROLLBARS = 2, COLORED = 4 };
    options selected_options = BORDERS | COLORED // == 5;

    With that in mind the natural semantics for functions named `minValue` and `maxValue` is to return the limit of the range of valid values, not the values of the minimal/maximal enumerators respectively and I’d expect `count` to return the number of possible values instead of the number of defined enumerators. Apart from that, what are use-cases for the `minValue`, `maxValue` functions with the semantics you proposed?

    I agree that for the case of an enum without fixed underlying type a `isValidValue` function would be helpful, simply because the rules are not known to everyone – and even if they are, I don’t want to calculate the valid range by hand for all of my enums. The other alternative is simply to use only enums with fixed underlying type, then this function becomes unnecessary.

    Regarding getting the names of corresponding enumerators: That doesn’t work for regular enums in the way you described, since it is possible to have multiple enumerators map to the same value (see e.g. the example provided in the standard http://eel.is/c++draft/dcl.enum#2). If you have at least two enumerators `E1` and `E2` with the same value, integral promotion will automatically cause any comparison of the argument passed to such a function to return true for both `E1` and `E2` or for neither of the two. That implies that depending on the order of the comparisons `find(E1) == find(E2) == “E1″` or `find(E1) == find(E2) == “E2″`.

    But even if we ignore this issue, why can’t you do the lookup in O(1) in all cases? Create a hash map with perfect hashing at compile time which maps the values of the enumerators to the corresponding string and you shouldn’t need to care about whether or not the enum is contiguous. Then both of these functions can be replaced by a single `to_string` function.

    I can’t really comment on iteration over enums, because I frankly never needed that (or at least I can’t remember if I ever did). But depending on the use case you might want to iterate over all values, not only the ones that have a corresponding enumerator. So even that is not as simply as it seems.

    To summarize:
    – `count`, `maxValue`, `minValue` should be renamed and `isValidValue` are unnecessary for an enum with fixed underlying type
    -> don’t use enums without fixed underlying type
    – `get`/`find` Useful for debugging/logging output, but should be rename to `to_string` and perform the lookup in constant time
    – iteration: Probably need two types of iterators, one to only iterate over enumerators, second one to iterate over all possible values
    -> which one should `begin`/`end` return?

    One last remark: The standards committee is working on a proposal basis. If no one proposes a feature then it will never be part of the standard. If you really think iteration over enums is that important, then submit a proposal! Apart from that, the committee is currently working on static reflection, which would allow to implement the features you’d like to have as a library, with the added benefit of getting reflection for classes, namespaces, etc and not only for enums.

    • Matias Post author

      I was under the impression values outside what was explicitly defined by enums were not values.
      Anyway, it does not change too much, just renaming a few functions to fix ambiguity. This is far from “not being that simple”:

      • count, maxValue, minValue should be renamed indeed
      • isValidValue should be renamed to isExplicitlyDefined
      • Regarding get and find; there’s still a lot of value. If two values map to same integer (e.g Three = 3 and Drei = 3); then according to the standard these two enums are equivalent and from a machine perspective the information of whether it was Three or Drei is effective lost. What should get / find return? Most likely the first one that was defined. If the user needs to know what comes next, the iterators are sorted so he can go to the next iteration (e.g. like a multi_map).
      • I did not think of a perfect hashing. I’m not sure if it’s possible to always arrive at a perfect hashing, which is why there needs to be a distinct semantic that ‘get’ is guaranteed to be O(1) while ‘find’ can be at most O(log N), but could be less. Another important point is that if get() fails to compile then the enum is not contiguous (if two values map to the same integer then it’s also not contiguous), which is a very good way to spot accidental errors (e.g. an intern made a change that silently and sneakily breaks assumptions made at runtime). Perhaps static_assert( MyEnum::isContiguous() ) is a better proposition for that end though.

      As for minValue & maxValue usage cases:
      It’s often useful to keep a contiguous array of values; and for that minValue becomes very useful:
      enum MyEnum
      {
      Six = 6,
      Seven,
      Eight
      }

      int refCount[MyEnum::count];
      int add( MyEnum type )
      {
      ++refCount[type - MyEnum::minValue];
      }

      If trying to map to float range [0; 1]; then (myEnum – MyEnum::minValue) / (float)MyEnum::count; is what you need.

      In certain cases an integer may map to different enums:

      enum NegativeNumbers
      {
      NegTwo = -2,
      NegOne = -1,
      };
      enum ZeroNum
      {
      Zero = 0
      };
      enum NaturalNumbers
      {
      One = 1,
      Two = 2,
      };

      int myVal = someValue();
      if( myVal <= NegativeNumbers::maxValue() ) { NegativeNumbers n = (NegativeNumbers)myVal; } else if( myVal == 0 ) { ZeroNum n = (ZeroNum)myVal; } else if( myVal <= NaturalNumbers::maxValue() ) { NaturalNumbers n = (NaturalNumbers)myVal; }

      Obviously these functions lose usefulness for enums that represent other things e.g. bitmasks, but it doesn't mean they should be disregarded.

      I know the standard is working on a "reflection for all things" solution; but not only is it taking forever, it sounds like trying to kill a spider with a shotgun. An airplane is the best transportation device in many senses, that doesn't mean I don't need a car or a bike. I wouldn't use an airplane to buy groceries.
      I can only begin to twist inside by the mere thought of how much the compilation times will be affected by trying to solve these problems with a static reflection system that will be designed to handle from enums to... classes (with public, private and protected attributes, polymorphism, virtual tables), templates (that's a pandora's box) and namespaces.

      As for submitting my own proposal: I work on an open source engine, I regularly report and contribute to other FOSS projects, including Mesa, LLVM and the Linux kernel. I report proprietary driver bugs with sample bugs all the time. On top of that I have client work to do to pay the bills. Now I have to submit proposals to the C++ standard too? That's where I draw the line I guess. There is only so much I can do.
      This is somebody else's job. I know there's a lot of people wanting this for a long time (there's even a Boost-based solutions, and a project "better_enums" riddled with tons of macros FFS). Just googling "iterate through enum" "convert enum to string" yields a ton of results.
      I'm outraged that resources are being devoted to things like integrating a Cairo2D-clone into the standard (I could write an entire blogpost of why that is a horrible idea) where basic things such as improved enum functionality remains unaddressed for +20 years and counting.

      • Manuel Bergler

        Even if you rename the methods, one major problem remains: The functions are in the same scope as the enumerators. Therefore, creating a static `count` function automatically breaks all existing enums that have an enumerator named `count`.

        Regarding perfect hashing: All of the enumerators are known at compile-time, so there is no problem with finding a perfect hash function, even though I wouldn’t expect compilers to guarantee a minimal perfect hash function since that might unnecessarily increase compile times by an unacceptable amount. Nevertheless, the lookup will be O(1), but possibly with a slightly less then optimal memory overhead for the hash table.

        Regarding iteration: It is true that google reveals a lot of search results for C++ iteration of enums, but the few I looked into were StackOverflow questions where the user didn’t provide a rationale *why* it is needed but only asked for a way to get it to work. Even the “Better Enums” library doesn’t have motivating examples in their tutorial. A quick search on the ISO C++ standard proposal reflectors revealed one post from 2013 where someone proposed iteration over enums, but again without a motivating use-case. So unless someone provides a convincing example where iteration over enumerators is truly useful I don’t see a reason to add it to the language as an individual feature.

        There are three distinct typical use-cases for enums:
        1) Single-choice/Exclusive flags, e.g. `enum Doneness { Rare, Medium, WellDone };`
        Here a switch statement without fall-through is sufficient and with a sufficiently high warning
        the compiler even will warn if you missed a case.
        2) Multiple-choice flags aka C-style bitsets, see the example in my previous comment
        Again, why would you need to iterate over these? A particular function only
        needs to check for the flag(s) it is interested in.
        3) Strong type-def for integral types, e.g. your example above where you partition the integers so you can overload functions based on the (expected) range of value
        This is possible since C++11 due to `enum class`/`enum struct`.
        The main issue here again is not iteration – you typically don’t even define enumerators here,
        but the boilerplate of creating all the arithmetic and comparison operations.

        Your example for the use of `minValue` also doesn’t cut it for me: The `refCount` array and the enumeration are tightly coupled, so they should both be members of a class that expresses that coupling. And then you already know the values for the minimal and maximal enumerators and in case of a change to the enum you can change the array alongside it.

        Yes, this is a little bit of boilerplate that the implementor of the class has to take care of, and which wouldn’t be necessary if there were compile-time reflection for enums. But I personally don’t mind the author of a utility class having to write a little bit of boilerplate if that makes the use of it in code that contains domain logic safer and easier to read.

        Which is also why I don’t really mind that there is no generic `to_string` function for enums yet, even though I acknowledge its merits for logging and printf debugging: You just have to write it once for the particular enum using a regular switch statement, and with sufficiently high compiler warning levels you even get the warning when you add a new enumerator but forget to adjust the `to_string` function. It certainly would be nice to have that automated, but it is not of such a high priority that it warrants a separate change to the core language when it eventually can be solved with reflection.

Comments are closed.