Introduction
Even though enumerations already existed in other programming languages in the 1970s (e.g., Pascal), they were an afterthought in C. Indeed, they weren’t part of Dennis Ritchie’s original C implementation (“K&R C”) nor mentioned in the first edition of The C Programming Language (TCPL) in 1978. It wasn’t until 1989 (11 years later!) when the first ANSI C (C89) and the second edition of TCPL were released that enumerations were part of C.
During those intervening 11 years, C got along just fine without enumerations. But, while they may not be essential, they are useful, both for programmers to clearly express their intent to other programmers reading their code, and to the compiler that, in turn, can assist the programmer in writing better (and less buggy) programs. C++ inherited enumerations from C (and, eventually, extended them).
A Brief Refresher
At their simplest, enumerations are a small step up from macros. Rather than doing something like this:
#define COLOR_BLACK 0
#define COLOR_WHITE 1
#define COLOR_BLUE 2
#define COLOR_GREEN 3
#define COLOR_RED 4
you can instead do:
enum color {
COLOR_BLACK,
COLOR_WHITE,
COLOR_BLUE,
COLOR_GREEN,
COLOR_RED, // Extra ',' here is allowed.
};
When declaring an enumeration, the compiler (since C99) allows you to put a comma after the last constant as a nicety.
You can then use color as a type and the enumerated constants as values:
enum color c = COLOR_BLACK;
The basic idea of enumerations is that you use them to express a set of related values.
Namespaces & Declarations
Similarly to struct and union, enum types are placed into a separate “tags” namespace, so you must continue to use the enum prefix. Again similarly, you can alternatively employ typedef to “import” the enumeration tag into the global namespace:
typedef enum color color;
color c = COLOR_BLUE; // Don't need "enum" now.
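The declaration and the typedef can also be combined into a single statement. The following is a minimal sketch of that form; the helper same_color() is hypothetical and exists only to show that both spellings name the same type:

```c
// Declare the enumeration and its typedef in one statement.
typedef enum color {
  COLOR_BLACK,
  COLOR_WHITE,
  COLOR_BLUE,
  COLOR_GREEN,
  COLOR_RED,
} color;

// "enum color" (tag namespace) and "color" (ordinary identifier)
// refer to the same type, so either spelling can be used.
int same_color( enum color a, color b ) {
  return a == b;
}
```

Because the tag and the typedef name live in different namespaces, reusing the name color for both is legal.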
However, unlike struct and union, forward declarations of enumerations are unfortunately not allowed:
struct node; // OK: forward declaration.
struct node *p; // OK: pointer to node.
enum color; // Error: forward declaration not allowed.
Debugger Advantage
One immediate advantage of enumerations is that debuggers understand them and print their constant names rather than their underlying integer values:
(gdb) p c
$1 = COLOR_BLUE
That’s a lot better than having to look up what color 2 corresponds to if c were just an int.
Name Collisions
If you’re unfamiliar with enumerations in C, you might wonder why verbose constant names were used. Why not more simply:
enum color {
BLACK,
WHITE,
BLUE,
GREEN,
RED,
};
Unfortunately, enumeration constants aren’t scoped, which means they’re all “injected” into the global namespace. If you also have another enumeration like:
enum rb_color { // Red-Black tree node color.
BLACK, // Error: redefinition of 'BLACK'.
RED // Error: redefinition of 'RED'.
};
then you’ll get redefinition errors. Hence, it’s a best practice to name all constants of the same enumeration with a common prefix and hope they don’t collide with other names elsewhere.
This problem was fixed in C++11 with scoped enumerations, but that has yet to be back-ported to C, if ever.
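As a sketch of the prefix convention, the two enumerations above can coexist once each has its own prefix. The RB_ prefix and the rb_is_red() helper are assumptions made for illustration:

```c
enum color {
  COLOR_BLACK,
  COLOR_WHITE,
  COLOR_BLUE,
  COLOR_GREEN,
  COLOR_RED,
};

enum rb_color {   // Red-Black tree node color.
  RB_BLACK,       // Distinct name from COLOR_BLACK.
  RB_RED          // Distinct name from COLOR_RED.
};

// The prefixes keep the names apart even though both
// enumerations number their constants starting at 0.
int rb_is_red( enum rb_color c ) {
  return c == RB_RED;
}
```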
Underlying Type
Every enumeration has an underlying type, that is, the type that’s actually used at the machine level to represent it. It’s typically int, but can be any integral type big enough to hold the largest constant value.
Before C23, there was no way either to know exactly what the underlying type is or to specify it explicitly.
The most you can know is its size via sizeof and whether it’s signed by casting -1 to it and seeing if the result is actually < 0, e.g., (color)-1 < 0.
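Those two checks can be wrapped up as follows. This is only a sketch: IS_SIGNED_TYPE() and print_color_repr() are hypothetical names, and the values printed depend on the compiler, so no particular output is assumed:

```c
#include <stdio.h>

typedef enum color {
  COLOR_BLACK,
  COLOR_WHITE,
  COLOR_BLUE,
  COLOR_GREEN,
  COLOR_RED
} color;

// Nonzero if converting -1 to type T yields a negative value,
// that is, if T is a signed type.
#define IS_SIGNED_TYPE(T)  ( (T)-1 < 0 )

// Prints what can be learned about color's underlying type.
void print_color_repr( void ) {
  printf( "sizeof(color) = %zu, signed = %d\n",
          sizeof(color), IS_SIGNED_TYPE( color ) );
}
```

Some compilers may warn that the comparison is always false when the type turns out to be unsigned; that warning is expected for this idiom.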
However, in C23, you can explicitly specify the underlying type by following the name of the enumeration with a : and the underlying type, such as:
enum color : unsigned char { // C23 and later only.
// ...
};
This is useful if you want to guarantee a size smaller or larger than int and control its sign in expressions. An underlying type can be any int or char type (signed or unsigned) or a typedef thereof.
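Since not every compiler supports C23 yet, one way to experiment is to guard the fixed underlying type with a version check. This is a sketch only; the small_color enumeration and small_color_size() are hypothetical names:

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
// C23: the underlying type is fixed, so the size is guaranteed.
enum small_color : unsigned char {
  SMALL_BLACK,
  SMALL_WHITE,
};
static_assert( sizeof(enum small_color) == sizeof(unsigned char),
               "small_color should be the size of unsigned char" );
#else
// Before C23: the compiler chooses the underlying type.
enum small_color {
  SMALL_BLACK,
  SMALL_WHITE,
};
#endif

// Reports the size actually used for the enumeration.
unsigned small_color_size( void ) {
  return (unsigned)sizeof(enum small_color);
}
```

With a C23 compiler the size is exactly that of unsigned char; otherwise it is whatever the implementation picks.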
Implicit Conversion
Enumeration constants and variables implicitly convert to values of their underlying type in expressions. Additionally, values of the underlying type also implicitly convert to enumerations. While these conversions can sometimes be convenient, they allow nonsensical code to be written with neither errors nor warnings:
color c = COLOR_BLACK + COLOR_WHITE * 2; // ???
Fortunately, there are better uses for implicit conversions — more later.
Values
The values the enumerated constants have are assigned by the compiler (by default) starting at 0 and increasing by 1 for each constant. Often, you don’t particularly care about what those values actually are.
However, you can explicitly specify any values you want to all or only some constants. You can even specify negative values (unless you specified an unsigned underlying type). If omitted, the value of a constant is assigned by the compiler as the previous value plus one:
enum color {
COLOR_NONE = -1,
COLOR_BLACK = 0,
COLOR_WHITE = 1,
COLOR_BLUE, // Value is 2 ...
COLOR_GREEN, // ... 3 ...
COLOR_RED, // ... 4 ...
};
That said, you should not explicitly specify values unless one of these is true:
- The values are either “externally imposed” or otherwise have meaning; or:
- You need to “serialize” the values (either on disk or “over the wire”); or:
- You are representing bit flags.
Externally Imposed Values
An example of externally imposed values would be if you were writing software for a graphics terminal where the hardware uses specific values for specific colors:
enum ansi_color {
ANSI_BLACK = 40,
ANSI_WHITE = 47,
ANSI_BLUE = 44,
ANSI_GREEN = 42,
ANSI_RED = 41
};
Due to implicit conversion to integer, you can use the values directly:
printf( "\33[%dm", ANSI_RED ); // Codes 40-47 set the background color: here, red.
Serializing Values
If you write values to disk (presumably to read them back at some later time), you want to ensure that, say, 3 will always correspond to COLOR_GREEN even if you add more colors. If the values weren’t explicitly specified and you added a new color anywhere but at the end, the subsequent values would silently shift by 1:
enum color {
COLOR_BLACK,
COLOR_WHITE,
COLOR_YELLOW, // New color is now 2.
COLOR_BLUE, // This used to be 2, but is now 3 ...
COLOR_GREEN, // ... and so on.
COLOR_RED
};
Of course you could have the policy always to add new values at the end, but that relies on programmers following the policy. If you specify values explicitly, the compiler can help you enforce unique values, but not in the way you might assume — more later.
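For contrast, here is a sketch of the same change with explicit values. The value 5 for COLOR_YELLOW is an arbitrary choice for illustration, and the static assertion is an optional extra guard:

```c
enum color {
  COLOR_BLACK  = 0,
  COLOR_WHITE  = 1,
  COLOR_YELLOW = 5,   // New color takes the next unused value.
  COLOR_BLUE   = 2,   // Still 2 ...
  COLOR_GREEN  = 3,   // ... still 3 ...
  COLOR_RED    = 4    // ... and still 4.
};

// Pin a value that already exists in files written earlier.
_Static_assert( COLOR_GREEN == 3,
                "on-disk value of COLOR_GREEN must not change" );
```

The new constant can sit anywhere in the list without disturbing values that have already been written to disk.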
Alternatively, you can serialize values as strings:
void write_color( color c, FILE *f ) {
switch ( c ) {
case COLOR_BLACK: fputs( "black", f ); return;
case COLOR_WHITE: fputs( "white", f ); return;
case COLOR_BLUE : fputs( "blue" , f ); return;
case COLOR_GREEN: fputs( "green", f ); return;
case COLOR_RED : fputs( "red" , f ); return;
}
UNEXPECTED_VALUE( c );
}
While serializing to text is more expensive, if you’re serializing the rest of your data to a text format like JSON anyway, then it doesn’t matter. The other advantage is that changes to the underlying values don’t matter.
The UNEXPECTED_VALUE( c ) is a macro like:
#define UNEXPECTED_VALUE(EXPR) do { \
fprintf( stderr, \
"%s:%d: %lld: unexpected value for " #EXPR "\n", \
__FILE__, __LINE__, (long long)(EXPR) \
); \
abort(); \
} while (0)
You can use it (or something like it) if you want to program defensively.
Serializing Values with X Macros
Having to remember to add a case to the switch whenever you add a new value is error-prone. Instead, you can use X macros. To do so, first declare a macro containing all of the colors:
#define COLOR_ENUM(X) \
X(COLOR_BLACK) \
X(COLOR_WHITE) \
X(COLOR_BLUE) \
X(COLOR_GREEN) \
X(COLOR_RED)
where each value is given as an argument to an as-of-yet undefined macro named X. Next, declare a macro that, given a value, declares it inside an enum declaration:
#define COLOR_ENUM_DECLARE(V) V,
(The trailing comma will separate values.) Finally, to declare the enum itself:
enum color {
COLOR_ENUM( COLOR_ENUM_DECLARE )
};
The trick with X macros is that you pass the name of some other macro for X (here, COLOR_ENUM_DECLARE) that COLOR_ENUM will expand for each value. In this case, the end result of all the macro expansion is a comma-separated list of values.
Well, that’s nice; but how does that help with serializing values? The trick is to pass a different macro for X, one that will generate a case and string literal for each value inside a switch:
#define COLOR_ENUM_STRING(V) case V: return #V;
char const* color_string( color c ) {
switch ( c ) {
COLOR_ENUM( COLOR_ENUM_STRING )
}
UNEXPECTED_VALUE( c );
}
void write_color( color c, FILE *f ) {
fputs( color_string( c ), f );
}
For a given value V, #V will cause the preprocessor to “stringify” it, e.g., COLOR_RED will become "COLOR_RED". If you can live with the serialized names exactly matching the values ("COLOR_RED" vs. "red"), X macros are a great technique.
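If the serialized names must differ from the constant names, one possible variation (not from the original text) is to give each X macro entry a second argument holding the name. The sketch below assumes that approach; the two-argument macros and color_parse() are hypothetical, and color_string() returns NULL here rather than aborting:

```c
#include <stdio.h>
#include <string.h>

// Each entry pairs a constant with its serialized name.
#define COLOR_ENUM(X)      \
  X(COLOR_BLACK, "black")  \
  X(COLOR_WHITE, "white")  \
  X(COLOR_BLUE,  "blue")   \
  X(COLOR_GREEN, "green")  \
  X(COLOR_RED,   "red")

#define COLOR_ENUM_DECLARE(V,S)  V,
#define COLOR_ENUM_STRING(V,S)   case V: return S;
#define COLOR_ENUM_PARSE(V,S) \
  if ( strcmp( s, S ) == 0 ) { *out = V; return 1; }

typedef enum color { COLOR_ENUM( COLOR_ENUM_DECLARE ) } color;

// Returns the serialized name of c, or NULL for an unexpected value.
char const* color_string( color c ) {
  switch ( c ) {
    COLOR_ENUM( COLOR_ENUM_STRING )
  }
  return NULL;
}

// Returns 1 and sets *out if s names a color; otherwise returns 0.
int color_parse( char const *s, color *out ) {
  COLOR_ENUM( COLOR_ENUM_PARSE )
  return 0;
}
```

The same list now drives the declaration, the writer, and the reader, so adding a color means touching only COLOR_ENUM.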
Duplicate Values
It’s perfectly legal to have two constants of the same enumeration with the same underlying value:
enum color {
// ...
COLOR_GREEN,
COLOR_CHARTREUSE = COLOR_GREEN,
// ...
};
They’re synonyms. In this case, it’s clearly intentional. However, it’s possible to introduce synonyms by accident, especially in an enumeration with lots of explicitly supplied values. Since synonyms are legal, the compiler can’t help you detect accidental synonyms until you switch on them:
switch ( c ) {
// ...
case COLOR_GREEN:
// ...
break;
case COLOR_CHARTREUSE: // Error: duplicate case value.
// ...
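That behavior can be turned into a deliberate compile-time check. The sketch below combines it with X macros, assuming a two-argument form that carries explicit values; color_is_declared() and the macro names are hypothetical:

```c
// Each entry pairs a constant with its explicit value.
#define COLOR_ENUM(X)  \
  X(COLOR_BLACK, 0)    \
  X(COLOR_WHITE, 1)    \
  X(COLOR_BLUE,  2)    \
  X(COLOR_GREEN, 3)    \
  X(COLOR_RED,   4)

#define COLOR_ENUM_DECLARE(V,N)  V = N,
#define COLOR_ENUM_CASE(V,N)     case V:

enum color { COLOR_ENUM( COLOR_ENUM_DECLARE ) };

// Returns 1 if n is the value of a declared constant, else 0.
// If two constants ever shared a value, this switch would contain
// duplicate case labels and the file would fail to compile.
int color_is_declared( int n ) {
  switch ( n ) {
    COLOR_ENUM( COLOR_ENUM_CASE )
      return 1;
  }
  return 0;
}
```

Changing, say, COLOR_RED to 3 in the list should produce a duplicate case value error at the switch. Intentional synonyms would have to be declared outside the list.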
“None” Values
If an enumeration can have a “default,” “OK,” “none,” “not set,” “unspecified,” or similar value, it should be declared first so:
- It will get assigned the value of 0 by default by the compiler, which is easily recognizable in a debugger.
- Global or file-scope static enumeration variables will be initialized to it (0) automatically.
For example:
enum eol {
EOL_UNSPECIFIED,
EOL_UNIX,
EOL_WINDOWS
};
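A small sketch of the zero-initialization point follows. The settings structure and the accessor functions are hypothetical, added only to make the behavior observable:

```c
enum eol {
  EOL_UNSPECIFIED,
  EOL_UNIX,
  EOL_WINDOWS
};

// Hypothetical settings structure, used only for illustration.
struct settings {
  enum eol eol;
  int      tab_width;
};

// File-scope objects are zero-initialized, so file_eol starts out
// as EOL_UNSPECIFIED without an explicit initializer.
static enum eol file_eol;

// Members omitted from an initializer are also set to 0, so
// settings.eol is EOL_UNSPECIFIED as well.
static struct settings settings = { .tab_width = 8 };

enum eol get_file_eol( void )     { return file_eol; }
enum eol get_settings_eol( void ) { return settings.eol; }
```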
Checking Values
If you need to check the value of an enumeration variable for one particular value, using an if is fine:
if ( eol == EOL_UNSPECIFIED )
return;
However, if you need to check more than one value, you should always use a switch:
switch ( eol ) {
case EOL_UNSPECIFIED: // Default to Unix-style.
case EOL_UNIX:
putchar( '\n' );
break;
case EOL_WINDOWS:
fputs( "\r\n", stdout );
break;
}
Why? Because if you omit a case for a constant, the compiler will warn you. This is extremely helpful if you add a new enumeration constant: the compiler can tell you where you forgot to add a case to your switch statements.
However, you should avoid using default when switching on enumerations because it prevents the compiler from being able to warn you when you omit a case for a constant. It’s better to include a case for every constant even if those cases do nothing:
switch ( ast->array.kind ) {
case C_ARRAY_INT_SIZE:
dup_ast->array.size_int = ast->array.size_int;
break;
case C_ARRAY_NAMED_SIZE:
dup_ast->array.size_name = strdup( ast->array.size_name );
break;
case C_ARRAY_EMPTY_SIZE: // Don't use "default" here.
case C_ARRAY_VLA_STAR:
// nothing to do
break;
}
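With GCC and Clang, the warning in question is -Wswitch (GCC enables it as part of -Wall), and -Werror=switch turns it into a hard error. The sketch below shows one way to handle out-of-range values without a default: put the fallback after the switch. The eol_string() function is a hypothetical example:

```c
enum eol {
  EOL_UNSPECIFIED,
  EOL_UNIX,
  EOL_WINDOWS
};

// Returns the line-ending string for eol. There is no "default",
// so adding a constant to enum eol triggers a warning here.
char const* eol_string( enum eol eol ) {
  switch ( eol ) {
    case EOL_UNSPECIFIED:   // Default to Unix-style.
    case EOL_UNIX:
      return "\n";
    case EOL_WINDOWS:
      return "\r\n";
  }
  return "";   // Reached only for values outside the enumeration.
}
```

Because every case returns, the statement after the switch runs only for values that match no constant, which keeps the defensive handling without silencing the compiler.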
“Count” Values
You may encounter code that adds a “count” constant at the end:
enum color {
COLOR_BLACK,
COLOR_WHITE,
COLOR_BLUE,
COLOR_GREEN,
COLOR_RED,
NUM_COLOR // Equal to number of colors (here, 5).
};
The intent is that the underlying value of NUM_COLOR will be the number of colors since the compiler will automatically assign 5 to it in this case, which is the number of actual colors. This is then used to marginally simplify serialization to text by using the underlying value as an index into an array (assuming the value of the first constant is 0):
void write_color( color c, FILE *f ) {
static char const *const COLOR_NAME[] = {
"black",
"white",
"blue",
"green",
"red"
};
if ( c >= NUM_COLOR ) // Defensive check.
UNEXPECTED_VALUE( c );
fputs( COLOR_NAME[ c ], f );
}
The caveat to this is that it adds a “pseudo color” value that you’d need to include as a case in every switch on color to prevent the compiler warning about the unhandled case even though the value will never match. It’s for this reason that I don’t recommend adding “count” constants to enumerations.
“Count” Values with X Macros
The X macros technique can be used to count the number of values:
#define PLUS_ONE(...) +1
enum { NUM_COLOR = COLOR_ENUM( PLUS_ONE ) };
where COLOR_ENUM(PLUS_ONE) will expand to +1 for each value, yielding +1+1+1+1+1, which is 5. As with all X macros, PLUS_ONE() must take an argument (here, the enumeration value), but doesn’t need it, hence the use of ... to indicate “don’t care.”
The use of an unnamed enum for NUM_COLOR forces the compiler to evaluate +1+1+1+1+1 at compile-time, yielding a constant.
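A count obtained this way can size a name table without adding a pseudo constant to the enumeration. The following sketch assumes C99 designated initializers; the color_name() function and its NULL return for out-of-range values are illustrative choices:

```c
#include <stddef.h>

#define COLOR_ENUM(X) \
  X(COLOR_BLACK)      \
  X(COLOR_WHITE)      \
  X(COLOR_BLUE)       \
  X(COLOR_GREEN)      \
  X(COLOR_RED)

#define COLOR_ENUM_DECLARE(V)  V,
#define PLUS_ONE(...)          +1

typedef enum color { COLOR_ENUM( COLOR_ENUM_DECLARE ) } color;
enum { NUM_COLOR = COLOR_ENUM( PLUS_ONE ) };

// Designated initializers tie each name to its constant, so the
// table stays correct even if the constants are reordered.
static char const *const COLOR_NAME[ NUM_COLOR ] = {
  [COLOR_BLACK] = "black",
  [COLOR_WHITE] = "white",
  [COLOR_BLUE ] = "blue",
  [COLOR_GREEN] = "green",
  [COLOR_RED  ] = "red",
};

// Returns the name for c, or NULL if c is out of range.
char const* color_name( color c ) {
  return (unsigned)c < (unsigned)NUM_COLOR ? COLOR_NAME[ c ] : NULL;
}
```

Since NUM_COLOR is not a member of enum color, a switch on color needs no extra case for it.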
Bit Flag Values
Another way to use enumerations is to declare a set of bitwise flags where each constant is a unique power of 2:
enum c_int_fmt {
CIF_NONE = 0,
CIF_SHORT = 1 << 0,
CIF_INT = 1 << 1,
CIF_LONG = 1 << 2,
CIF_UNSIGNED = 1 << 3,
CIF_CONST = 1 << 4,
CIF_STATIC = 1 << 5,
};
Rather than specify power-of-2 values explicitly, e.g., 1, 2, 4, 8, etc., a common practice is to use 1 << N where N is the Nth bit from 0 to however many bits are needed and let the compiler do the calculation for you.
You can then bitwise-or various flags together:
enum c_int_fmt f = CIF_CONST | CIF_UNSIGNED | CIF_INT;
This results in a value (0b0011010) that isn’t among the declared constants, but that’s perfectly legal. Debuggers are even smart enough to notice this and print accordingly:
(gdb) p f
$1 = CIF_INT | CIF_UNSIGNED | CIF_CONST
You can also test for inclusion of particular bits:
// fputs() is used rather than puts() because puts() appends a newline.
if ( (f & CIF_STATIC) != CIF_NONE )
fputs( "static ", stdout );
if ( (f & CIF_CONST) != CIF_NONE )
fputs( "const ", stdout );
if ( (f & CIF_UNSIGNED) != CIF_NONE )
fputs( "unsigned ", stdout );
if ( (f & CIF_SHORT) != CIF_NONE )
fputs( "short ", stdout );
if ( (f & CIF_LONG) != CIF_NONE )
fputs( "long ", stdout );
if ( (f & CIF_INT) != CIF_NONE )
fputs( "int ", stdout );
Or test for sets of bits, for example, does f have two specific bits set:
if ( (f & (CIF_SHORT | CIF_LONG)) == (CIF_SHORT | CIF_LONG) ) // Parentheses needed: == binds tighter than |.
goto illegal_format;
The caveat, of course, is that a switch on such an enumeration may not match any case. Despite that, enumerations are often used for bitwise flags.
This is a case where implicit conversion to int is convenient because the bitwise operators just work.
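The tests above can be collected into small helpers so that the parentheses are written only once. This is a sketch; the cif_*() function names are hypothetical:

```c
#include <stdbool.h>

typedef enum c_int_fmt {
  CIF_NONE     = 0,
  CIF_SHORT    = 1 << 0,
  CIF_INT      = 1 << 1,
  CIF_LONG     = 1 << 2,
  CIF_UNSIGNED = 1 << 3,
  CIF_CONST    = 1 << 4,
  CIF_STATIC   = 1 << 5,
} c_int_fmt;

// True if at least one bit of mask is set in f.
bool cif_has_any( c_int_fmt f, c_int_fmt mask ) {
  return (f & mask) != 0;
}

// True only if every bit of mask is set in f.
bool cif_has_all( c_int_fmt f, c_int_fmt mask ) {
  return (f & mask) == mask;
}

// Returns f with the bits of mask turned on.
c_int_fmt cif_set( c_int_fmt f, c_int_fmt mask ) {
  return (c_int_fmt)(f | mask);
}

// Returns f with the bits of mask turned off.
c_int_fmt cif_clear( c_int_fmt f, c_int_fmt mask ) {
  return (c_int_fmt)(f & ~mask);
}
```

For example, cif_has_all( f, CIF_SHORT | CIF_LONG ) expresses the illegal-format test without any operator precedence pitfalls.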
Conclusion
Remember these best practices for using enumerations:
- Use them only to specify a set of related values to make your intent clear both to other programmers and the compiler.
- Use “none” constants when appropriate.
- Do not explicitly specify values unless necessary.
- Do not specify “count” values.
- Use switch when comparing against more than one value.
- Avoid using default in switch statements on enumerations whenever possible.
- Use bit flag values when appropriate.
As mentioned, C++ inherited enumerations from C and also extended them and fixed some of their problems, but that’s a story for another time.