I’m still uncertain about the language declaration syntax, where in declarations, syntax is used that mimics the use of the variables being declared. It is one of the things that draws strong criticism, but it has a certain logic to it.
— Dennis M. Ritchie, Creator of C
I consider the C declarator syntax an experiment that failed.
— Bjarne Stroustrup, Creator of C++
Prologue
Often, I come across explanations of C and C++ declarations that try to simplify things for beginners. For example, declarations are often explained (incorrectly) like:
type name ;
However, if you look at the formal grammar for C, nothing like that appears. Instead, you find this:
declaration-specifiers init-declarator-list ;
A declaration-specifier includes a base type (like int, char, etc.), optional qualifiers (like const), and an optional storage class (like static, extern, etc.). Specifically, it does not include [] for arrays, () for functions, or * for pointers; those things are part of the declarator.
That’s indisputably more complicated, so I understand the motivation for trying to simplify things for beginners. However, in the long run, the simplification is a disservice since it eventually makes complicated declarations harder for beginners to understand because they have the wrong mental model for declarations. It’s better to explain C declarations as they actually are.
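To make the grammar concrete, here is a minimal sketch of a single declaration in which one set of declaration-specifiers is shared by three init-declarators. The names base, ptr, table, and sum_declared are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One declaration: the declaration-specifiers "static const int" are shared
   by three init-declarators, each of which adds its own *, [], or nothing. */
static const int base = 7, *ptr = &base, table[3] = { 1, 2, 3 };

/* Hypothetical helper: sums everything declared above. */
int sum_declared(void) {
    assert(ptr == &base);               /* ptr is a pointer to const int */
    int total = base + *ptr;            /* 7 + 7 */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        total += table[i];              /* + 1 + 2 + 3 */
    return total;
}
```

Note that the * and the [3] each attach to one name only, which is exactly what the "type name ;" model fails to explain.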
Introduction
As part of designing a programming language, you generally need to design a separate syntax for declaring things (variables, constants, functions, etc.). The advantage of a separate syntax is that it’s usually clear; the (slight) disadvantage is that a separate syntax doesn’t tell you how to use the thing being declared. For example, to declare api as an array of pointers to integer in Pascal:
api: array[0..3] of ^integer;
which is crystal clear, but to use the variable, you’d write something like:
api[0]^ := 42;
Notice that:
- In the declaration, api and [ are not adjacent (whereas in the use they are).
- In the declaration, ^ is prefix (whereas in the use it’s postfix).
Pascal was chosen for this example since it was the dominant language used for computer science education in the 1970s, when C had only just been invented. Plus, Kernighan famously doesn’t like Pascal.
As the epigraph suggests, Ritchie took a different approach for C. To declare api as an array of pointers to integer in C, you write the name as if it’s being used in an expression (part of the main syntax for the language), then prepend a base type to the whole thing, that is, the type of the “expression”:
int *api[4]; // array of pointer to integer
That is, *api[...] is how you’d use it to yield an int. While a bit strange, it does, as Ritchie noted, have a certain logic to it. However, once the declarations get more complicated (and once things like const and function prototypes were added to C, neither of which existed in the original version of C), declarations infamously get harder to read.
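The “declaration mimics use” idea can be sketched as follows. The helper name read_through_api and the local variables are hypothetical; only the declaration of api comes from the text above.

```c
#include <assert.h>

/* Hypothetical helper showing that the expression *api[i] has type int,
   exactly as the declaration "int *api[4]" reads. */
int read_through_api(void) {
    int a = 10, b = 20, c = 30, d = 40;
    int *api[4] = { &a, &b, &c, &d };   /* array of 4 pointers to int */

    *api[0] = 42;                        /* same shape as the declaration */
    assert(a == 42);                     /* the write went through the pointer */

    int value = *api[2];                 /* *api[...] yields an int */
    return value;
}
```

The expression on the left of the assignment has the same shape as the declarator, which is the “certain logic” Ritchie refers to.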
ANSI C & C++ Complications
By virtually any measure, ANSI C improved upon the original C (often referred to as “K&R C” from the first edition of The C Programming Language). For declarations, the additions of function prototypes, const, and void were improvements overall, but they made declarations slightly more complicated in some cases and violated the spirit of Ritchie’s design in others.
Function Prototypes
The addition of function prototypes from C++ was most certainly an improvement overall, but prototype syntax is inconsistent with non-prototype declarations. Non-prototype declarations allow multiple things to be declared in the same declaration:
int i, j;
int *p, *q;
int k, a[4], *r, f(), *g();
In such declarations, commas are used to separate declarations having the same base type. However, prototype declarations for functions having more than one parameter like:
int lcd( int i, int j );
use commas to separate declarations even when the base type is the same. This means you can’t declare multiple parameters having the same base type while specifying the base type only once. Attempting to do so is likely a mistake in C:
double f( double x, y ); // means: double x, int y
The y is an int because the base type is missing and, in C prior to C99, a missing base type defaults to int. Fortunately, C compilers warn about this. (C99 removed implicit int, and in C++ this is an error.)
Personally, I think Stroustrup should have made prototype declarations use the same syntax as non-prototype declarations. For example:
double f( double x, y; int r ); // alternate syntax
That would allow multiple parameters having the same base type to re-use it. Semicolons would be used to separate parameters only when the base type changes. Such a syntax would also have been closer to Ritchie’s original function definition parameter syntax:
double f( x, y, r ) // K&R C function definition
double x, y;
int r;
The only difference would have been to move the declarations inside the ().
const
The addition of const made pointer declarations more complicated because there are two things that can be constant: the value pointed to (pointer to const) and the pointer itself (const pointer). Either one, or both, can be const:
const char *pcc; // pointer to const char
char *const cpc; // const pointer to char
const char *const cpcc; // const pointer to const char
Such declarations are also inconsistent in that for the base type, const is often written to the left of the type (const char), but for pointers, const must be written to the right of the *. To make things more consistent, some people (myself included) prefer right (or “east”) const so that const always appears to the right (“east”) of what it’s making constant:
char const *pcc; // equals: const char *pcc
char const *const cpcc; // equals: const char *const cpcc
Read from right to left, the second declaration is: cpcc is a constant pointer to a constant character.
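As a minimal sketch of what each form permits, the following hypothetical helper (const_forms is my name, not part of any library) uses east const throughout. The lines that would not compile are left as comments.

```c
#include <assert.h>

/* Hypothetical helper exercising each form of const on a pointer. */
int const_forms(void) {
    char a = 'a', b = 'b';

    char const *pcc = &a;        /* pointer to const char */
    pcc = &b;                    /* OK: the pointer itself may change */
    /* *pcc = 'x'; */            /* would not compile: char is const via pcc */

    char *const cpc = &a;        /* const pointer to char */
    *cpc = 'x';                  /* OK: the char may change */
    /* cpc = &b; */              /* would not compile: the pointer is const */

    char const *const cpcc = &b; /* const pointer to const char */
    /* neither *cpcc = 'y' nor cpcc = &a would compile */

    assert(*pcc == 'b' && *cpcc == 'b');
    return a == 'x';             /* 1: a was modified through cpc */
}
```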
A quirk of C is that array syntax for function parameters is really just syntactic sugar since the compiler rewrites such parameters as pointers:
void f( int p[] ); // int *p
Array syntax for parameters in C is a remnant of how pointers are declared in New B (the precursor to C). See The Development of the C Language, Dennis M. Ritchie, April, 1993.
A consequence of this is that, without the * being explicit, you’ve lost the place in the declaration where you can put the const to make p be constant:
void f( const int p[] ); // const int *p
void g( int const p[] ); // int const *p
In both cases above, it’s the integers that are constant, not the pointer p. So how do you make p itself be constant? The C standards committee added a bizarre syntax in C99:
void f( int p[const] ); // int *const p
That is, you put the const inside the []. (There are several other quirky things that were added in C99.)
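Here is a small sketch of the C99 syntax in use, assuming a C99 or later compiler (the syntax is not valid C++). The function names fill and fill_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* C99: "int p[const]" is adjusted to "int *const p", so the elements are
   writable but p itself cannot be reassigned inside the function. */
void fill(int p[const], size_t n, int value) {
    for (size_t i = 0; i < n; ++i)
        p[i] = value;           /* OK: the ints are not const */
    /* p = NULL; */             /* would not compile: p is a const pointer */
}

/* Hypothetical helper used to check the result. */
int fill_demo(void) {
    int a[3] = { 0, 0, 0 };
    fill(a, 3, 9);
    assert(a[1] == 9);          /* the caller's array was modified */
    return a[0] + a[1] + a[2];
}
```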
void
While the addition of void allowed pointers to raw, untyped memory, pointer-to-void declarations violate the spirit of Ritchie’s design of making declarations mimic their use. Consider:
void *p;
The problem is that *p can never appear in use: it’s illegal to dereference a pointer to void since void objects don’t exist.
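In practice, a pointer to void has to be converted to a pointer to some object type before use. A minimal sketch, with hypothetical helper names read_int and void_demo:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: a void * must be converted to a pointer to an
   object type before it can be dereferenced. */
int read_int(void const *p) {
    /* return *p; */                  /* would not compile: void has no value */
    int const *ip = p;                /* implicit conversion is fine in C */
    return *ip;
}

int void_demo(void) {
    int n = 42;
    void *raw = &n;                   /* any object pointer converts to void * */
    assert(memcmp(raw, &n, sizeof n) == 0);  /* same bytes, viewed untyped */
    return read_int(raw);
}
```

So the declaration void *p tells you nothing about how p is used; the use always goes through some other pointer type.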
C++ References
The addition of references in C++ made it possible to pass large objects as function arguments efficiently and transparently, particularly for operator overloading. However, while reference declarations like:
int i;
int &r = i;
are consistent in the sense that you replace the * of a pointer declaration with & for a reference declaration, they violate the spirit of Ritchie’s design since & in expressions does not mean “dereference” but instead means “address of.”
Here be Dragons
You might think a declaration like int *api[4] isn’t that bad; however, if you want to declare a pointer to an array of integer, you’d have to write:
int (*pai)[4]; // pointer to array of integer
Specifically, you need to add () to get the precedence right. The problem stems from the fact that * is a prefix operator whereas [] is a postfix operator. (If * were a postfix operator as ^ is in Pascal, this problem wouldn’t exist.)
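The difference between the two declarations can be checked with sizeof, as in this sketch. The helper name pointer_to_array_demo and the array row are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper contrasting the two declarations. */
int pointer_to_array_demo(void) {
    int row[4] = { 1, 2, 3, 4 };

    int *api[4] = { &row[0], &row[1], &row[2], &row[3] }; /* array of pointers */
    int (*pai)[4] = &row;                                 /* pointer to array  */

    /* sizeof shows the difference: api holds four pointers, while *pai
       is the whole four-int array. */
    assert(sizeof api == 4 * sizeof(int *));
    assert(sizeof *pai == 4 * sizeof(int));

    /* Use mirrors declaration: *api[i] versus (*pai)[i]. */
    return *api[3] + (*pai)[0];
}
```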
Declarations can get arbitrarily more complicated. For example:
char *(*strtab[4])();
where strtab is an array of pointer to function returning pointer to char; or even worse:
void (*signal(int sig, void (*f)(int)))(int);
where signal is a function (sig as int, f as pointer to function (int) returning void) returning pointer to function (int) returning void.
Fortunately, Ritchie also invented typedef, which can be used to slay such dragons:
typedef char *(*PF_C)();
PF_C strtab[4];
typedef void (*sig_t)(int);
sig_t signal( int sig, sig_t f );
Therefore, declarations generally aren’t that bad in practice.
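As a sketch of the same idea with a table you can actually call, the following declares one array with a typedef and an equivalent one without. All names (name_fn, name_zero, name_one, names, names_raw, lookup) are hypothetical, and I use (void) rather than () so that the parameter list is a real prototype.

```c
#include <assert.h>
#include <string.h>

/* Pointer to function (void) returning pointer to const char. */
typedef char const *(*name_fn)(void);

static char const *name_zero(void) { return "zero"; }
static char const *name_one(void)  { return "one"; }

/* With the typedef: array of 2 const name_fn. */
static name_fn const names[2] = { name_zero, name_one };

/* Without it, the same table has to be declared as: */
static char const *(*const names_raw[2])(void) = { name_zero, name_one };

/* Hypothetical helper: calls the i-th function in the table. */
char const *lookup(int i) {
    assert(names[i] == names_raw[i]);   /* both tables hold the same pointers */
    return names[i]();
}
```

The two tables have the same type; the typedef only moves the hard-to-read part into one place.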
Additionally, you can use cdecl both to decipher and compose declarations.
“West Pointers”
Despite the reality that C declarations are not:
type name ;
some well-intentioned people try to make things appear to be so by putting the * in pointer declarations to the left (“west”) of the space:
char* s; // as opposed to: char *s
While such declarations work since the C compiler doesn’t care about whitespace, it also doesn’t care about:
char* s, t; // t is just char
where you likely meant for t to be char* also. The same people then tend to say that you shouldn’t declare multiple things in the same declaration anyway and instead do:
char* s;
char* t; // verbose
Personally, I find that needlessly verbose for what otherwise would be trivial declarations.
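The pitfall with char* s, t is easy to verify with sizeof, as in this minimal sketch (west_pointer_demo is a hypothetical name). The variables are never read, only measured.

```c
#include <assert.h>

/* Hypothetical helper: the * belongs to the declarator s, not to char. */
int west_pointer_demo(void) {
    char* s, t;                       /* s is char *, t is plain char */
    char *u, *v;                      /* both are char * */

    assert(sizeof t == 1);            /* sizeof(char) is 1 by definition */
    assert(sizeof s == sizeof(char *));
    assert(sizeof u == sizeof v);

    return sizeof t == sizeof(char);  /* 1: t is a char, not a pointer */
}
```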
For an analogy: when learning Spanish, you learn that adjectives go after nouns. Whether you want adjectives to go before to match your English-centric view is irrelevant. You have to speak Spanish the way it is, not the way you’d prefer it to be. So too with C.
Epilogue
C is quirky, flawed, and an enormous success.
— Dennis Ritchie
When teaching C, it’s best in the long run to teach it — warts and all — as it actually is.