Good CLI Design & Implementation

Paul J. Lucas - May 9 '23 - Dev Community

Introduction

Graphical User Interfaces (GUIs) and REST APIs get all the attention, but for command-line tools, Command-Line Interfaces (CLIs) are just as important, yet often neglected. Hence, this article shows how to design and implement a good CLI in C.

Typically at a bare minimum, a CLI has to:

  1. Parse and validate command-line options and their arguments (if any).
  2. Parse and validate command-line arguments (if any).
  3. Give good error messages.
  4. Give good help.

There are actually several styles of command-line options. Of these, I personally recommend using the GNU standard due to the ubiquity of GNU command-line tools. To parse GNU-style command-line options, use getopt_long() declared in getopt.h. Unfortunately, getopt_long() has a number of quirks. I’ll mention these and show how to work around them.

Command-Line Options

The getopt_long() function requires lists of both long and short options. For example:

static struct option const OPTIONS_LONG[] = {
  { "help",    no_argument,       NULL, 'h' },
  { "output",  required_argument, NULL, 'o' },
  { "version", no_argument,       NULL, 'v' },
  { NULL,      0,                 NULL, 0   }
};
static char const OPTIONS_SHORT[] = ":ho:v";

For OPTIONS_LONG, I find the flag (third) field vestigial since it’s useful only for int (or int-as-bool) option values, so I recommend always setting it to NULL. In that case, the val (fourth) field is what is returned by getopt_long() when a given long option has been encountered. It’s most straightforward to make val the short option synonym since every long option should have a short option synonym whenever possible.

In the rare case when no short option synonym is possible for a particular long option (because the desired short option is already being used as a synonym for a different long option), then you can specify a non-null flag to know when the particular long option has been encountered. In such cases, getopt_long() returns 0.
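
To make that rare case concrete, here is a minimal sketch of a long-only option. The --no-color option, the opt_no_color variable, and the demo_parse() helper are hypothetical names made up for illustration; they are not part of the example program in this article.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

static int opt_no_color;    // hypothetical; set to 1 by getopt_long()

static struct option const DEMO_OPTIONS[] = {
  { "help",     no_argument, NULL,          'h' },
  { "no-color", no_argument, &opt_no_color, 1   },  // no short synonym
  { NULL,       0,           NULL,          0   }
};

// Hypothetical helper: parses argv and returns the number of times
// getopt_long() returned 0, i.e., saw an option having a non-null flag.
static int demo_parse( int argc, char *argv[] ) {
  assert( argc >= 1 );
  int flag_opts = 0;
  optind = 1;               // start scanning at argv[1]
  opterr = 0;               // suppress default error messages
  for (;;) {
    int const opt = getopt_long( argc, argv, ":h", DEMO_OPTIONS, NULL );
    if ( opt == -1 )
      break;
    if ( opt == 0 )         // *flag has already been set to val
      ++flag_opts;
  } // for
  return flag_opts;
}
```

Note that a 0 return value alone doesn’t say which long option was seen: you have to check the variable that flag points to.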

According to the GNU standard, all programs should accept the --help and --version options.

OPTIONS_SHORT specifies the short options in a way that’s backwards compatible with the POSIX getopt(). Of note:

  • A leading : makes getopt_long() return a : when a required argument for an option is missing.
  • An option letter followed by : means that option requires an argument. (A :: means that option allows an optional argument.)
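
As a small sketch of what those two rules mean in practice, the hypothetical first_opt() helper below (not part of the example program) returns whatever getopt_long() returns for the first option when given the same short option string:

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

// Hypothetical helper: returns what getopt_long() returns for the first
// option in argv using the short option string ":ho:v" and no long options.
static int first_opt( int argc, char *argv[] ) {
  static struct option const NO_LONG_OPTS[] = {
    { NULL, 0, NULL, 0 }
  };
  assert( argc >= 1 );
  optind = 1;               // start scanning at argv[1]
  opterr = 0;               // suppress default error messages
  return getopt_long( argc, argv, ":ho:v", NO_LONG_OPTS, NULL );
}
```

Given only -o, this returns : and sets optopt to o; given -o file, it returns o and optarg points to "file". Without the leading :, the missing argument case would return ? instead, making it indistinguishable from an invalid option.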

However, if every long option has a short option synonym, then having to specify OPTIONS_SHORT separately is both redundant (since all option information is contained in OPTIONS_LONG) and error-prone (since you might update OPTIONS_LONG but forget to update OPTIONS_SHORT to match). To address both of these issues, we can write a function to create a short option string from a long option array:

static char const* make_short_opts( struct option const opts[static const 2] ) {
  // pre-flight to calculate string length
  size_t len = 1;       // for leading ':'
  for ( struct option const *opt = opts; opt->name != NULL; ++opt )
    len += 1 + (unsigned)opt->has_arg;

  char *const short_opts = malloc( len + 1/*\0*/ );
  char *s = short_opts;

  *s++ = ':';           // return missing argument as ':'
  for ( struct option const *opt = opts; opt->name != NULL; ++opt ) {
    *s++ = (char)opt->val;
    switch ( opt->has_arg ) {
      case optional_argument:
        *s++ = ':';
        // no break;
      case required_argument:
        *s++ = ':';
    } // switch
  } // for
  *s = '\0';

  return short_opts;
}

If you don’t know what the static const 2 in the declaration of the opts parameter means, read this.
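
As a brief summary only: in an array parameter declaration, static 2 promises that the argument points to at least two elements (so it can’t be null), and const makes the parameter itself (the pointer) const. The count_opts() function below is a hypothetical helper written just to show the syntax in a smaller setting.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

// Hypothetical helper: counts the entries before the terminating all-zero
// entry.  "static 2" says the caller must pass at least 2 elements (here,
// presumably at least one option plus the terminator); "const" means opts
// itself can't be reassigned inside the function.
static size_t count_opts( struct option const opts[static const 2] ) {
  assert( opts != NULL );
  size_t n = 0;
  for ( struct option const *opt = opts; opt->name != NULL; ++opt )
    ++n;
  return n;
}
```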

For this simple example, we only need to declare one global variable for the output option:

char const *opt_output = "-";

(The --help and --version options can be handled internally.)

Parsing Command-Line Options

Here is the start of a parse_options() function:

static void parse_options( int *pargc, char const **pargv[const] ) {
  opterr = 0;  // suppress default error message

  int  opt;
  bool opt_help = false;
  bool opt_version = false;
  char const *const options_short = make_short_opts( OPTIONS_LONG );
  // ...

The function takes pointers to argc and argv because we want to adjust argc to be the number of non-option command-line arguments and adjust argv such that argv[0] points at the first non-option argument (if any).

First, we set the global opterr = 0 to suppress default error messages given by getopt_long() so we can print error messages in exactly the format we want.

Next, we declare a few variables including opt_help and opt_version. These option variables are declared locally because we can handle those entirely within parse_options(), so there’s no need to make them global.

Next, we call getopt_long() in a loop until it returns one of -1 (for “no more options”), : (for a missing required argument), or ? (for an invalid option):

  // ...
  for (;;) {
    opt = getopt_long(
      *pargc, (char**)*pargv, options_short, OPTIONS_LONG, /*longindex=*/NULL
    );
    if ( opt == -1 )
      break;
    switch ( opt ) {
      case 'h':
        opt_help = true;
        break;
      case 'o':
        if ( SKIP_WS( optarg )[0] == '\0' )
          goto missing_arg;
        opt_output = optarg;
        break;
      case 'v':
        opt_version = true;
        break;
      case ':':
        goto missing_arg;
      case '?':
        goto invalid_opt;
    } // switch
  } // for
  // ...

For options that take arguments, the argument value is stored in a global variable optarg by getopt_long(). You must copy it (a shallow copy is fine) to your option variable since its value will change on every loop iteration. However, getopt_long() considers options like the following:

example --output=     # optarg will be "" (empty string)
example --output=" "

to have a present — but either an empty or all-whitespace — argument. In most cases, we want to treat this the same as a missing argument. SKIP_WS() is a macro that skips any leading whitespace in a string:

#define SKIP_WS(S)  ((S) += strspn( (S), " \n\t\r\f\v" ))

Once skipped, the first character of (the updated) optarg can be checked: if it’s the null character, the argument is effectively missing.
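
To see the check in isolation, here is a minimal sketch that wraps it in a hypothetical is_blank_arg() function (the function name is made up for illustration; the macro is the one given above):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKIP_WS(S)  ((S) += strspn( (S), " \n\t\r\f\v" ))

// Hypothetical helper: returns true only if arg is either the empty string
// or consists entirely of whitespace.
static bool is_blank_arg( char const *arg ) {
  assert( arg != NULL );
  // SKIP_WS() advances the local copy of arg past any leading whitespace.
  return SKIP_WS( arg )[0] == '\0';
}
```

Because the macro assigns to its argument, using it directly on optarg (as parse_options() does) leaves optarg pointing past any leading whitespace, so opt_output gets the trimmed value.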

Note that we could handle the --help and --version options “inline” in their respective cases. However, we don’t because all options should be parsed first, then handled. If such options were handled “inline,” then we wouldn’t catch usage errors like:

example --version arg

(The --help and --version options may only be given by themselves. We check for this later.)

After all options have been parsed, we can free options_short and adjust argc and argv by optind (a global variable maintained by getopt_long() that, once parsing is done, contains the index of the first non-option argument):

  // ...
  free( (void*)options_short );
  *pargc -= optind;
  *pargv += optind;
  // ...

The --help and --version Options

Next, we handle the --help and --version options:

  // ...
  if ( opt_help )
    usage( *pargc > 0 ? EX_USAGE : EX_OK );
  if ( opt_version ) {
    if ( *pargc > 0 )
      usage( EX_USAGE );
    version();
  }
  return;
  // ...

The EX_ symbols are preferred exit status codes declared in sysexits.h. You should use those whenever possible.

Printing the Usage Message

One problem with the option struct is that there’s no member for a description. Instead, we can store the description in another array:

static char const *const OPTIONS_HELP[] = {
  [ 'h' ] = "Print help and exit",
  [ 'o' ] = "Write to file [default: stdout]",
  [ 'v' ] = "Print version and exit",
};

It’s an array of char const* (strings) indexed by short option characters (one pointer for each ASCII character) initialized via the array designator syntax.

It’s a tiny bit wasteful due to the NULL “holes” in the array, but, in the grand scheme of things, it’s nothing.

Given that, we can write a usage() function that prints the command-line usage message by iterating over the OPTIONS_LONG array and looking up each option’s help in OPTIONS_HELP. But first, iterate over OPTIONS_LONG to find the longest option’s length so we can make everything line up:

_Noreturn static void usage( int status ) {
  // pre-flight to calculate longest long option length
  size_t longest_opt_len = 0;
  for ( struct option const *opt = OPTIONS_LONG;
        opt->name != NULL; ++opt ) {
    size_t opt_len = strlen( opt->name );
    switch ( opt->has_arg ) {
      case no_argument:
        break;
      case optional_argument:
        opt_len += STRLITLEN( "[=ARG]" );
        break;
      case required_argument:
        opt_len += STRLITLEN( "=ARG" );
        break;
    } // switch
    if ( opt_len > longest_opt_len )
      longest_opt_len = opt_len;
  } // for

  FILE *const fout = status == EX_OK ? stdout : stderr;
  fprintf( fout, "usage: %s [options] ...\noptions:\n", prog_name );

  for ( struct option const *opt = OPTIONS_LONG;
        opt->name != NULL; ++opt ) {
    fprintf( fout, "  --%s", opt->name );
    size_t opt_len = strlen( opt->name );
    switch ( opt->has_arg ) {
      case no_argument:
        break;
      case optional_argument:
        opt_len += (size_t)fprintf( fout, "[=ARG]" );
        break;
      case required_argument:
        opt_len += (size_t)fprintf( fout, "=ARG" );
        break;
    } // switch
    assert( opt_len <= longest_opt_len );
    fprintf( fout,
      "%*s (-%c) %s.\n",
      (int)(longest_opt_len - opt_len), "",
      opt->val, OPTIONS_HELP[ opt->val ]
    );
  } // for

  exit( status );
}

For the definition of STRLITLEN(), see here.
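
In case that definition isn’t at hand, a common way to write such a macro is sketched below. This is an assumption about what STRLITLEN() does (yield the length of a string literal at compile time), not necessarily the exact definition referred to.

```c
#include <assert.h>
#include <stddef.h>

// Assumed definition: the number of characters in the string literal S,
// excluding the terminating null character, computed at compile time.
// It works only for literals (arrays), not for char pointers.
#define STRLITLEN(S)  (sizeof( S ) - 1)

static_assert( STRLITLEN( "=ARG" ) == 4, "\"=ARG\" has 4 characters" );
```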

The global variable prog_name contains the program’s name we’ll set in main().

The usage() function takes an exit status for two reasons:

  1. If the usage message is being printed by request via --help, then it should print to standard output (because no error has occurred). However, if it’s being printed because of a usage error, then it should print to standard error.
  2. So it can call exit() with that status (that we might as well do since it was passed for the first reason).

The final fprintf():

    fprintf( fout,
      "%*s (-%c) %s.\n",
      (int)(longest_opt_len - opt_len), "",
      opt->val, OPTIONS_HELP[ opt->val ]
    );

uses the %*s formatting directive that means: print a string (s) in a field whose width (*) is given by the next int argument. In this case, that int argument is the difference in length between the longest option length and the current option length. Printing nothing ("") will print that difference as the number of spaces we need to line up the remaining output.
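
The padding trick can be tried on its own with a small sketch. The pad_to() function is hypothetical and formats into a buffer rather than a FILE so the result is easy to inspect:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: writes name into buf, followed by enough spaces to
// fill width columns, followed by '|' to mark where the next column starts.
static void pad_to( char *buf, size_t buf_size,
                    char const *name, size_t width ) {
  size_t const name_len = strlen( name );
  assert( name_len <= width );
  // "%*s" with "" prints (width - name_len) spaces and nothing else.
  snprintf( buf, buf_size, "%s%*s|", name, (int)(width - name_len), "" );
}
```

With a width of 7, "help" is followed by three spaces and "version" by none, so the | lines up for both.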

The call:

    usage( *pargc > 0 ? EX_USAGE : EX_OK );

checks to see whether there are any command-line arguments: if so, it’s a usage error since the --help option may only be given by itself.

The version() function simply prints the program name and version, then exits:

_Noreturn static void version( void ) {
  puts( PACKAGE_NAME " " PACKAGE_VERSION );
  exit( EX_OK );
}

where PACKAGE_NAME and PACKAGE_VERSION are defined elsewhere, something like:

#define PACKAGE_NAME     "example"
#define PACKAGE_VERSION  "1.0"

But before we call version(), we do the same check to see whether there are any command-line arguments:

    if ( *pargc > 0 )
      usage( EX_USAGE );
    version();

If so, it’s a usage error.

Invalid Options

For invalid options:

  // ...
invalid_opt:
  (void)0;  // needed before C23
  char const *invalid_opt = (*pargv)[ optind - 1 ];
  if ( invalid_opt != NULL && strncmp( invalid_opt, "--", 2 ) == 0 )
    fprintf( stderr, "\"%s\": invalid option", invalid_opt + 2 );
  else
    fprintf( stderr, "'%c' invalid option", (char)optopt );
  fputs( "; use --help or -h for help\n", stderr );
  exit( EX_USAGE );  // otherwise we'd fall through into missing_arg

Unfortunately, getopt_long()’s error-handling is poor. When getopt_long() returns ? to indicate an invalid option, we have to determine whether it was an invalid short or long option:

  • If it was an invalid short option, getopt_long() will set the global variable optopt to it.
  • However, if it was an invalid long option, getopt_long() doesn’t directly tell you what that long option was.

We have to inspect (*pargv)[optind-1], the command-line argument it was processing at the time: if it starts with --, it’s the invalid long option; otherwise optopt is the invalid short option.

Missing Required Arguments

For options with required but missing arguments, we print an error message:

missing_arg:
  fatal_error( EX_USAGE,
    "\"%s\" requires an argument\n",
    opt_format( (char)(opt == ':' ? optopt : opt) )
  );
} // end of parse_options()

However, this code is executed in two cases:

  1. getopt_long() returned : to indicate a required argument was missing. In this case, optopt contains the option missing its argument.
  2. getopt_long() returned the option and its argument, but, upon further checking, we discovered that the argument was either the empty string or all whitespace. In this case, opt contains the option having said argument.

The function fatal_error() is a convenience variadic function that prints an error message (preceded by the program’s name) and exits with the given status code:

_Noreturn void fatal_error( int status, char const *format, ... ) {
  fprintf( stderr, "%s: ", prog_name );
  va_list args;
  va_start( args, format );
  vfprintf( stderr, format, args );
  va_end( args );
  exit( status );
}

The function opt_format() formats an option in both its long (if it exists) and short form, e.g. --help/-h, for use in an error message:

#define OPT_BUF_SIZE  32  /* enough for longest long option */

char const* opt_format( char short_opt ) {
  static char bufs[ 2 ][ OPT_BUF_SIZE ];
  static unsigned buf_index;
  char *const buf = bufs[ buf_index++ % 2 ];

  char const *const long_opt = opt_get_long( short_opt );
  snprintf(
    buf, OPT_BUF_SIZE, "%s%s%s-%c",
    long_opt[0] != '\0' ? "--" : "", long_opt,
    long_opt[0] != '\0' ? "/"  : "", short_opt
  );
  return buf;
}

The function uses two internal buffers so that opt_format() can be called twice in the same printf(). (This will become handy later.)
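
The buffer rotation is easier to see in a stripped-down sketch. The short_opt_format() function below is hypothetical: it formats only the short form, but it rotates between two static buffers the same way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: formats a short option as "-c" using two static
// buffers in rotation so that two results can be in use at the same time.
static char const* short_opt_format( char short_opt ) {
  static char bufs[ 2 ][ 4 ];
  static unsigned buf_index;
  assert( short_opt != '\0' );
  char *const buf = bufs[ buf_index++ % 2 ];   // 0, 1, 0, 1, ...
  snprintf( buf, sizeof bufs[0], "-%c", short_opt );
  return buf;
}
```

Two consecutive calls return different buffers, so both results remain valid. A third call reuses (and overwrites) the first buffer, so this works only when at most two results are needed at once.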

The function opt_get_long(), given a short option, gets its corresponding long option, if any:

static char const* opt_get_long( char short_opt ) {
  for ( struct option const *opt = OPTIONS_LONG; opt->name != NULL; ++opt ) {
    if ( opt->val == short_opt )
      return opt->name;
  } // for
  return "";
}

Calling parse_options()

Finally, this is how parse_options() would be called:

char const *prog_name;

int main( int argc, char const *argv[] ) {
  prog_name = argv[0];
  parse_options( &argc, &argv );
  // ...
}

After parse_options() returns, argc will contain the number of remaining non-option arguments and, if any, argv[0] will be the first such argument. (Note that this differs from the canonical value of argv[0] that is initially the executable’s path.)
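
The effect of the adjustment can be checked with a stripped-down sketch. The skip_options() function is a hypothetical stand-in for parse_options() that accepts only -v and ignores errors; it also takes a plain char*** to avoid the casts needed with char const**.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

// Hypothetical stand-in for parse_options(): consumes any options, then
// adjusts argc and argv so argv[0] is the first non-option argument.
static void skip_options( int *pargc, char ***pargv ) {
  static struct option const NO_LONG_OPTS[] = {
    { NULL, 0, NULL, 0 }
  };
  assert( pargc != NULL && pargv != NULL );
  optind = 1;               // start scanning at argv[1]
  opterr = 0;               // suppress default error messages
  while ( getopt_long( *pargc, *pargv, ":v", NO_LONG_OPTS, NULL ) != -1 )
    ;                       // a real program would handle each option here
  *pargc -= optind;         // optind = index of first non-option argument
  *pargv += optind;
}
```

For the command line example -v in.txt, argc ends up as 1 and argv[0] as "in.txt".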

Option Exclusivity

The code presented so far doesn’t handle the case where certain options may be given only by themselves (e.g., you shouldn’t be allowed to give --help and --version with any other option). That can be implemented by adding a global array to keep track of which options have been given:

static _Bool opts_given[128];  // options that were given

setting it for each option returned by getopt_long():

      // ...
      case '?':
        goto invalid_opt;
    } // switch
    opts_given[ opt ] = true;  // <-- new line
  } // for

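writing a function to check for exclusivity (where ARRAY_SIZE() is assumed to be the usual macro that yields the number of elements in an array):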

static void opt_check_exclusive( char opt ) {
  if ( !opts_given[ (unsigned)opt ] )
    return;
  for ( size_t i = '0'; i < ARRAY_SIZE( opts_given ); ++i ) {
    char const curr_opt = (char)i;
    if ( curr_opt == opt )
      continue;
    if ( opts_given[ (unsigned)curr_opt ] ) {
      fatal_error( EX_USAGE,
        "%s can be given only by itself\n",
        opt_format( opt )
      );
    }
  } // for
}

and calling the function after processing all options:

  // ...
  *pargc -= optind;
  *pargv += optind;

  opt_check_exclusive( 'h' );
  opt_check_exclusive( 'v' );
  // ...

Option Mutual Exclusivity

In many programs, there are some options that may not be given with some other options. For example, if a program has options --json/-j and --xml/-x to specify output formats, those options can’t be given simultaneously. It’s good to check for such cases rather than letting the last option specified “win.” A function to check for mutual exclusivity is:

static void opt_check_mutually_exclusive( char opt, char const *opts ) {
  if ( !opts_given[ (unsigned)opt ] )
    return;
  for ( ; *opts != '\0'; ++opts ) {
    assert( *opts != opt );
    if ( opts_given[ (unsigned)*opts ] ) {
      fatal_error( EX_USAGE,
        "%s and %s are mutually exclusive\n",
        opt_format( opt ),
        opt_format( *opts )
      );
    }
  } // for
}

where opt is a short option that, if given, precludes any of the short options in opts from also being given. (This is the aforementioned case when opt_format() using two internal buffers becomes handy since it can be called twice in the same statement as it is here.)

Calling the function would be like:

  opt_check_mutually_exclusive( 'j', "x" );

Other Option Checks

Of course it’s possible for some programs to have more complicated option relationships, e.g., if -x is given, then -y must be also. If your program has such relationships, you should check for them. Writing such a function using opts_given is fairly straightforward, but left as an exercise for the reader.
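
As a starting point for that exercise, here is one possible sketch for the “if -x is given, then -y must be also” case. The opt_find_missing_required() function is hypothetical; to keep it self-contained, it declares its own opts_given and returns the missing option rather than calling fatal_error().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool opts_given[128];  // options that were given (ASCII only)

// Hypothetical helper: if opt was given, returns the first short option in
// req_opts that was NOT given; otherwise returns '\0' (nothing is missing).
static char opt_find_missing_required( char opt, char const *req_opts ) {
  assert( (unsigned char)opt < 128 );
  if ( !opts_given[ (unsigned char)opt ] )
    return '\0';
  for ( ; *req_opts != '\0'; ++req_opts ) {
    assert( *req_opts != opt );
    if ( !opts_given[ (unsigned char)*req_opts ] )
      return *req_opts;
  } // for
  return '\0';
}
```

If the returned character isn’t the null character, the caller could report it with fatal_error() using EX_USAGE and two calls to opt_format(), just as opt_check_mutually_exclusive() does.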

Eliminating Short Option Redundancy

Every short option has to be specified four times:

  1. In a val field of a struct option array.
  2. In a case.
  3. In calls to either opt_check_exclusive() or opt_check_mutually_exclusive().
  4. In the OPTIONS_HELP array.

If you ever decide to change a short option, you have to update it in four places. It would be better to define every short option once, then use that definition everywhere. For this sample program, we can do:

#define OPT_HELP     h
#define OPT_JSON     j
#define OPT_OUTPUT   o
#define OPT_VERSION  v
#define OPT_XML      x

But, in order to use these definitions, they have to be either “stringified” or “charified” depending on the use. Stringification is easier since the C preprocessor supports it directly:

#define STRINGIFY_HELPER(X)  #X
#define STRINGIFY(X)         STRINGIFY_HELPER(X)

#define SOPT(X)              STRINGIFY(OPT_##X)

SOPT(FOO) means “stringify the FOO option.” For example, SOPT(HELP) will expand into "h".

However, for the val field, in a case, and in OPTIONS_HELP, we need short options as characters. Unfortunately, the C preprocessor doesn’t support “charification” — not directly, anyway. It’s possible to implement with the caveat that only characters that are valid in identifiers can be charified, that is [A-Za-z_0-9]. To start, define one macro per identifier character:

#define CHARIFY_0 '0'
#define CHARIFY_1 '1'
#define CHARIFY_2 '2'
// ...
#define CHARIFY_A 'A'
#define CHARIFY_B 'B'
#define CHARIFY_C 'C'
// ...
#define CHARIFY__ '_'
#define CHARIFY_a 'a'
#define CHARIFY_b 'b'
#define CHARIFY_c 'c'
// ...
#define CHARIFY_z 'z'

Then:

#define NAME2_HELPER(A,B)    A##B
#define NAME2(A,B)           NAME2_HELPER(A,B)

#define CHARIFY(X)           NAME2(CHARIFY_,X)

#define COPT(X)              CHARIFY(OPT_##X)

COPT(FOO) means “charify the FOO option.” For example, COPT(HELP) will expand into 'h'. This can then be used in the option array:

  // ...
  { "help",    no_argument,       NULL, COPT(HELP)    },
  { "output",  required_argument, NULL, COPT(OUTPUT)  },
  { "version", no_argument,       NULL, COPT(VERSION) },
  // ...

and in OPTIONS_HELP:

static char const *const OPTIONS_HELP[] = {
  [ COPT(HELP) ] = "Print help and exit",
  [ COPT(OUTPUT) ] = "Write to file [default: stdout]",
  [ COPT(VERSION) ] = "Print version and exit",
};

and in cases:

      // ...
      case COPT(HELP):
        opt_help = true;
        break;
      // ...

and in calls to opt_check_exclusive():

  opt_check_exclusive( COPT(HELP) );
  opt_check_exclusive( COPT(VERSION) );

and in calls to opt_check_mutually_exclusive():

  opt_check_mutually_exclusive( COPT(JSON), SOPT(XML) );

As a bonus, it makes the code a lot more readable.
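
To check that the macros expand as described, here is a condensed sketch that can be compiled by itself. It repeats the macros from above for only two options and verifies the expansions at compile time.

```c
#include <assert.h>
#include <string.h>

#define OPT_HELP  h
#define OPT_XML   x

#define STRINGIFY_HELPER(X)  #X
#define STRINGIFY(X)         STRINGIFY_HELPER(X)
#define SOPT(X)              STRINGIFY(OPT_##X)

#define CHARIFY_h 'h'         // only the two characters needed here
#define CHARIFY_x 'x'

#define NAME2_HELPER(A,B)    A##B
#define NAME2(A,B)           NAME2_HELPER(A,B)
#define CHARIFY(X)           NAME2(CHARIFY_,X)
#define COPT(X)              CHARIFY(OPT_##X)

// COPT(HELP) -> CHARIFY(OPT_HELP) -> NAME2(CHARIFY_,h) -> CHARIFY_h -> 'h'
static_assert( COPT(HELP) == 'h', "COPT(HELP) should be 'h'" );

// SOPT(XML) -> STRINGIFY(OPT_XML) -> STRINGIFY_HELPER(x) -> "x"
static_assert( sizeof( SOPT(XML) ) == 2, "SOPT(XML) should be \"x\"" );
```

One caveat: this relies on the single letters themselves (h, x, etc.) not being defined as macros anywhere, since they would then be expanded too.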

Conclusion

CLIs should be just as robust as REST APIs. Ultimately, good CLIs make for a better user experience and can prevent unanticipated option combinations that can lead to bugs.
