Parsing options in bash

2017-11-12

So, you're writing a shell script and you've come to realize that it needs to be able to perform several different functions. You could separate each of those into a separate script... but they're closely tied together; they complement each other. It only makes sense to make them available via the same command, just activated using different options.

Doing it manually

Obviously, the simplest option is employing a while loop and a case statement (or, alternatively, some ifs) and doing it yourself.

#!/bin/bash
 
OPTION_A=0
OPTION_B=""
OPTION_DOUBLE=0
 
while [ $# -gt 0 ]; do
    case "$1" in
        -a)
            OPTION_A=1
        ;;
 
        -b)
            shift
 
            if [ $# -gt 0 ]; then
                OPTION_B=$1
            else
                echo "Option -b requires an argument" >&2
                exit 1
            fi
        ;;
 
        --double)
            OPTION_DOUBLE=1
        ;;
 
        -*)
            if [ "$1" == "--" ]; then
                shift
                break
            else
                echo "Unknown option '$1'" >&2
                exit 1
            fi
        ;;
 
        *)
            break
        ;;
    esac
 
    shift
done

While relatively simple, this approach has a couple of drawbacks:

If you have a lot of options, the amount of code balloons up quite quickly.
Checking for option arguments requires even more code, and there's a risk you'll make a mistake somewhere.
Did you remember about supporting -- as an "end of options" marker?

Employing getopts (bash built-in)

Parsing options is a relatively common problem, and as usually happens with common problems, there's a library for that. If you're using a POSIX-conformant shell, you can use the shell builtin getopts for option parsing.

#!/bin/bash
 
while getopts "ab:d" OPTNAME; do
    case "$OPTNAME" in
        a)
            OPTION_A=1
        ;;
 
        b)
            OPTION_B=$OPTARG
        ;;
 
        d)
            OPTION_DOUBLE=1
        ;;
 
        \?)
            exit 1
        ;;
    esac
done

Much cleaner! getopts will go through the script arguments, parsing options and their arguments (values). The exit status is zero as long as there are options left to parse, and non-zero once the end of the options is reached (the arguments run out, or a non-option or -- is encountered), which makes it rather natural to stick the whole thing into a while loop.

Okay, but how do we actually specify the supported options? This is done via the first argument to getopts, which is usually called an option string. It is basically a list of possible single-letter options, each of them optionally followed by a colon (:), which serves as a "this option expects an argument" marker.
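To make the option string a bit more tangible, here is a small sketch that uses the same "ab:d" string and prints what getopts sees. The function name parse_demo is made up for illustration, and the sketch assumes bash, since it relies on local.

```shell
#!/bin/bash

# parse_demo is a hypothetical helper, not part of the scripts in this post.
# It parses its own arguments with the option string "ab:d":
#   a and d are plain flags, b expects an argument (note the colon).
parse_demo() {
    local OPTIND=1 OPTARG OPTNAME   # keep getopts state local to the function
    while getopts "ab:d" OPTNAME; do
        case "$OPTNAME" in
            b)   echo "option b, argument '$OPTARG'" ;;
            a|d) echo "option $OPTNAME" ;;
            \?)  return 1 ;;
        esac
    done
}

# Flags can be given separately or grouped; an option's argument may be
# attached directly or passed as the next word.
parse_demo -a -b one
parse_demo -ad -btwo
```

Both calls go through the same option string; only the way the options are written on the command line differs.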

When getopts processes the script options, it puts the index of the next argument to be processed in the OPTIND variable, and the option argument (if the option takes one) in the OPTARG variable. The variable which stores the actual option name can be controlled by the user and is the second argument to getopts; in the example above, I use OPTNAME, as it's a descriptive name that also fits nicely with the other two variables.
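One practical use of OPTIND: once the loop ends, it points at the first argument getopts did not consume, so shifting by OPTIND - 1 leaves only the non-options in the positional parameters. A minimal sketch, assuming bash and using a made-up function name (rest_demo):

```shell
#!/bin/bash

# rest_demo is a hypothetical helper used only for illustration.
rest_demo() {
    local OPTIND=1 OPTARG OPTNAME
    while getopts "ab:d" OPTNAME; do
        :   # option handling omitted; only the leftover arguments matter here
    done

    # OPTIND is the index of the next argument to be processed,
    # so everything before it has been consumed as options.
    shift $((OPTIND - 1))
    echo "remaining: $*"
}

rest_demo -a -b value file1 file2
rest_demo -d -- -not-an-option
```

In the second call, the -- marker itself is skipped by getopts, so the argument after it survives as a non-option even though it starts with a dash.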

Error handling with getopts

What about error handling, you may ask? getopts does that for you, too! If an unknown option is encountered, or an option is missing its argument, getopts will print an error message, and the selected variable (OPTNAME in the example above) will be set to ? (a question mark character).

If you want more control over error handling, you may prepend your option string with a colon (so in the example above, it would become :ab:d).

#!/bin/bash
 
while getopts ":ab:d" OPTNAME; do
    case "$OPTNAME" in
        a)
            OPTION_A=1
        ;;
 
        b)
            OPTION_B=$OPTARG
        ;;
 
        d)
            OPTION_DOUBLE=1
        ;;
 
        :)
            echo "Option '-$OPTARG' requires an argument, ya dingus" >&2
            exit 1
        ;;
 
        \?)
            echo "I don't know what '-$OPTARG' is!" >&2
            exit 1
        ;;
    esac
done

When you do this, the getopts behaviour changes in a few ways.

First, it won't automatically print any error messages.
Second, upon encountering an unknown option, OPTNAME will be set to ? (as in the standard scenario), and the unknown option will be put into the OPTARG variable.
Third, when an option requiring an argument is missing said argument, OPTNAME will be set to : (a colon) and said option will be put into the OPTARG variable.
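The three points above can be observed directly with a small sketch. The function name silent_demo is hypothetical; it just echoes what getopts hands back when the option string starts with a colon. It assumes bash.

```shell
#!/bin/bash

# silent_demo is a hypothetical helper; it reports the variables getopts
# sets when the option string begins with a colon (silent mode).
silent_demo() {
    local OPTIND=1 OPTARG OPTNAME
    while getopts ":ab:d" OPTNAME; do
        echo "OPTNAME='$OPTNAME' OPTARG='$OPTARG'"
    done
}

silent_demo -x      # unknown option
silent_demo -b      # -b without its argument
```

Neither call prints an error message from getopts itself; the only output comes from the echo inside the loop.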

Employing getopt (standalone binary)

One downside of getopts that was hinted at by the examples above is, unfortunately, the lack of support for --long options. If we need to support these, we can use the separate getopt program.

getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@"

Hmm... This doesn't really look like option parsing, now does it? So what does getopt do, really? Basically, it performs three functions for us:

Error handling: like the shell builtin, it will print error messages when an error is encountered: an unknown option, or an option missing its argument. You can also control this behaviour with the prepend-with-colon trick; should you do that, errors will be silently swallowed.
Shuffling the arguments: while POSIX mandates that options may not follow non-options, many implementations of getopt(3) (the libc function) allow for mingling the two (so you can do something like chmod u+x -R directory/). The most common example of this is glibc (the GNU C Library), commonly found on Linux. getopt(1) inherits this behaviour. If you don't want this, you can prepend the option string with + (a plus sign). Alternatively, you can set the environment variable POSIXLY_CORRECT, although this has the downside of altering the behaviour of many other programs.
Marking the end of options: the output of getopt will always contain a -- to tell us where the options end.

Okay, but that still doesn't answer the question: what does getopt actually DO? It's a separate program, so it can't set any variables inside our shell. As hinted above, getopt outputs a reformatted version of the argument list on stdout. For an example:

user $ ./mytestscript nonoption -b argument --double 'non option with spaces'
 -b 'argument' --double -- 'nonoption' 'non option with spaces'
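The shuffling, and the effect of the + prefix, can be seen by calling getopt directly with a fixed argument list. This is only a sketch and it assumes the enhanced getopt from util-linux (the one usually found on Linux); other implementations may reject the long-form flags or format their output differently. The variable names shuffled and strict are made up.

```shell
#!/bin/bash

# Assumes the enhanced (util-linux) getopt; see the note above.

# Default: options are collected first, non-options are moved after "--".
shuffled=$(getopt --options 'ab:' --longoptions 'double' -- nonoption -a --double)
echo "default:$shuffled"

# Leading "+": parsing stops at the first non-option, so everything
# from "nonoption" onwards is treated as a non-option.
strict=$(getopt --options '+ab:' --longoptions 'double' -- nonoption -a --double)
echo "with +: $strict"
```

With the + prefix, -a and --double end up after the -- marker, because they followed a non-option on the command line.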

Where do we go from here? Well, we need to somehow put the output from getopt into our positional parameters ($1 and so on). To do this, we can use the set builtin.

There's one problem, though: getopt, by default, quotes the encountered non-options and arguments, and if we pass the output as-is to set, said quotes will make it to our parameters. We can work around this by using eval, which will cause the shell to properly process the quotes first.

OPTIONS=`getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@"`
eval set -- "$OPTIONS"
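To see why the eval matters, here is a sketch that skips getopt entirely and uses a literal string standing in for its output (the same one as in the example above). The variable names plain_count and eval_count are made up for this illustration.

```shell
#!/bin/bash

# A literal string standing in for getopt's output, so this runs
# without calling getopt at all.
OPTIONS=" -b 'argument' --double -- 'nonoption' 'non option with spaces'"

# Without eval: the string is only word-split (deliberately unquoted here),
# so the quotes stay in as literal characters and spaces break words apart.
set -- $OPTIONS
plain_count=$#
echo "without eval: $# parameters, \$2 is <$2>"

# With eval: the shell re-parses the string, so the quotes group words
# and are then removed.
eval set -- "$OPTIONS"
eval_count=$#
echo "with eval: $# parameters, \$2 is <$2>"
```

Without eval, the last non-option falls apart into four separate parameters and the quote characters become part of the values; with eval, it arrives as a single parameter.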

Now that our positional parameters are all set, we can go back and copy most of the code from the first approach.

OPTIONS=`getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@"`
[ "$?" -ne 0 ] && exit 1
eval set -- "$OPTIONS"
 
while [ $# -gt 0 ]; do
    case "$1" in
        -a)
            OPTION_A=1
        ;;
 
        -b)
            shift
            OPTION_B=$1
        ;;
 
        --double)
            OPTION_DOUBLE=1
        ;;
 
        --)
            shift
            break
        ;;
    esac
 
    shift
done

While the symbol soup near getopt itself may look a bit terrifying, the script itself is quite nice and readable.

Why not to use getopt(1)

Unfortunately, as it often happens, many nice things have their drawbacks, and getopt(1) is no different. The main problem with said program is possible differences in behaviour between different platforms. For example, on some Unices, getopt(1) doesn't support long options... which was pretty much the only reason we considered using it over the shell builtin!
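If you want to use getopt(1) anyway, one way to guard against the platform differences is to check for the enhanced version up front. The util-linux getopt documents a test mode for exactly this: getopt -T prints nothing and exits with status 4. The sketch below wraps that check in a function; the name have_enhanced_getopt is made up.

```shell
#!/bin/bash

# have_enhanced_getopt is a hypothetical helper name.
# "getopt -T" exits with status 4 on the enhanced (util-linux) getopt;
# any other status, including "command not found", counts as "no".
have_enhanced_getopt() {
    local status=0
    getopt -T >/dev/null 2>&1 || status=$?
    [ "$status" -eq 4 ]
}

if have_enhanced_getopt; then
    echo "enhanced getopt found, long options should work"
else
    echo "no enhanced getopt, consider getopts or manual parsing"
fi
```

This only tells you which flavour of getopt is installed; it does nothing about the non-POSIX argument shuffling described below.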

The other issue is the possibly non-POSIX-conformant behaviour. This one heavily depends on our use case; if we're writing a script for personal use, and our system exhibits the glibc behaviour, the ability to intertwine options and non-options may be comfortable. On the other hand, if we want to redistribute the script, it may cause portability issues.

Manual parsing vs. getopts

That being said, we're left with the first two approaches. Which way to go? Personally, I think that the answer is "it depends": if you're only using short options, getopts might be the better way, since not only is your option parsing guaranteed to follow a standard, but the users of the script can also count on their parameters being parsed in a predictable way.

Should you need to support long options, or optional arguments, you're pretty much bound to write the code yourself. And that doesn't automatically make it a bad thing! Just be sure to test your code thoroughly to make sure your users won't have to spend their time wrestling with your option parsing code, instead of actually enjoying the script's features.
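As a rough idea of what "writing it yourself" can look like for long options, here is a sketch that accepts both the --output=FILE and the --output FILE forms. Everything in it is made up for illustration: the function name parse_long, the options --output and --verbose, and the variables OUTPUT, VERBOSE and REMAINING. It assumes bash, since it uses an array.

```shell
#!/bin/bash

# parse_long is a hypothetical helper; --output and --verbose are made-up
# option names used only for this illustration.
parse_long() {
    OUTPUT=""
    VERBOSE=0

    while [ $# -gt 0 ]; do
        case "$1" in
            --output=*)
                OUTPUT=${1#--output=}     # strip the "--output=" prefix
            ;;
            --output)
                if [ $# -lt 2 ]; then
                    echo "Option --output requires an argument" >&2
                    return 1
                fi
                OUTPUT=$2
                shift                     # skip the option's argument
            ;;
            --verbose)
                VERBOSE=1
            ;;
            --)
                shift
                break
            ;;
            -*)
                echo "Unknown option '$1'" >&2
                return 1
            ;;
            *)
                break
            ;;
        esac
        shift
    done

    REMAINING=("$@")   # whatever is left are the non-options
}

parse_long --verbose --output=result.txt input.txt
echo "OUTPUT=$OUTPUT VERBOSE=$VERBOSE first non-option: ${REMAINING[0]}"
```

Note that the -- case has to come before the -* pattern, otherwise the end-of-options marker would be reported as an unknown option.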
