Parsing options in bash

2017-11-12

So, you're writing a shell script and you've come to realize that it needs to be able to perform several different functions. You could separate each of those into a separate script... but they're closely tied together; they complement each other. It only makes sense to make them available via the same command, just activated using different options.

Doing it manually

Obviously, the simplest option is employing a while loop, a case statement (or, alternatively, some ifs) and doing it by yourself.

    #!/bin/bash

    OPTION_A=0
    OPTION_B=""
    OPTION_DOUBLE=0

    while [ $# -gt 0 ]; do
        case "$1" in
            -a)
                OPTION_A=1
                ;;

            -b)
                shift
                if [ $# -gt 0 ]; then
                    OPTION_B=$1
                else
                    echo "Option -b requires an argument" >&2
                    exit 1
                fi
                ;;

            --double)
                OPTION_DOUBLE=1
                ;;

            -*)
                if [ "$1" = "--" ]; then
                    shift
                    break
                else
                    echo "Unknown option '$1'" >&2
                    exit 1
                fi
                ;;

            *)
                break
                ;;
        esac

        shift
    done

While relatively simple, this approach has a couple of drawbacks:

Employing getopts (bash built-in)

Parsing options is a relatively common problem and, as usually happens with common problems, there's a library for that. If you're using a POSIX-conformant shell, you can use the shell builtin getopts for option parsing.

    #!/bin/bash

    while getopts "ab:d" OPTNAME; do
        case "$OPTNAME" in
            a)
                OPTION_A=1
                ;;

            b)
                OPTION_B=$OPTARG
                ;;

            d)
                OPTION_DOUBLE=1
                ;;

            \?)
                # getopts has already printed an error message
                exit 1
                ;;
        esac
    done

Much cleaner! getopts will go through the script arguments, parsing options and their arguments (values). The exit status is zero as long as an option was found, and non-zero once the end of the options is reached (a non-option argument, a lone --, or the end of the argument list), which makes it rather natural to stick the whole thing into a while loop.

Okay, but how do we actually specify the supported options? This is done via the first argument to getopts, which is usually called an option string. It is basically a list of possible single-letter options, each of them optionally followed by a colon (:), which serves as a "this option expects an argument" marker.
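To make the option string a bit more concrete, here is a small sketch. The function name parse_demo and its output format are made up for illustration; only the "ab:d" option string comes from the example above. It shows that flags can be clustered (-ad) and that an argument may be attached directly to its option (-bvalue) or given as a separate word.

```shell
# parse_demo is a hypothetical helper, not part of the script above.
parse_demo() {
    OPTIND=1    # restart parsing; getopts remembers its position between calls
    while getopts "ab:d" OPTNAME; do
        case "$OPTNAME" in
            a|d) printf 'flag -%s\n' "$OPTNAME" ;;
            b)   printf 'option -b with argument "%s"\n' "$OPTARG" ;;
            \?)  return 1 ;;
        esac
    done
}

parse_demo -ad -bvalue
parse_demo -b "separate value"
```

The first call reports -a, -d and -b with the argument "value"; the second reports -b with the argument "separate value".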

When getopts processes the script options, it puts the index of the next argument to be processed in the OPTIND variable, and the option argument (if the option takes one) in the OPTARG variable. The name of the variable that receives the option letter is chosen by the user and passed as the second argument to getopts; in the example above, I use OPTNAME, as it's a descriptive name that also fits nicely with the other two variables.
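Since OPTIND ends up pointing at the first argument getopts did not consume, a common idiom is to shift the parsed options away once the loop is done, so that only the non-options remain. A minimal sketch of that idiom follows; show_rest is a hypothetical name and the option handling is left out on purpose.

```shell
# show_rest is a hypothetical helper used only to illustrate OPTIND.
show_rest() {
    OPTIND=1
    while getopts "ab:d" OPTNAME; do
        :   # option handling omitted; only OPTIND matters here
    done
    # Drop everything getopts consumed; "$@" now holds the non-options.
    shift $((OPTIND - 1))
    printf 'OPTIND=%s, remaining: %s\n' "$OPTIND" "$*"
}

show_rest -a -b value file1 file2
# prints: OPTIND=4, remaining: file1 file2
```

In a real script the same shift would simply follow the while loop at the top level.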

Error handling with getopts

What about error handling, you may ask? getopts does that for you, too! If an unknown option is encountered, or an option is missing its required argument, getopts will print an error message, and the selected variable (OPTNAME in the example above) will be set to ? (a question mark character).
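The default behaviour is easy to observe. In the sketch below (check_unknown and the messages it prints are made up for illustration), the option -x is not part of the option string, so getopts writes its own diagnostic to stderr and sets OPTNAME to a question mark. The exact wording of that diagnostic differs between shells, so the sketch does not rely on it.

```shell
# check_unknown is a hypothetical helper; it stops at the first error.
check_unknown() {
    OPTIND=1
    while getopts "ab:d" OPTNAME; do
        if [ "$OPTNAME" = "?" ]; then
            printf 'getopts reported an error\n'
            return 1
        fi
    done
}

# -x is not in the option string; getopts prints its own message to stderr.
check_unknown -x || printf 'parsing failed\n'
```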

If you want more control over error handling, you may prepend your option string with a colon (so in the example above, it would become :ab:d).

    #!/bin/bash

    while getopts ":ab:d" OPTNAME; do
        case "$OPTNAME" in
            a)
                OPTION_A=1
                ;;

            b)
                OPTION_B=$OPTARG
                ;;

            d)
                OPTION_DOUBLE=1
                ;;

            :)
                echo "Option '-$OPTARG' requires an argument, ya dingus" >&2
                exit 1
                ;;

            \?)
                echo "I don't know what '-$OPTARG' is!" >&2
                exit 1
                ;;
        esac
    done

When you do this, the getopts behaviour changes in a few ways.
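The differences can be checked directly. The following sketch (report_errors is a hypothetical name) runs getopts with the colon-prefixed option string and prints what ends up in the two variables: for an unknown option, OPTNAME is ? and OPTARG holds the offending letter; for a missing argument, OPTNAME is : and OPTARG again holds the option letter. In both cases getopts itself prints nothing.

```shell
# report_errors is a hypothetical helper that only prints the variables.
report_errors() {
    OPTIND=1
    while getopts ":ab:d" OPTNAME; do
        printf 'OPTNAME="%s" OPTARG="%s"\n' "$OPTNAME" "${OPTARG-}"
    done
}

report_errors -x    # unknown option
report_errors -b    # -b without its argument
```

The first call prints OPTNAME="?" OPTARG="x" and the second prints OPTNAME=":" OPTARG="b".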

Employing getopt (standalone binary)

One downside of getopts that was hinted at by the examples above is, unfortunately, the lack of support for --long options. If we need to support these, we can use the separate getopt program.

getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@"

Hmm... This doesn't really look like option parsing, now does it? So what does getopt do, really? Basically, it performs three functions for us:

Okay, but that still doesn't answer the question: what does getopt actually DO? It's a separate program, so it can't set any variables inside our shell. As hinted above, getopt outputs a reformatted version of the argument list on stdout. For example:

    user $ ./mytestscript nonoption1 -b argument --double 'non option with spaces'
    -b 'argument' --double -- 'nonoption1' 'non option with spaces'

Where do we go from here? Well, we need to somehow put the output from getopt into our positional parameters ($1 and so on). To do this, we can use the set builtin.

There's one problem, though: getopt, by default, quotes the encountered non-options and arguments, and if we pass the output as-is to set, said quotes will make it to our parameters. We can work around this by using eval, which will cause the shell to properly process the quotes first.

    OPTIONS=$(getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@")
    eval set -- "$OPTIONS"
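To see what eval changes, the sketch below uses a hand-written string in the quoting style that getopt produces (the QUOTED value is made up for illustration, so no getopt binary is needed to run it). The string is passed to set twice, first directly and then through eval, and the resulting parameters are counted.

```shell
# Hand-written stand-in for getopt output; not produced by a real getopt call.
QUOTED=" -b 'argument' -- 'non option with spaces'"

# Without eval (and unquoted): plain word splitting, the quotes stay literal.
set -- $QUOTED
printf 'without eval: %s parameters, second is %s\n' "$#" "$2"

# With eval, the shell parses the quotes the way it would in source code.
eval set -- "$QUOTED"
printf 'with eval: %s parameters, fourth is %s\n' "$#" "$4"
```

Without eval the string falls apart into seven words and the second one is still wrapped in quote characters; with eval there are four parameters and the last one is the intact phrase "non option with spaces".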

Now that our positional parameters are all set, we can go back and copy most of the code from the first approach.

    # getopt prints its own error message; stop with a non-zero status on failure
    OPTIONS=$(getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@") || exit 1
    eval set -- "$OPTIONS"

    while [ $# -gt 0 ]; do
        case "$1" in
            -a)
                OPTION_A=1
                ;;

            -b)
                shift
                OPTION_B=$1
                ;;

            --double)
                OPTION_DOUBLE=1
                ;;

            --)
                shift
                break
                ;;
        esac

        shift
    done

While the symbol soup around the getopt call may look a bit terrifying, the rest of the script is quite nice and readable.

Why not to use getopt(1)

Unfortunately, as often happens, nice things have their drawbacks, and getopt(1) is no different. The main problem with said program is possible differences in behaviour between platforms. For example, on some Unices, getopt(1) doesn't support long options... which was pretty much the only reason we considered using it over the shell builtin!

The other issue is the possibly non-POSIX-conformant behaviour. This one heavily depends on our use case; if we're writing a script for personal use, and our system exhibits the glibc behaviour, the ability to intertwine options and non-options may be convenient. On the other hand, if we want to redistribute the script, it may cause portability issues.
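The first of these problems can at least be detected before relying on long options. The util-linux version of getopt documents a -T (--test) flag that prints nothing and exits with status 4, which traditional implementations do not do. A sketch of such a check, assuming the util-linux behaviour is the one we want (the messages are illustrative):

```shell
# Exit status 4 from "getopt -T" identifies the enhanced (util-linux) getopt.
status=0
getopt -T >/dev/null 2>&1 || status=$?

if [ "$status" -eq 4 ]; then
    printf 'enhanced getopt available, long options supported\n'
else
    printf 'no enhanced getopt here; use getopts or manual parsing instead\n'
fi
```

A script meant for redistribution could run this check first and bail out, or switch to one of the other approaches, when the enhanced version is missing.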

Manual parsing vs. getopts

That said, we're left with the first two approaches. Which way to go? Personally, I think the answer is "it depends" – if you're only using short options, getopts might be the better choice: not only is the option parsing guaranteed to follow a standard, but the users of the script can also rely on their parameters being parsed in a well-defined way.

Should you need to support long options, or optional arguments, you're pretty much bound to write the code yourself. And that doesn't automatically make it a bad thing! Just be sure to test your code thoroughly, so that your users won't have to spend their time wrestling with your option-parsing code instead of actually enjoying the script's features.
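As a rough idea of what such hand-written code could look like, here is a sketch of a manual loop that accepts a long option with an argument in both the --name=value and the --name value form. The option name --bee, the function parse_long and the REMAINING variable are all made up for this illustration; only --double comes from the examples above.

```shell
# parse_long is a hypothetical helper; --bee is an invented long option.
parse_long() {
    OPTION_B=""
    OPTION_DOUBLE=0
    while [ $# -gt 0 ]; do
        case "$1" in
            --bee=*)
                OPTION_B=${1#--bee=}    # strip the "--bee=" prefix
                ;;
            --bee)
                if [ $# -lt 2 ]; then
                    echo "Option --bee requires an argument" >&2
                    return 1
                fi
                OPTION_B=$2
                shift
                ;;
            --double)
                OPTION_DOUBLE=1
                ;;
            --)
                shift
                break
                ;;
            -*)
                echo "Unknown option '$1'" >&2
                return 1
                ;;
            *)
                break
                ;;
        esac
        shift
    done
    REMAINING=$#    # number of non-option arguments left over
}

parse_long --bee=value --double file.txt
printf 'b=%s double=%s remaining=%s\n' "$OPTION_B" "$OPTION_DOUBLE" "$REMAINING"
# prints: b=value double=1 remaining=1
```

Note that the order of the patterns matters: --bee=* has to come before --bee, and -- before the catch-all -*.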
