Parsing options in bash
2017-11-12
So, you're writing a shell script and you've come to realize that it needs to be able to perform several different functions. You could separate each of those into a separate script... but they're closely tied together; they complement each other. It only makes sense to make them available via the same command, just activated using different options.
Doing it manually
Obviously, the simplest option is employing a while loop, a case statement (or, alternatively, some ifs) and doing it by yourself.
#!/bin/bash
OPTION_A=0
OPTION_B=""
OPTION_DOUBLE=0
while [ $# -gt 0 ]; do
    case "$1" in
        -a)
            OPTION_A=1
            ;;
        -b)
            # -b takes a value, which must follow as the next argument
            shift
            if [ $# -gt 0 ]; then
                OPTION_B=$1
            else
                echo "Option -b requires an argument" >&2
                exit 1
            fi
            ;;
        --double)
            OPTION_DOUBLE=1
            ;;
        --)
            # explicit end-of-options marker: consume it and stop parsing
            shift
            break
            ;;
        -*)
            echo "Unknown option '$1'" >&2
            exit 1
            ;;
        *)
            # first non-option argument: stop parsing
            break
            ;;
    esac
    shift
done
While relatively simple, this approach has a couple of drawbacks:
- If you have a lot of options, the amount of code balloons quite quickly.
- Checking for option arguments requires even more code, and there's a risk you'll make a mistake somewhere.
- Did you remember to support -- as an "end of options" marker?
Employing getopts (bash built-in)
Parsing options is a relatively common problem, and as usually happens with common problems, there's a library for that.
If you're using a POSIX-conformant shell, you can use the shell builtin getopts for option parsing.
#!/bin/bash
while getopts "ab:d" OPTNAME; do
    case "$OPTNAME" in
        a)
            OPTION_A=1
            ;;
        b)
            OPTION_B=$OPTARG
            ;;
        d)
            OPTION_DOUBLE=1
            ;;
        \?)
            # getopts has already printed an error message
            exit 1
            ;;
    esac
done
Much cleaner! getopts will go through the script arguments, parsing options and their arguments (values).
The exit status is zero as long as there are options left to parse, and non-zero once the end of the options is reached (at the first non-option, at --, or when the arguments run out), which makes it rather natural to stick the whole thing into a while loop.
Okay, but how do we actually specify the supported options? This is done via the first argument to getopts, which is usually called an option string. It is basically a list of possible single-letter options, each of them optionally followed by a colon (:), which serves as a "this option expects an argument" marker.
When getopts processes the script options, it puts the index of the next argument to be processed in the OPTIND variable, and the option argument (if the option takes one) in the OPTARG variable. The variable which stores the actual option name can be chosen by the user and is the second argument to getopts; in the example above, I use OPTNAME, as it's a descriptive name that also fits nicely with the other two variables.
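Since OPTIND ends up pointing just past the last parsed option, a common follow-up is to shift the parsed options away, so that $1 and onwards hold the remaining non-option arguments. Here is a minimal sketch of that idea, reusing the option string from above; the parse_demo function and its output format are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: parses -a, -b VALUE and -d, then reports what is left.
parse_demo() {
    # Keep the getopts state local so the function can be called repeatedly.
    local OPTIND=1 OPTNAME OPTARG
    local option_a=0 option_b="" option_double=0
    while getopts "ab:d" OPTNAME; do
        case "$OPTNAME" in
            a) option_a=1 ;;
            b) option_b=$OPTARG ;;
            d) option_double=1 ;;
            \?) return 1 ;;
        esac
    done
    # Drop everything getopts has consumed; "$@" now holds the non-options.
    shift $((OPTIND - 1))
    echo "a=$option_a b=$option_b d=$option_double rest=$*"
}

parse_demo -a -b value file1 file2
```

With the call shown, this should print a=1 b=value d=0 rest=file1 file2. In a real script you would do the shift at the top level, right after the while loop.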
Error handling with getopts
What about error handling, you may ask? getopts does that for you, too! If an unknown option is encountered, or an option is missing its argument, getopts will print an error message, and the selected variable (OPTNAME in the example above) will be set to ? (a question mark character).
If you want more control over error handling, you may prepend your option string with a colon (so in the example above, it would become :ab:d).
#!/bin/bash
while getopts ":ab:d" OPTNAME; do
    case "$OPTNAME" in
        a)
            OPTION_A=1
            ;;
        b)
            OPTION_B=$OPTARG
            ;;
        d)
            OPTION_DOUBLE=1
            ;;
        :)
            echo "Option '-$OPTARG' requires an argument, ya dingus" >&2
            exit 1
            ;;
        \?)
            echo "I don't know what '-$OPTARG' is!" >&2
            exit 1
            ;;
    esac
done
When you do this, the behaviour of getopts changes in a few ways.
- First, it won't automatically print any error messages.
- Second, upon encountering an unknown option, OPTNAME will be set to ? (as in the standard scenario), and the unknown option will be put into the OPTARG variable.
- Third, when an option requiring an argument is missing said argument, OPTNAME will be set to : (a colon) and said option will be put into the OPTARG variable.
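To see these behaviours side by side, here is a small sketch that wraps the same kind of loop in a function and prints what getopts reported for a few argument lists. The function name check_opts and the message wording are invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: reports how getopts classifies each argument list.
check_opts() {
    local OPTIND=1 OPTNAME OPTARG
    # Leading colon: getopts stays silent and reports errors via OPTNAME/OPTARG.
    while getopts ":ab:d" OPTNAME; do
        case "$OPTNAME" in
            a|d) echo "flag: -$OPTNAME" ;;
            b)   echo "value for -b: $OPTARG" ;;
            :)   echo "missing argument: -$OPTARG"; return 1 ;;
            \?)  echo "unknown option: -$OPTARG"; return 1 ;;
        esac
    done
}

check_opts -a -b hello                      # two valid options
check_opts -x || echo "-> exit status $?"   # unknown option
check_opts -b || echo "-> exit status $?"   # -b without its argument
```

Note that nothing is printed by getopts itself here; every message comes from the case branches.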
Employing getopt (standalone binary)
One downside of getopts that was hinted at by the examples above is, unfortunately, the lack of support for --long options.
If we need to support these, we can use the separate getopt program.
getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@"
Hmm... This doesn't really look like option parsing, now does it? So what does getopt do, really? Basically, it performs three functions for us:
- Error handling: like the shell builtin, it will print error messages when an error is encountered: an unknown option, or an option missing an argument. You can also control this behaviour with the prepend-with-colon trick; should you do that, errors will be silently swallowed.
- Shuffling the arguments: while POSIX mandates that options may not follow non-options, many implementations of getopt(3) (the libc function) allow for mingling the two (so you can do something like chmod u+x -R directory/). The most common example of this is glibc (the GNU C Library), commonly found on Linux. getopt(1) inherits this behaviour. If you don't want this, you can prepend the option string with + (a plus sign). Alternatively, you can set the environment variable POSIXLY_CORRECT, although this has the downside of altering the behaviour of many other programs.
- Marking the end of options: the output of getopt will always contain a -- to tell us where the options end.
Okay, but that still doesn't answer the question: what does getopt actually DO? It's a separate program, so it can't set any variables inside our shell.
As hinted above, getopt outputs a reformatted version of the argument list on stdout. For example:
user $ ./mytestscript nonoption -b argument --double 'non option with spaces'
-b 'argument' --double -- 'nonoption' 'non option with spaces'
Where do we go from here? Well, we need to somehow put the output from getopt into our positional parameters ($1 and so on).
To do this, we can use the set builtin.
There's one problem, though: getopt, by default, quotes the encountered non-options and arguments, and if we pass the output as-is to set, said quotes will make it into our parameters. We can work around this by using eval, which will cause the shell to properly process the quotes first.
OPTIONS=`getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@"`
eval set -- "$OPTIONS"
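To make the difference visible, the sketch below feeds a hard-coded string, shaped like the getopt output shown earlier, through set with and without eval and prints the resulting parameters in brackets. The string and the function names are stand-ins for illustration; no getopt call is involved.

```shell
#!/bin/bash
# Stand-in for what getopt would print (assumed, hard-coded for the demo).
OPTIONS="-b 'argument' --double -- 'nonoption' 'non option with spaces'"

show_without_eval() {
    # Unquoted expansion: split on whitespace, quote characters stay literal.
    set -- $OPTIONS
    printf '[%s]' "$@"
    echo
}

show_with_eval() {
    # eval makes the shell re-parse the string, so the quotes are honoured.
    eval set -- "$OPTIONS"
    printf '[%s]' "$@"
    echo
}

show_without_eval
show_with_eval
```

The first call should yield nine parameters, with stray quote characters and the last argument broken into four pieces; the second should yield six clean ones, ending in [non option with spaces].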
Now that our positional parameters are all set, we can go back and copy most of the code from the first approach.
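OPTIONS=`getopt --name 'mytestscript' --options 'ab:' --longoptions 'double' -- "$@"`
# getopt has already printed an error message; bail out with a failure status
[ "$?" -ne 0 ] && exit 1
eval set -- "$OPTIONS"
while [ $# -gt 0 ]; do
    case "$1" in
        -a)
            OPTION_A=1
            ;;
        -b)
            # getopt has verified that the argument of -b is present
            shift
            OPTION_B=$1
            ;;
        --double)
            OPTION_DOUBLE=1
            ;;
        --)
            shift
            break
            ;;
    esac
    shift
done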
While the symbol soup around the getopt call may look a bit terrifying, the script itself is quite nice and readable.
Why not to use getopt(1)
Unfortunately, as often happens, nice things have their drawbacks, and getopt(1) is no different.
The main problem with said program is possible differences in behaviour between platforms. For example, on some Unices, getopt(1) doesn't support long options... which was pretty much the only reason we considered using it over the shell builtin!
The other issue is the possibly non-POSIX-conformant behaviour. This one heavily depends on our use case; if we're writing a script for personal use, and our system exhibits the glibc behaviour, the ability to intertwine options and non-options may be comfortable. On the other hand, if we want to redistribute the script, it may cause portability issues.
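If you decide to use getopt(1) anyway, one possible guard against the first problem is the --test flag of the util-linux implementation, which is documented to print nothing and exit with status 4; other implementations typically print -- and exit with status 0. A minimal sketch, assuming that behaviour; the has_enhanced_getopt name is made up here.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if getopt(1) behaves like the enhanced
# (util-linux style) version, which is the one that understands long options.
has_enhanced_getopt() {
    local status=0
    # The enhanced version prints nothing and exits with status 4 here.
    getopt --test >/dev/null 2>&1 || status=$?
    [ "$status" -eq 4 ]
}

if has_enhanced_getopt; then
    echo "enhanced getopt found, long options should work"
else
    echo "no enhanced getopt, sticking to short options"
fi
```

This only tells you which flavour is installed; what to do in the fallback branch (give up, or parse short options with getopts) is still up to the script.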
Manual parsing vs. getopts
That being said, we're left with the first two approaches. Which way to go?
Personally, I think that the answer is "it depends": if you're only using short options, getopts might be the better way, since not only are you guaranteed that the option parsing follows a standard, but the users of the script are also guaranteed that their parameters will be parsed in a predictable way.
Should you need to support long options, or optional arguments, you're pretty much bound to write the code yourself. And that doesn't automatically make it a bad thing! Just be sure to test your code thoroughly to make sure your users won't have to spend their time wrestling with your option parsing code, instead of actually enjoying the script's features.
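As an illustration of the hand-written route, here is one possible way to accept a long option with a value in both the --output=FILE and the --output FILE spelling, building on the loop from the first section. The --output option and the parse_args function are invented for this sketch.

```shell
#!/bin/bash
# Hypothetical example: --output is not part of the script discussed above.
parse_args() {
    local output="" double=0
    while [ $# -gt 0 ]; do
        case "$1" in
            --output=*)
                # Strip everything up to and including the first "=".
                output=${1#*=}
                ;;
            --output)
                if [ $# -lt 2 ]; then
                    echo "Option --output requires an argument" >&2
                    return 1
                fi
                output=$2
                shift
                ;;
            --double)
                double=1
                ;;
            --)
                shift
                break
                ;;
            -*)
                echo "Unknown option '$1'" >&2
                return 1
                ;;
            *)
                break
                ;;
        esac
        shift
    done
    echo "output=$output double=$double rest=$*"
}

parse_args --output=a.txt --double file1
parse_args --output b.txt -- --double
```

The second call shows the end-of-options marker at work: the --double after -- is left alone and ends up among the remaining arguments.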