Parsing command-line arguments and flags

Today's post might be a bit more involved if you're new to shell scripting, but I find the way one can parse command-line arguments and flags in a shell script quite beautiful. It works by combining a switch-case statement with the shift command.

Let's take a look!

#!/usr/bin/env bash
# arguments.sh

# Default values of arguments
SHOULD_INITIALIZE=0
CACHE_DIRECTORY="/etc/cache"
ROOT_DIRECTORY="/etc/projects"
OTHER_ARGUMENTS=()

# Loop through arguments and process them.
# A while loop re-reads $1 on every pass, so each shift takes effect.
# A `for arg in "$@"` loop would keep walking the original list instead.
while [ "$#" -gt 0 ]
do
    arg="$1"
    case $arg in
        -i|--initialize)
        SHOULD_INITIALIZE=1
        shift # Remove --initialize from processing
        ;;
        -c=*|--cache=*)
        CACHE_DIRECTORY="${arg#*=}"
        shift # Remove --cache= from processing
        ;;
        -r|--root)
        ROOT_DIRECTORY="$2"
        shift # Remove argument name from processing
        shift # Remove argument value from processing
        ;;
        *)
        OTHER_ARGUMENTS+=("$1")
        shift # Remove generic argument from processing
        ;;
    esac
done

echo "# Should initialize: $SHOULD_INITIALIZE"
echo "# Cache directory: $CACHE_DIRECTORY"
echo "# Root directory: $ROOT_DIRECTORY"
echo "# Other arguments: ${OTHER_ARGUMENTS[*]}"

Code like this is why I'm in a love-hate relationship with my terminal

Phew. That looks like a whole bunch of code. It covers the whole process of catching command-line arguments. But let's go through everything bit by bit. First, let's start with the default values.

Defining default values

# Default values of arguments
SHOULD_INITIALIZE=0
CACHE_DIRECTORY="/etc/cache"
ROOT_DIRECTORY="/etc/projects"
OTHER_ARGUMENTS=()

You can also make the default values empty strings! Just use what makes sense to you

This is simple enough. If the user doesn't pass in a certain argument, we fill it with some default value we're happy with. Alternatively you can make the strings empty and check if these empty values are still there. In this way you can easily verify that you have all necessary arguments passed in. How you go about that is an implementation detail of your script and thus left as an exercise for the reader. I recommend tldp.org for learning about operators.
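If you go the empty-string route, the check could look something like the sketch below. This is only one way to do it. The choice of which arguments count as required, the MISSING variable and the wording of the message are all made up for illustration.

```shell
# Empty default marks --root as required (a hypothetical choice for this sketch)
ROOT_DIRECTORY=""
# Pretend the user passed --cache=/var/cache
CACHE_DIRECTORY="/var/cache"

# Collect the names of all arguments that are still empty
MISSING=""
if [ -z "$ROOT_DIRECTORY" ]; then
    MISSING="$MISSING --root"
fi
if [ -z "$CACHE_DIRECTORY" ]; then
    MISSING="$MISSING --cache"
fi

if [ -n "$MISSING" ]; then
    echo "Missing required arguments:$MISSING" >&2
    # A real script would probably exit with a non-zero status here
fi
```

The -z test is true for an empty string and -n is true for a non-empty one, so here only --root ends up in the list.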

Style-wise I like defining my arguments in all-caps snake_case, because I generally treat them as constants that I do not modify. You may disagree and you're welcome to call them however you like.

Looping through arguments

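while [ "$#" -gt 0 ]
do
  arg="$1"
  .. SNIP ..
done

Funnily enough, loops end with done instead of elihw, while case ends with esac. Consistency!

Looping through the arguments is equally simple. Your shell provides the magic $@ variable, which holds every argument the script was called with, in order, starting after the file name. Its sibling $# holds the number of arguments that are left. We keep looping while $# is greater than zero and copy the first argument, $1, into $arg. A plain for arg in "$@" loop would not do here: it expands the list once before the first pass, so a later shift changes $@ without changing what the loop visits, and the value after --root would be processed a second time.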

So if you call your script using ./arguments.sh -i --cache=/var/cache --root /var/www/html/public my-project, then the array will look a bit like so:

(
   $0 = ./arguments.sh
   $1 = -i
   $2 = --cache=/var/cache
   $3 = --root
   $4 = /var/www/html/public
   $5 = my-project
)

This is not the exact notation of arrays in shell, but this will be important in a second

Note that the $@ variable does not contain the value of $0. If you access $0 directly, however, you get the file name you used to call the script.

For our purposes we look at one entry of the array at a time and keep the current one in a temporary $arg variable. Now we can process the arguments.
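One thing worth trying for yourself is how the kind of loop interacts with shift. The sketch below uses set -- to fake a command line and records which entries each loop visits. The VISITED_FOR and VISITED_WHILE names are invented for this demo.

```shell
# Fake a call like: ./arguments.sh --root /var/www my-project
set -- --root /var/www my-project

# A for loop expands "$@" once, so the two shifts do not skip the value
VISITED_FOR=""
for arg in "$@"; do
    VISITED_FOR="$VISITED_FOR[$arg]"
    if [ "$arg" = "--root" ]; then
        shift
        shift
    fi
done

# A while loop looks at $1 again on every pass, so shifted entries are gone
set -- --root /var/www my-project
VISITED_WHILE=""
while [ "$#" -gt 0 ]; do
    VISITED_WHILE="$VISITED_WHILE[$1]"
    case $1 in
        --root) shift; shift ;;
        *) shift ;;
    esac
done

echo "for visited:   $VISITED_FOR"     # [--root][/var/www][my-project]
echo "while visited: $VISITED_WHILE"   # [--root][my-project]
```

The for loop still hands /var/www to the handlers as if it were an argument of its own, which is why a loop that tests $# is the safer partner for shift.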

Processing all arguments

The arguments will be processed in a switch-case statement. As you may have noticed in the full code sample above, those come with their own delightful idiosyncrasies in syntax. Like a lot of other things in shell scripting, really. A case statement looks like this:

    case $arg in
        .. SNIP ..
    esac

The $arg variable in this case is the one we set in the loop above
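If you'd like to poke at the syntax in isolation first, here is a tiny sketch. The describe_argument function and its labels are made up for this example and are not part of arguments.sh.

```shell
# Print a short label for a single argument (hypothetical helper)
describe_argument() {
    case $1 in
        -h|--help)
            echo "help flag"
            ;;
        --*=*)
            echo "long option with a value"
            ;;
        -*)
            echo "some other option"
            ;;
        *)
            echo "plain argument"
            ;;
    esac
}

describe_argument --help               # help flag
describe_argument --cache=/var/cache   # long option with a value
describe_argument -x                   # some other option
describe_argument my-project           # plain argument
```

Patterns are tried from top to bottom and only the first match runs, which is why the catch-all * goes last.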

Now let's look at the various ways to process arguments and how to write switch cases.

Boolean flags

Boolean flags are those that are either there or not. A good example might be a --help flag. Parsing those looks like so:

-i|--initialize)
SHOULD_INITIALIZE=1
shift # Remove --initialize from processing
;;

Note the two semicolons. Yes, you need those. Both of those.

This case statement checks whether the current value of $arg is either -i or --initialize. In our case it is, so we set the SHOULD_INITIALIZE variable to 1 to indicate that the flag is present. Afterwards we remove that first entry from our $@ array using shift. The array now looks like the following:

(
   $0 = ./arguments.sh
   $1 = --cache=/var/cache
   $2 = --root
   $3 = /var/www/html/public
   $4 = my-project
)

Note that the value of $0 stayed the same while every other argument moved down by one position.
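You can watch shift do its thing with a few lines like these. It's a small sketch that fakes the command line with set --. The BEFORE, AFTER and SCRIPT_* variables exist only for the demo.

```shell
# Fake the example call from above
set -- -i --cache=/var/cache --root /var/www/html/public my-project

SCRIPT_BEFORE="$0"
BEFORE="$# arguments, first is $1"

shift   # drop the first argument, everything else moves down

SCRIPT_AFTER="$0"
AFTER="$# arguments, first is $1"

echo "$BEFORE"   # 5 arguments, first is -i
echo "$AFTER"    # 4 arguments, first is --cache=/var/cache
```

SCRIPT_BEFORE and SCRIPT_AFTER hold the same value, since shift never touches $0.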

Equals-separated flags

Our next case statement parses command-line flags of the form --arg=value, a style you may know from the long options of GNU tools. You can often see this when using Unix tools such as ls --color=auto.

-c=*|--cache=*)
CACHE_DIRECTORY="${arg#*=}"
shift # Remove --cache= from processing
;;

This is where you realize that shell scripting has magical features

In this case we check if the current $arg matches either -c= or --cache= followed by any number of characters. If it does, we take that arg variable into our string and remove the parts of it we don't need. The #*= part looks super confusing at first. What it does is remove every character from the beginning of $arg up to and including the first equals sign.

This means that --cache=/var/cache becomes /var/cache. If you want to read up more on the topic of parameter substitution in shell scripts, I recommend this article from cyberciti.biz.
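To get a feel for it, here is a short sketch with a value that contains a second equals sign. The sample string and the variable names are made up. The ## and %% forms are close relatives of # that come in handy for similar jobs.

```shell
arg="--cache=/var/cache=old"

# '#' removes the shortest match of the pattern '*=' from the front
AFTER_FIRST_EQUALS="${arg#*=}"    # /var/cache=old

# '##' removes the longest match from the front instead
AFTER_LAST_EQUALS="${arg##*=}"    # old

# '%%' removes the longest match of '=*' from the end, leaving the option name
OPTION_NAME="${arg%%=*}"          # --cache

echo "$OPTION_NAME -> $AFTER_FIRST_EQUALS"
```

Since the handler uses the single #, a value that itself contains an equals sign survives intact.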

After this our $@ array of arguments now looks as follows:

(
   $0 = ./arguments.sh
   $1 = --root
   $2 = /var/www/html/public
   $3 = my-project
)

Space-separated flags

Our third case statement handles command-line flags of the form --arg value, where the name and its value arrive as two separate arguments. You can usually see it with command-line tools written with Node.js or Python.

-r|--root)
ROOT_DIRECTORY="$2"
shift # Remove argument name from processing
shift # Remove argument value from processing
;;

At this point these are probably a breeze to go through

Compared to the previous handler, this one is again rather easy to understand. We check whether $arg is equal to -r or --root. If it is, we take the value of $2 into our ROOT_DIRECTORY variable and shift twice.

Why do we take $2? Remember: we have shifted away all previous arguments passed to the script, so $1 is now equal to the value of $arg and $2 contains the argument's value.
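One thing the handler takes on trust is that a value actually follows the flag. If you'd rather not rely on that, a guard on $# is one option. The sketch below wraps a stripped-down parser in a hypothetical parse_root function so it is easy to try. The function name and the error message are made up.

```shell
# Parse only -r/--root and fail if the flag comes without a value
parse_root() {
    ROOT_DIRECTORY="/etc/projects"
    while [ "$#" -gt 0 ]; do
        case $1 in
            -r|--root)
                if [ "$#" -lt 2 ]; then
                    echo "Error: $1 needs a value" >&2
                    return 1
                fi
                ROOT_DIRECTORY="$2"
                shift # Remove argument name from processing
                shift # Remove argument value from processing
                ;;
            *)
                shift # Ignore everything else in this sketch
                ;;
        esac
    done
}

parse_root --root /var/www/html/public && echo "Root directory: $ROOT_DIRECTORY"
parse_root --root || echo "Parsing failed"
```

Inside a function, $1, $# and shift refer to the function's own arguments, so the technique carries over unchanged.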

After we shift the next two values off, we are left with this arguments array:

(
   $0 = ./arguments.sh
   $1 = my-project
)

Just one more step to go and we're done

As the last step we will handle all the other arguments passed in without a flag. Let's go!

Matching other arguments

Our final case matches any value that wasn't matched by our previous handlers. These can be arguments passed without any flag, like a project name, or something else entirely.

*)
OTHER_ARGUMENTS+=("$1")
shift # Remove generic argument from processing
;;

"Pop!" goes the weasel and adds the value to an array

For this handler we simply take the value of $1 and add it to a miscellaneous array. After all the additional arguments have been added to the array, you can decide to do whatever you like. For example the first entry in the array could be a project name. Who knows!
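Here is a small sketch of what you might do with the array afterwards. Arrays need bash, and the sample values, PROJECT_NAME and ARGUMENT_COUNT are made up for this demo.

```shell
#!/usr/bin/env bash

# Pretend the parser collected these two leftovers
OTHER_ARGUMENTS=()
OTHER_ARGUMENTS+=("my project")   # note the space inside the value
OTHER_ARGUMENTS+=("extra")

# Treat the first leftover as a project name (an assumption of this sketch)
PROJECT_NAME="${OTHER_ARGUMENTS[0]}"
ARGUMENT_COUNT="${#OTHER_ARGUMENTS[@]}"

echo "Project name: $PROJECT_NAME"
echo "Number of other arguments: $ARGUMENT_COUNT"

# Quoting "${OTHER_ARGUMENTS[@]}" keeps values with spaces in one piece
for value in "${OTHER_ARGUMENTS[@]}"; do
    echo "Other argument: $value"
done
```

With the quotes in place the loop runs twice, once for "my project" and once for "extra".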

Trying it all out

Now if you add some echo statements and run your script as stated above, you should see output like the following:

$ ./arguments.sh -i --cache=/var/cache --root /var/www/html/public my-project

# Should initialize: 1
# Cache directory: /var/cache
# Root directory: /var/www/html/public
# Other arguments: my-project

Closing thoughts

I think that the use of such a switch-case statement together with some more advanced features of shell scripting makes for a really nice and extendable way to add command-line arguments and flags to your scripts. It also allows for great flexibility, so if you don't like being stuck with one style you can easily use the other.
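As a taste of that flexibility, here is a sketch of a single option that accepts both styles. The parse_cache function is made up for this example, and the space-separated branch assumes that a value really does follow the flag.

```shell
# Accept --cache=value as well as --cache value (same for -c)
parse_cache() {
    CACHE_DIRECTORY="/etc/cache"
    while [ "$#" -gt 0 ]; do
        arg="$1"
        case $arg in
            -c=*|--cache=*)
                CACHE_DIRECTORY="${arg#*=}"
                shift # Remove --cache=value from processing
                ;;
            -c|--cache)
                CACHE_DIRECTORY="$2"
                shift # Remove argument name from processing
                shift # Remove argument value from processing
                ;;
            *)
                shift # Ignore everything else in this sketch
                ;;
        esac
    done
}

parse_cache --cache=/var/cache
echo "Cache directory: $CACHE_DIRECTORY"   # Cache directory: /var/cache

parse_cache -c /tmp/cache
echo "Cache directory: $CACHE_DIRECTORY"   # Cache directory: /tmp/cache
```

The equals-style pattern is listed first here, but the order of these two branches does not matter, because -c=* never matches a bare -c.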

Enjoy!~