Intro to Shell Scripting

ljrk

2021-05-13

I’m regularly asked to write something about the magic of shell scripting, so here goes. While I don’t expect deep understanding from the reader, I assume basic knowledge of how to work with the terminal itself and that you have seen some scripts (while maybe being too scared to touch them yet).

Shell scripting is different from other scripting or programming in that we don’t have “libraries” we include. Instead, all programs we have installed serve as our huge library of tools we can invoke, chain together, loop over, etc. Thus, “learning shell scripting” consists of a) learning the tools commonly available on your regular UNIX/Linux workstation, and b) learning the language that chains together these tools.

For the language we will use the POSIX Shell subset [1], which is supported by virtually any shell, including Bash and Zsh, but also more modern incarnations of Ksh. This isn’t only a plus for portability: POSIX Shell is also much simpler than the many different ways we can build an if in Bash, or iterate in Zsh. While those are definitely useful in some contexts, most often the multitude of syntaxes only confuses the user.

POSIX Shell

If & Test

Probably the most esoteric part of the classic shell is if in combination with the test program. Ksh, Bash, Zsh and so on all set out to “fix” this; however, the added complexity arguably made things worse. And while it is definitely an idiosyncratic design, it’s rather easy to understand, so let’s start:

The if keyword (a reserved word of the shell) simply executes a command and checks its exit status. If the command exited with status 0, this is considered a true condition. Or, as described more verbosely in the standard under The if Conditional Construct:

The if compound-list shall be executed; if its exit status is zero, the then compound-list shall be executed and the command shall complete. Otherwise, each elif compound-list shall be executed, in turn, and if its exit status is zero, the then compound-list shall be executed and the command shall complete. Otherwise, the else compound-list shall be executed.

In most environments you will have two programs called true and false available, at /bin/true and /bin/false or at /usr/bin/true and /usr/bin/false. Let’s check which exit status they have! You can either enter sh to get a POSIX interactive shell and type the following directly, or save it as a file, e.g., foo.sh, and run it as sh ./foo.sh:

if /usr/bin/true; then
    echo 'exit code 0!'
else
    echo 'exit code non-zero!'
fi

It prints “exit code 0!” which makes sense since the executable is called “true”.
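Since if only looks at exit statuses, any command works as a condition, including shell functions we write ourselves. A minimal sketch, assuming a hypothetical helper named is_even (not a standard tool): a function’s exit status is that of the last command it ran, here a [ ... ] test.

```shell
#!/bin/sh
# is_even is a hypothetical helper: its exit status is the exit status
# of its last command, the [ ... ] test, so it can be used directly in if.
is_even() {
    [ "$(( $1 % 2 ))" -eq 0 ]
}

for n in 3 4; do
    if is_even "$n"; then
        echo "$n is even"
    else
        echo "$n is odd"
    fi
done

# $? holds the exit status of the most recently executed command
false || echo "false exited with $?"
```

The same mechanism is what lets test, introduced next, serve as a condition.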

More commonly, however, we don’t want to check the exit status of a program but the value of a variable. We can reduce this problem to checking an exit status if we had a program that takes an expression and exits with the appropriate status. Luckily for us, someone already went through the hassle of writing it and called the program test. Let’s take it for a ride:

answer=42
if test "$answer" -eq 42; then
    echo "The Answer is $answer"
fi

While this works, it looks a bit clumsy, so [ was created as an alternative name for test. When called as [, it expects ] as its last argument, so the condition looks neatly delimited:

answer=42
if [ "$answer" -eq 42 ]; then
    echo "The Answer is $answer"
fi

Since [ is a program with the arguments "$answer" (shell-expanded to the value of the variable), -eq, 42 and ], you need to separate all of these with spaces. The following does not work:

answer=42
if ["$answer" -eq 42]; then
    echo "The Answer is $answer"
fi
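To see that [ really is just a command whose last argument must be ], we can run it like any other command and print its exit status. A small sketch; status_of is a hypothetical helper defined here, not a standard utility.

```shell
#!/bin/sh
# status_of (hypothetical helper): run the given command with its
# arguments, discard its output, and print its exit status instead.
status_of() {
    if "$@" >/dev/null 2>&1; then
        echo 0
    else
        echo "$?"   # still the exit status of the failed command
    fi
}

status_of test 42 -eq 42      # 0: true
status_of [ 42 -eq 42 ]       # 0: the same test, spelled with [ and ]
status_of [ 42 -eq 43 ]       # 1: false
status_of [ 42 -eq 42         # >1: [ reports the missing ] as an error
status_of ["42" -eq 42]       # 127: the shell finds no command named [42
```

The last line mirrors the broken example above: without the spaces, the shell glues [ and "42" into a single word and looks for a program called [42.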

Multiple Conditions

In order to check for the truth value of multiple conditions, we call test multiple times, chaining the results:

if test "$answer" -eq 42 && test "$earth" = "exploded"; then
    echo "BOOOM"
fi

or, with the prettier []-Syntax:

if [ "$answer" -eq 42 ] && [ "$earth" = "exploded" ]; then
    echo "BOOOM"
fi

Depending on your previous knowledge, the && may already be familiar to you. While it acts as a logical and here, its semantics are slightly different: the command on the left-hand side of the && is executed, and if it exited successfully (i.e., its exit status is zero), the right-hand side is executed as well, with the exit status of the complete expression being that of the latter.

But if the first command failed, the second command is not executed at all, and the failure is signaled with a non-zero exit status.

Analogously, we can express or using ||, which short-circuits as well, i.e., it stops after the first command that exited successfully.

For completeness’ sake, there are also Sequential Lists using ;, which simply execute the commands in order without exiting early; the exit status of the list is that of the last command. More advanced are Asynchronous Lists (using a single &), which run commands in the background, (possibly) in parallel, but are not appropriate for use in if, since their exit status is always 0.
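The behavior of all four kinds of lists can be checked directly by printing exit statuses. A minimal sketch using only true, false, echo and wait; the ( ... ) subshells are only there so we can look at the status of a whole sequential list.

```shell
#!/bin/sh
# && runs its right-hand side only if the left-hand side succeeded
true && echo 'true && ...: right-hand side runs'
false && echo 'never printed'

# || runs its right-hand side only if the left-hand side failed
false || echo 'false || ...: right-hand side runs'
true || echo 'never printed'

# A sequential list runs every command; its status is that of the last one
( false; true ) && echo "(false; true) exited with $?"
( true; false ) || echo "(true; false) exited with $?"

# An asynchronous list always has exit status 0 ...
false &
echo "false & exited with $?"
# ... the background job's real status is only available through wait
wait "$!" || echo "wait reports $?"
```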

Else, Else-If and Empty Bodies

We can also write else blocks as well as else-if blocks. However, typing is hard, and Shell syntax even harder, which is why:

if [ "$answer" -eq 42 ]; then
    echo "Answer given in decimal"
else if [ "$answer" -eq 101010 ]; then
    echo "Answer given in binary"
fi

doesn’t work (the nested if would need its own fi), so we must type less and use the keyword elif instead:

if [ "$answer" -eq 42 ]; then
    echo "Answer given in decimal"
elif [ "$answer" -eq 101010 ]; then
    echo "Answer given in binary"
else
    :
fi

Further, if a branch has an empty body, we cannot just leave it empty, since the shell expects at least one command there. Luckily, the : utility serves as a no-op.
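Here is a sketch where : is actually needed, because the then branch has nothing to do; check_answer is a hypothetical function name chosen for illustration.

```shell
#!/bin/sh
# check_answer (hypothetical): complain only about wrong answers
check_answer() {
    if [ "$1" -eq 42 ]; then
        :   # correct answer: nothing to report, but the body can't be empty
    elif [ "$1" -lt 42 ]; then
        echo "too low"
    else
        echo "too high"
    fi
}

check_answer 41
check_answer 42
check_answer 43
```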

While- & Until-Loops

The while loop works almost identically to the if construct, with the slight adjustment that the command specified (most commonly test) is run repeatedly, for as long as it exits successfully:

while [ "$answer" -ne 42 ]; do
    echo "Wrong answer... increasing"
    answer="$((answer + 1))"
done

This uses Arithmetic Expansion with the $((...)) syntax. Within it we do not need $ to refer to variables, and we can do pretty complex (integer) maths directly from the console, neat!
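A few expansions to try out. Note that POSIX shell arithmetic works on integers only, so division truncates.

```shell
#!/bin/sh
answer=40
# inside $(( )) variables can be referenced without a leading $
echo "$(( answer + 2 ))"          # 42
echo "$(( 2 * (answer + 2) ))"    # 84
# integer-only arithmetic: the fractional part is dropped
echo "$(( 7 / 2 ))"               # 3
echo "$(( 7 % 2 ))"               # 1
```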

Similar to the while loop, we can use the until loop, which repeats its body for as long as the condition fails:

until [ "$answer" -eq 42 ]; do
    echo "Wrong answer... increasing"
    answer="$((answer + 1))"
done
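A small sketch of until in a function, assuming a hypothetical helper next_power_of_two that doubles a counter until it reaches its argument. Since POSIX sh has no local variables, p is simply a global here.

```shell
#!/bin/sh
# next_power_of_two (hypothetical): smallest power of two >= $1,
# for positive integers $1; p is a global variable (no local in POSIX sh)
next_power_of_two() {
    p=1
    until [ "$p" -ge "$1" ]; do
        p=$(( p * 2 ))
    done
    echo "$p"
}

next_power_of_two 100   # 128
next_power_of_two 64    # 64
```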

For-Loops

The for loop is probably the most avant-garde construct of the shell, as it is a range-based for-in loop, unlike the three-expression style of C:

for x in foo bar baz; do
    echo "$x"
done

The syntax is quite easy to pick up, and using the program seq we can also iterate over indices:

for i in $(seq 1 42); do
    echo "$i"
done

Since the for loop doesn’t run a program as part of its “head” (unlike if), we need to explicitly ask the shell to do Command Substitution using the $(...) construct, which runs the program with the specified arguments and replaces the expression with its output (not its exit status, again unlike if). Since seq produces the numbers from 1 to 42 inclusive when called like above, i will take precisely these values.
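seq itself is not one of the utilities specified by POSIX, even though most systems ship it. A sketch of a POSIX-only stand-in, assuming a hypothetical helper named count_to: it prints one number per line, so its output is split into fields exactly like seq’s.

```shell
#!/bin/sh
# count_to (hypothetical): print 1..$1, one number per line,
# as a POSIX-only replacement for `seq 1 N`
count_to() {
    n=1
    while [ "$n" -le "$1" ]; do
        echo "$n"
        n=$(( n + 1 ))
    done
}

# the newline-separated output is split into three fields
for i in $(count_to 3); do
    echo "item $i"
done
```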

However, unlike our first example, the numbers produced aren’t delimited by spaces but by newlines! Indeed, tabs would’ve worked just as well. The shell does something called Field Splitting here, and, by default, fields are split at space, tab or newline. Again, quoting the standard:

any sequence of <space>, <tab>, or <newline> characters at the beginning or end of the input shall be ignored and any sequence of those characters within the input shall delimit a field.

We can actually change where fields are split, e.g., for parsing semicolon-delimited CSVs using IFS=';', but this is out of scope for this article :-)
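Still, a tiny sketch for the curious, assuming a hypothetical helper named split_fields. Its body is a ( ... ) subshell, so the changed IFS and the disabled globbing (set -f, so a field like * is not expanded to file names) do not leak into the rest of the script.

```shell
#!/bin/sh
# split_fields (hypothetical): print each ;-separated field of $1 on its
# own line; runs in a subshell so IFS and set -f stay local to it
split_fields() (
    IFS=';'
    set -f
    for field in $1; do
        printf '%s\n' "$field"
    done
)

split_fields 'Beth;4.00;0'
split_fields 'a b;c'   # spaces no longer split: "a b" stays one field
```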

A Curious Case

In some cases you want to check the value of one variable against a whole range of patterns. In many languages this is done using a switch-case or match construct. Since shell is a language for those who don’t like to type much, we only say case and terminate the construct with esac (“case” reversed):

case "$x" in
    foo) echo "x is foo" ;;
    bar|baz) echo "x is bar or baz" ;;
    *) echo "x ain't no hoopy frood" ;;
esac

The last pattern, the glob character *, matches everything and thus acts as the default case.
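Patterns are tried from top to bottom and the first match wins, which is why * has to come last. A sketch wrapping the example in a hypothetical function describe, with one extra glob pattern (*.c) added for illustration:

```shell
#!/bin/sh
# describe (hypothetical): the first matching pattern wins,
# and * catches everything that fell through
describe() {
    case "$1" in
        foo) echo "x is foo" ;;
        bar|baz) echo "x is bar or baz" ;;
        *.c) echo "x is a C source file" ;;
        *) echo "x ain't no hoopy frood" ;;
    esac
}

describe foo
describe baz
describe main.c
describe arthur
```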

Quoting

You can see that I quoted the variable x in the body of the example for loop:

for x in foo bar baz; do
    echo "$x"
done

In this case it would have made no difference had I omitted the quotes, but it is often considered good style to use them wherever you can.

To demonstrate the difference, let’s replace the first value (foo) of the list we iterate over with the string “The world is ending”, which contains spaces. In order to tell the loop that we consider this one item of the list (and not four), we put quotes around it:

for x in "The world is ending" bar baz; do
    printf "Found: %s\n" $x
done

I also replaced the echo with a printf to highlight the issue we are about to observe. The output is:

Found: The
Found: world
Found: is
Found: ending
Found: bar
Found: baz

But… didn’t we ask the for to consider this as just one item? We did, but I also sneakily removed the quotes around $x, leading to the following chain of executed commands:

printf "Found: %s\n" The world is ending
printf "Found: %s\n" bar
printf "Found: %s\n" baz

That is, printf is executed with five arguments: the format string (first) plus the four additional strings. However, we only used one format specifier %s and thus expected just one string following the format. This is the culprit here: if printf is passed more arguments than its format string consumes, it reuses the format string until all arguments are used up, printing one line per word.

The correct command execution would’ve been with quotes:

printf "Found: %s\n" "The world is ending"
printf "Found: %s\n" "bar"
printf "Found: %s\n" "baz"

This can be achieved by quoting $x.

Indeed, I recommend quoting all variables by default, and only thinking of it as “when must I omit the quotes” instead of the other way around.
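The difference is easy to observe by counting arguments. A sketch, assuming a hypothetical helper count_args that prints how many arguments it received:

```shell
#!/bin/sh
# count_args (hypothetical): print the number of arguments received
count_args() {
    echo "$#"
}

x="The world is ending"
count_args $x      # 4: unquoted, the value is split into fields
count_args "$x"    # 1: quoted, it stays a single argument

# printf reuses its format for the four leftover arguments: four lines
printf 'Found: %s\n' $x
```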

However, there are also single quotes, which we haven’t talked about yet. Strings within double quotes are still subject to most Word Expansions (parameter expansion, command substitution and arithmetic expansion), that is, we could write:

echo "$answer"

to print the value of the variable answer, since the shell expands it before passing the resulting string to echo. Sometimes we don’t want this to happen and actually want to print, say, a dollar sign:

echo 'Your life is worth $0.02'

Had we used double quotes here, the shell would have expanded $0 (the name of the shell or script) and printed something quite different. In fact, in many cases, like the printf format strings above, we could (and possibly should) use single quotes to prevent erroneous expansions.
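A side-by-side sketch of the two kinds of quotes; the variable names double and single are just illustrative.

```shell
#!/bin/sh
answer=42
double="The answer is $answer"   # parameter expansion happens
single='The answer is $answer'   # no expansion: a literal dollar sign
printf '%s\n' "$double" "$single"

# in double quotes, $0 expands to the name of the shell or script
printf '%s\n' "Your life is worth $0.02"
printf '%s\n' 'Your life is worth $0.02'
```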

We can also nest quotes, use escape sequences, etc., but this is again out-of-scope for this article.

Tools

We’ve now had a brief look at some of the simplest constructs of the POSIX Shell, but by itself it is not that powerful. We need the tools of the UNIX workbench to do any useful composition with the shell language.

ECHO/PRINTF

While using echo is simple, its exact behavior for anything beyond the basics unfortunately differs from platform to platform (e.g., for arguments starting with -n or containing backslashes). Thus, if you do anything other than echoing a simple variable or printing simple text, use printf instead.
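A sketch of values where echo implementations disagree: a leading -n may be taken as an option, and backslash sequences like \t may or may not be interpreted. With a fixed format string, printf prints %s arguments verbatim.

```shell
#!/bin/sh
# both values are treated differently by different echo implementations
for value in '-n' 'a\tb'; do
    # %s copies the argument as-is: no options, no backslash escapes
    printf '%s\n' "$value"
done
```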

GREP

The tool grep has its origins in the line editor ed, in the editor command g/re/p, meaning “work globally”, “match the regular expression given as re”, and “print the resulting lines”.

Spun out as its own command-line tool, grep lets us do just that without learning The Standard Editor (which is the precursor to ex, precursor to vi, precursor to vim, precursor(?) to nvim). Most usage of grep boils down to knowing regular expressions, which is out of scope for this article. However, I want to give some notes that many seem not to be aware of:

Whatever you do with grep, remember that it works on lines, a heritage from ed.
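Two consequences of this line orientation, in a small sketch: grep prints the whole matching line, not just the match, and the POSIX option -c counts matching lines rather than individual matches.

```shell
#!/bin/sh
text='banana
apple
cherry'

# the whole matching line is printed, not just "an"
printf '%s\n' "$text" | grep 'an'       # banana

# -c counts lines: "banana" contains three a's, but it is one line,
# so only banana and apple are counted
printf '%s\n' "$text" | grep -c 'a'     # 2
```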

SED

The stream editor sed also shares its heritage with ed, being basically a simple scriptable version of it. Instead of searching for a pattern and printing the results, we can replace occurrences, delete them, list them, print them, etc.

The most common usage is probably replacement, using the syntax of s/regexp/replacement/ with an optional trailing g to replace all matches globally.
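The effect of the trailing g is easiest to see on a line with more than one match:

```shell
#!/bin/sh
line='foo bar foo'
# without g, only the first match on each line is replaced
printf '%s\n' "$line" | sed 's/foo/baz/'     # baz bar foo
# with g, every match on the line is replaced
printf '%s\n' "$line" | sed 's/foo/baz/g'    # baz bar baz
```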

TR

A specialisation of sed/grep is tr. Instead of replacing one occurrence of a string with another string, it maps ranges of characters onto other ranges. E.g., to convert all the letters in a given text to uppercase:

echo 'The slow red tiger jumps over the energetic cat.' | tr 'a-z' 'A-Z'

Yielding

THE SLOW RED TIGER JUMPS OVER THE ENERGETIC CAT.
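Mapping one range onto a rotated copy of itself gives the classic ROT13 “cipher”. A sketch, assuming a hypothetical helper named rot13; characters outside the first set pass through unchanged.

```shell
#!/bin/sh
# rot13 (hypothetical): shift each ASCII letter 13 places; the second
# set is the first one rotated by 13 (N-Z, A-M, n-z, a-m)
rot13() {
    printf '%s\n' "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

rot13 'Hello'              # Uryyb
rot13 "$(rot13 'Hello')"   # Hello: applying it twice undoes it
```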

AWK

AWK supercharges the feature set of grep and sed by letting us execute arbitrary code whenever a certain pattern is matched. That is, the input is iterated over line by line and split into columns, and you can formulate patterns as well as conditions that refer to single columns or to the whole line. This is best understood in action, and since I cannot describe it any better, I copy this verbatim from the excellent book “The AWK Programming Language” [2] by Alfred V. Aho (of The Dragon Book on compiler design), Peter J. Weinberger, and Brian W. Kernighan (“The C Programming Language”):


Useful awk programs are often short, just a line or two. Suppose you have a file called emp.data that contains the name, pay rate in dollars per hour, and number of hours worked for your employees, one employee per line, like this:

Beth    4.00    0
Dan     3.75    0
Kathy   4.00    10
Mark    5.00    20
Mary    5.50    22
Susie   4.25    18

Now you want to print the name and pay (rate times hours) for everyone who worked more than zero hours. This is the kind of job that awk is meant for, so it’s easy. Just type this command line:

awk '$3 > 0 { print $1, $2 * $3 }' emp.data

You should get this output:

Kathy 40
Mark 100
Mary 121
Susie 76.5

Let’s analyze the program, given in the single quotes: $3 refers to the third column, so the pattern matches every line where the employee ($1) worked more than 0 hours. For these lines, we execute the action given in the {...}, printing the name as well as the pay.
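Actions can also keep state across lines. A sketch that sums up the pay of everyone who worked, using the special END pattern, whose action runs once after the last input line. Instead of reading emp.data from disk, the table from above is fed to awk on standard input; the variable names emp_data and total are just illustrative.

```shell
#!/bin/sh
# the emp.data table from above, fields separated by blanks
emp_data='Beth    4.00    0
Dan     3.75    0
Kathy   4.00    10
Mark    5.00    20
Mary    5.50    22
Susie   4.25    18'

# accumulate pay per matching line, print the sum once at the end
printf '%s\n' "$emp_data" |
    awk '$3 > 0 { total = total + $2 * $3 } END { print total }'   # 337.5
```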

FIND

If awk matches patterns against lines in a file, find matches patterns against files in your file system. As with awk, it can execute code when a pattern matches, for example printing the line count of every file with the extension .c in the current directory or any subdirectory:

find . -name '*.c' -exec wc -l {} \;

The expression -name '*.c' matches, and the expression -exec wc -l {} \; executes the program wc with the option -l (count lines only), substituting {} with each matched file. Note that we need to escape the ; since ; is an operator in the shell language (';' would’ve worked as well, but is one more character to type). This results in, e.g., the following executions:

wc -l ./foo.c
wc -l ./src/bar.c

With the output being something like:

1312 ./foo.c
161 ./src/bar.c

A bit unfortunate for us, however: find runs wc once for every single file, so each wc only ever sees one file and we never get a total.

Most command-line tools are built “intelligently”, though: they change behavior depending on whether they are called with multiple arguments or just one. If we ran:

wc -l foo.c src/bar.c

we’d get something like:

1312 foo.c
161 src/bar.c
1473 total

How do we achieve that with find? Well, asking find nicely would be a plus, so we replace the \; with a +, which makes find pass as many matched files as possible to each wc invocation, and behold:

find . -name '*.c' -exec wc -l {} +

Since + has no special meaning to the shell, we don’t need to escape it either, neat!

With this, we can build powerful meta-tools; many of my personal scripts are just wrappers around one powerful find construct. And we don’t need to sin and use the non-POSIX grep -R extension, we can simply use the short:

find . -type f -exec grep 'pattern' {} +

Here 'pattern' stands for whatever regular expression you are searching for. Easy!
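One caveat, following from the multiple-arguments behavior above: grep prefixes matches with the file name only when it is given more than one file, and find may hand a batch containing a single file to grep. A common workaround, sketched here, is to add /dev/null as an extra operand. The sketch builds a throwaway tree and assumes the widespread mktemp -d is available; the file contents and the pattern main are made up for illustration.

```shell
#!/bin/sh
# a throwaway directory tree (assumes mktemp -d is available)
demo_dir=$(mktemp -d)
mkdir "$demo_dir/src"
printf 'int main(void) { return 0; }\n' > "$demo_dir/foo.c"
printf '/* helpers only */\n' > "$demo_dir/src/bar.c"

# /dev/null guarantees grep sees at least two files, so every match is
# prefixed with its file name, however find batches the arguments
matches=$(find "$demo_dir" -type f -exec grep 'main' /dev/null {} +)
printf '%s\n' "$matches"

rm -r "$demo_dir"
```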


1. https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

2. https://9p.io/cm/cs/awkbook/index.html