Autopackage - Easy Linux Software Installation

A quick intro to bash

As bash is a language that not many people are familiar with, I have put together this short guide to give you some idea of how to read and write it. The best way of learning any language of course is to use it, and I therefore suggest that you get the autopackage source code and take a look. It's not complex and is fairly well commented.

Why bash? Basically, no other language integrates quite as well with the filing system as bash does, and as package management is largely about filing system manipulation, that makes bash ideally suited to it. Bash is also not as primitive as people often think. It's even possible to build a simple object system in it!

A bash program is made up of commands that can optionally be joined together in various ways. This is where the power and flexibility of bash come from - the fact that most Linux installations come with a large selection of small tools that can be combined to make something greater than the sum of its parts.

Quite simply the best bash resource available is the Advanced Bash Scripting Guide, which covers everything available in the language, as well as some undocumented features. If you do any work in bash at all, you need this guide open in the background.

Piping and substitution

These are two of the more important concepts in bash. Let's take the following line, which comes from the makepackage script:

stub=`echo "$stub" | sed "s/skipLines=X/skipLines=${stubLength}/"`

The first section is fairly straightforward - we're assigning something to the variable stub. Note that when you assign to a variable it's just "varname=whatever". It must follow that syntax exactly. Common errors at this point include putting spaces around the = sign and writing a $ in front of the name of the variable being assigned.

The next part isn't so clear. Different types of quotes have different meanings in bash. Here, we're using the `` syntax (the backtick, top left of your keyboard if it's a standard type). This means run the command inside the quotes, and then use the output as a string. So we're assigning the output of the command(s) to the variable stub. The $( ) syntax used later in this guide does the same job.

Inside the quotes, the first command we have is echo. This just writes its parameters to stdout, and it's often used to display output to the user. However, it can also write files, help join commands together and control screen colour. Here, we are passing a string to echo which contains only $stub. When bash encounters a $ sign, it attempts to substitute the value of the variable. So in this case, the echo command will be passed whatever is in the stub variable - we have "" marks here to ensure it's passed as one parameter instead of several. This isn't always strictly necessary, but it's a good habit to get into, for reasons you shall discover as you use bash.

Next up is the | pipe operator. This just sends the output of the command on the left to the input of the command on the right, which in this case is sed. The pipe operator is very useful, and you can construct quite long pipes. By using the tee command, you can also split pipes into multiple directions (useful for debugging), although this isn't used much in autopackage.
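As a rough sketch of how a pipe and tee fit together (the fruit names and the temporary file are made up for illustration and have nothing to do with autopackage):

```bash
# A three-stage pipe: printf writes three lines, sort orders them,
# and head keeps only the first one.
first=$(printf 'pear\napple\nfig\n' | sort | head -n 1)
echo "$first"

# tee copies the stream into a file as it passes through, so an
# intermediate stage can be inspected without breaking the pipe.
log=$(mktemp)
count=$(printf 'pear\napple\nfig\n' | sort | tee "$log" | wc -l)
echo "lines seen by wc: $count"
echo "what tee saved along the way:"
cat "$log"
rm -f "$log"
```

The first pipe leaves "apple" in first. In the second, wc counts the three sorted lines while tee keeps a copy of them in the temporary file.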

GNU sed is a stream editor. This means it performs basic transformations on textual input. It's most often used (at least in autopackage) for search and replace, which is indeed what's happening in this example. To do this, it takes a command of the form s/xxx/yyy/ where xxx is a regular expression describing what to replace, and yyy is what to replace it with. Sed is actually a programming language of its own, and you can find more information in the man pages and info system.
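A small sketch of the s/xxx/yyy/ form in isolation. The input strings are invented for the example; the trailing g is sed's standard flag for replacing every match on a line rather than just the first.

```bash
# Plain search and replace: the first match on the line is rewritten.
echo "skipLines=X" | sed "s/skipLines=X/skipLines=42/"

# The search part is a regular expression. Here [0-9][0-9]* matches any
# run of digits, and the g flag replaces every such run on the line.
result=$(echo "version 12 build 7" | sed 's/[0-9][0-9]*/N/g')
echo "$result"
```

The first command prints "skipLines=42" and the second prints "version N build N".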

Finally you'll notice we have another type of string substitution of the form ${varname}. This is useful when the variable name is followed directly by other characters rather than whitespace, because the braces tell bash exactly where the name ends.
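To see why the braces matter, here is a minimal sketch. The value 40 and the word "bytes" are arbitrary; stubLength is the variable name from the line above.

```bash
stubLength=40

# Without braces bash looks for a variable called stubLengthbytes,
# which does not exist, so the result is an empty string.
wrong="$stubLengthbytes"

# With braces the name ends at the closing brace, and "bytes" is
# treated as ordinary text.
right="${stubLength}bytes"

echo "without braces: '$wrong'"
echo "with braces:    '$right'"
```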

So to wrap up, this line of bash simply performs a search and replace upon the string in stub. In fact, there is an easier way of performing search and replace in bash, but this was a good way of introducing piping, substitution and sed. Here is how to do string search and replace the easy way:

a="one two three"
b=${a/two/too}
echo $b

This produces "one too three". You can find out more about this syntax in the bash guide and info pages.
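One detail worth knowing about this syntax is that a single slash replaces only the first match, while a doubled slash replaces all of them. A short sketch, with an input string chosen to contain the word twice:

```bash
a="one two three two"

first=${a/two/too}    # single slash: only the first "two" is replaced
all=${a//two/too}     # doubled slash: every "two" is replaced

echo "$first"   # one too three two
echo "$all"     # one too three too
```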

Strings

Bash is not typed, but you can perform quite complex string and numerical manipulations upon variables if you so wish. You can do string manipulations easily enough:

Let's take a more in-depth look at these examples. The first should be easy enough if you understood the section on string substitution. We're just joining the two strings with a space. If you didn't want the space, you would write c="${a}${b}".
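The three manipulations discussed in the rest of this section (joining, splitting and comparing) can be sketched roughly as follows. The strings "hello" and "world" and the variable names d and e are placeholders picked for this sketch; the awk line follows the description given further down.

```bash
# 1. Joining two strings with a space in between.
a="hello"
b="world"
c="$a $b"
echo "$c"

# 2. Splitting a string around the word "parc" using awk.
d="ball parc playerz"
e=$( echo "$d" | awk 'BEGIN{ FS="parc" } { print $1 "\n" $2 }' )
echo "$e"

# 3. Comparing two strings, in shorthand form.
[[ "abc" = "abc" ]] && echo "Success"
```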

The second is more complex. We have a string, "ball parc playerz" - the name of a couple of DJs I like at the moment by the way, and we want to split it. String splitting is a fairly common operation. Although bash has some (basic) support for string splitting built in, this is a good opportunity to introduce awk. Awk is a predecessor of Perl, and is actually a complete programming language in and of itself. However, awk is very fast at compiling and running programs due to its simplicity. So fast in fact, that often awk programs are only one line long, and are "disposable", in that you write them straight onto the command line to do a job, then forget them. Awk is a pattern matching based language. It's possible to write a complete assembler in awk (check out the info pages), but that is just perverse. Awk is better at matching against strings using regular expressions and then performing actions than writing "real" sequential programs.

The structure of an awk program is very simple. It operates on records and fields. Don't be put off by the database terminology; in fact you can redefine what a record or field is to suit you. By default one record is one line, and fields are separated by whitespace. Each awk "rule" looks like this:

PATTERN { statement; statement; statement..... }

PATTERN is a regular expression, except in a few special cases, in which you can use keywords such as BEGIN to specify actions that should occur at the start of the program. The pattern either matches a record, or doesn't match. If it matches, the statements are run in a normal script fashion. If it doesn't match, the rule is ignored. If you don't have a clue what I'm talking about by now, you should probably read up on regular expressions. There are lots of good tutorials about this most versatile and useful tool, and every good programmer should know regex syntax at least partially. From now on, I'll assume you know regular expressions, as they are pretty key.
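Here is a minimal sketch of a program with two rules, one using the BEGIN keyword and one using a regular expression as its pattern. The three input lines are invented; $2 refers to the second whitespace-separated field of the matching record.

```bash
# The BEGIN rule runs once, before any input is read.
# The /^pkg/ rule runs only for records (lines) that start with "pkg".
out=$(printf 'pkg foo 1.0\nlib bar 2.1\npkg baz 0.3\n' |
      awk 'BEGIN { print "packages:" } /^pkg/ { print $2 }')
echo "$out"
```

This prints "packages:" followed by "foo" and "baz" on separate lines. The "lib" line matches no rule, so it is ignored.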

Awk was originally designed to operate on tables of information, laid out using lines and spaces/tabs, hence the defaults. However, often to do something useful, you want to alter what awk considers to be a record and a field. In the splitting example, the first thing we do (in the BEGIN section) is set the FS variable (field separator). This is itself a regular expression - the fields are the text in between the matches. Here, we just set it to a word, "parc".

Once a pattern is matched against a record, the field splitting process takes place, and the results are stored in the field registers. These are accessed by writing $1, $2, $3 and so on. $0 is a special case, and refers to the record without any splitting. The print statement is one of the most useful awk commands. It takes an arbitrary number of parameters, and then prints each one in turn. In this example, we print the first field, then a newline, then the second field. The end result is that we split the string into two halves, and output each half onto new lines. Why is this useful? Well, because we can then use arrays to split each line into an array element. We've then succeeded in splitting a string into array elements. Here's how we do that:

a="ball parc playerz" # variable assignment
b=$( echo $a | awk 'BEGIN{ FS="parc" } { print $1 "\n" $2 }' ) # command substitution
c=($b) # create an array from b
# automatic line splitting takes place
# c is now an array containing "ball" and "playerz"

echo ${c[1]} # produces "playerz"

Note that an awk program is always surrounded by '' marks. This is the "literal string" delimiter and switches off the various preprocessors bash has. For instance, $1 will stay as $1, rather than being replaced with something else. This is known as "protecting" the program from the shell.

Finally, we get to comparing strings. What we have here is in fact a generic comparison operation, but in shorthand. It could also be written like this:

if [[ "abc" = "abc" ]]; then
   echo "Success"
fi

which is the bash syntax for the "if" statement. However, it's easier, and quicker, to write [[ "abc" = "abc" ]] && echo "Success" because the && operator will pass control to its right if the statement to the left returned with an exit code of zero. Exit codes are something we haven't covered yet, but basically they indicate success or failure. 0 means success, anything else means failure. The [[ ]] syntax is built into bash, and means "sensible test". Bash has several ways of testing things, and most of them look extremely weird to somebody used to languages such as C++ and Java. The [[ ]] operators reduce this headache somewhat.
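A quick sketch of the && shorthand in both the passing and the failing case. The strings being compared are arbitrary.

```bash
# The test succeeds (exit code 0), so control passes to echo.
[[ "abc" = "abc" ]] && echo "Success"

# The test fails (non-zero exit code), so echo is never run.
[[ "abc" = "xyz" ]] && echo "this is not printed"

# $? still holds the exit code of the failed line above.
code=$?
echo "exit code of the failed test: $code"
```

Only "Success" and the final line are printed, and the exit code reported is 1.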

Numerics

Numeric operations in bash are dealt with using the (( )) syntax. This basically means, if you want to work with numbers in the way you'd expect, you need to surround the operation with double brackets. The reason is that as bash is untyped, variables can be both numbers and strings. How they are treated depends on the context.

For example:

i=1
echo $i # produces "1"
(( i++ ))
echo $i # produces "2"
(( i = i + 125 ))
echo $i # produces "127"
i=${i/2/9} # search and replace
echo $i # produces "197"
echo $(( i - 90 )) # produces "107"

As you can see, numbers and strings are completely interchangeable. There's more info about what you can do with the double brackets syntax in the bash documentation.
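The role of context shows up clearly when comparing values. In this sketch (the values 9 and 10 are chosen on purpose) the same two variables are compared once as numbers inside (( )) and once as strings inside [[ ]], with different results:

```bash
a=9
b=10

# Inside (( )) the values are treated as numbers: 9 is less than 10.
if (( a < b )); then numeric="a is smaller"; else numeric="a is not smaller"; fi

# Inside [[ ]] the < operator compares strings character by character,
# and the character "9" sorts after the character "1".
if [[ "$a" < "$b" ]]; then text="a sorts first"; else text="b sorts first"; fi

echo "as numbers: $numeric"
echo "as strings: $text"
```

This prints "as numbers: a is smaller" and "as strings: b sorts first".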

Exit codes

Most operations in bash are commands, and commands return exit codes. Commands are often external programs such as awk, sed, echo and so on. Sometimes commands are bash builtins, for instance the [[ ]] test construct, the predefined statements like if, while etc and functions you have defined. All report their status with an exit code. The exit code of the last command is always stored in the $? variable. This can be used for error checking, amongst other things, but also flow control as you saw above. A few constructs you may see:
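As an illustration of the sort of thing meant here, the sketch below shows three common ways of acting on an exit code. These are general bash idioms with made-up strings, not lines taken from the autopackage source; || is the counterpart of &&, running its right-hand side only when the left-hand side fails.

```bash
# "if" branches on the exit code of any command, not only [[ ]].
# grep -q prints nothing and exits with 0 when it finds a match.
if echo "autopackage" | grep -q "package"; then
    result="match found"
else
    result="no match"
fi
echo "$result"

# && runs the right-hand side only if the left-hand side succeeded.
true && echo "the command on the left succeeded"

# || runs the right-hand side only if the left-hand side failed.
false || echo "the command on the left failed"
```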

..... and one mistake to avoid is this:

someFunc # functions are called by name, without parentheses
echo "DEBUG: we're at stage 2"
if [[ "$?" = "0" ]]; then doSomethingGood; fi

Hopefully you can see why. Echo is a command just like any other, so here $? holds the exit code of echo, which is almost always 0, rather than that of someFunc. If you must use this sort of syntax, store the value of $? immediately after the command to be tested.
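A minimal sketch of that fix. Here someFunc is a stand-in defined only for this example, set up to fail with exit code 3, and status is an arbitrary variable name.

```bash
# Hypothetical function that always fails with exit code 3.
someFunc() { return 3; }

someFunc
status=$?                        # save the exit code before anything else runs
echo "DEBUG: we're at stage 2"   # this resets $? to echo's own exit code

if [[ "$status" = "0" ]]; then
    echo "someFunc succeeded"
else
    echo "someFunc failed with code $status"
fi
```

Because the exit code was saved straight away, the debug line in between no longer affects the test, and the script reports the failure with code 3.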