An introduction to Bash scripting
So far you’ve seen Bash as a CLI, i.e., as an interactive tool. But Bash is also a programming language, and that is where its real power lies. Nobody claims that Bash is a particularly good general-purpose programming language (though in theory it could be used that way). But when it comes to manipulating files, working in Bash is often far more convenient than working in other languages such as R or Python. An additional benefit is that Bash is almost universally available, whereas R and Python require additional installs.
In this exercise you will create a Bash script to perform two common file manipulation tasks: adding header records and renaming files.
Fork this repository: https://github.com/EDS-214/eds214-handson-cli under your account and clone your repository on workbench-1 under the course folder eds-214-repro you created this morning.
Adding a header record to each file
There are two problems with our state baby name files: they lack header records, and they end in .TXT instead of .csv. To tackle the first problem, we can say:
echo "state,gender,year,firstname,count" > tempfileThis gives us a header record in a new file tempfile. We can then append the contents of one of our data files to that file:
cat STATE.AK.TXT >> tempfileExamine tempfile using any of the following commands:
head tempfile
cat tempfile
more tempfile
less tempfileAt this point we can now rename tempfile to the desired filename ending in .csv and we’re done (with one data file anyway):
mv tempfile STATE.AK.csvQuestions
What will happen if you do this?
echo "state,gender,year,firstname,count" > tempfile cat STATE.AK.TXT > tempfileOr this?
echo "state,gender,year,firstname,count" >> tempfile cat STATE.AK.TXT > tempfileOr this?
echo "state,gender,year,firstname,count" >> tempfile cat STATE.AK.TXT >> tempfile
We want to do the above processing to every file, that’s where loops come in. But first, we need variables.
Variables
Bash supports variables, and variables are essential in writing Bash scripts. To set a variable:
name=AliceNo space between the variable name and equals sign! Variables are referenced as ${name}, or as just $name if not ambiguous.
Question
What will be printed by the following three
echocommands, and why?var=xy echo $varz echo ${var}z echo $var.z
When we process our data files in a loop, we will be working with a variable file whose value will be the name of the current file such as STATE.AK.TXT. We will want to construct a new filename such as STATE.AK.csv from that variable. This can be done by modifying a simple variable reference such as ${file} to include Bash string processing functions inside the braces. Here are two ways:
file=STATE.AK.TXT
echo ${file/.TXT/.csv}
echo ${file%.TXT}.csvThe first form performs a string substitution, substituting the first (and in our case, only) occurrence of .TXT with .csv. The second form peels off the trailing .TXT (% means “trailing”), leaving just STATE.AK. To that we then append .csv.
There are lots of string processing operations, and all are invoked using single characters like %, #, ^, etc. How the heck can you remember that? The answer is, you likely can’t. The takeaway for you in this class is that there exist string processing operators in Bash, and that if you look in the Bash manual you can get a description of what each one does.
Scripts
A Bash script is a text file containing the same Bash commands you might type interactively. It’s analogous to an R or Python script but written in Bash instead of one of those other languages.
Bash knows it is reading from a file instead of the terminal window, and it operates slightly differently:
- It doesn’t print a prompt.
 - It doesn’t read Bash configuration files (
~/.bashrc,~/.bash_profile,~/.profile, etc.). As a consequence, aliases and variables defined in those files are not visible to scripts. 
A Bash script can be run like so:
bash myscript.shExercise
Create a script
myscript.shthat processes data fileSTATE.AK.TXTas above.
Once you’ve created a script, it can be very useful to check it for errors and potential pitfalls by running it through ShellCheck.
Loops
Bash supports a few kinds of loops. The one we’ll be using here looks like this:
for var in list_of_things; do
    # operate on $var
doneAlternative syntax (notice the do is on a line by itself):
for var in list_of_things
do
    # operate on $var
doneA couple examples:
for name in Tom Dick Harry; do
    echo "Every $name"
donefor i in {99..1}; do
    echo "$i bottles of beer on the wall"
doneIt’s very common to operate on files:
for file in *.TXT; do
    # do something with $file, for example:
    echo $file
donePutting it all together
Recall that our goal is to add a header record to each data file, and to rename the data files to .csv. We can do so by writing a loop, and performing those two operations inside the loop. Write your data file-processing loop inside your myscript.sh script file.
When performing a destructive operation, it can be very helpful to view the actual commands that will be executed before doing them for real. To satisfy yourself that your script is coded correctly, prefix each command with echo so that it is simply printed in the terminal window. You will also want to either comment out I/O redirections or quote them, as they will otherwise affect the echo command. Here’s our practice loop:
for file in *.TXT; do
    echo echo "state,gender,year,firstname,count" ">" tempfile
    echo cat $file ">>" tempfile
    echo mv tempfile ${file/.TXT/.csv}
doneWhen ready, remove the echo prefixes and remove quotes around redirection operators:
for file in *.TXT; do
    echo "state,gender,year,firstname,count" > tempfile
    cat $file >> tempfile
    mv tempfile ${file/.TXT/.csv}
doneExercise
Alice does not want to rename her files to .csv, she’s fine with them being named .TXT. And after learning that cat will concatenate multiple files given on the command line, she has what she thinks is a brilliant idea for performing the processing more simply. First she puts the common header record in a file:
echo "state,gender,year,firstname,count" > headerAnd then she writes this loop:
for file in *.TXT; do
    cat header $file > $file
doneWell that turned out to be a disaster. What went wrong?
Next steps
There’s a lot more to Bash scripting. Key next topics to study include:
- Conditional statements
 - Processing command line arguments
 - Creating scripts that can be invoked like built-in commands
 
