Linux for Newbies, Part 16:
Shell Scripting
by Gene Wilburn
(The Computer Paper, Nov 2000. Copyright © Wilburn Communications Ltd. All rights reserved)
Now that we've looked at Linux shells and some of their characteristics, let's put our knowledge to work. Shell scripting, one of the bedrock skills of Unix and Linux system administration, is a large topic--you can find entire books devoted to the subject. It would be impossible to do justice to all the features of shell scripting in so short a column, so we'll look instead at two real-world scripts created to assist with the administration of a Linux-based web site. Perhaps these two scripts will whet your appetite for more.
Learning to write shell scripts is a great way to explore the richness of the Linux toolkit and it's also an enjoyable way to learn programming concepts. It's satisfying to be able to create your own custom utility programs with just a few lines of code.
Script One: lcf (lowercase filenames)
The scenario behind both of the scripts in this column is that you're sysadmin'ing an Apache web server on Linux. Typically, your users are creating HTML pages on their Windows and Mac computers and transferring their completed pages to your Linux website either by FTP or via Samba. Windows and MacOS are both case-insensitive operating systems where the pages named MyPage.HTML and mypage.html refer to the same file. In Linux, of course, these are two different files. Internally the page creators should use lower case only in all their HREFs (make that a rule from day one!), but the software programs they use to create their pages often capitalize the first letters, as well as other letters, of the filenames.
When these files arrive in Linux, with uppercase characters in their names, they create havoc: the pages don't resolve. Manually renaming a lot of filenames to lower case can be annoying, particularly if you have to do it frequently. Shell scripting to the rescue! Here's a script I wrote called lcf that will lowercase any filenames in a given directory. (Note: the line numbers in these scripts have been added for reference--the actual scripts do not contain line numbers.) To make this script available to all users, place it in a common directory, such as the time-honoured /usr/local/bin directory and, as root, type "chmod a+x lcf" so everyone on the system can use it. The script is invoked as "lcf *" to check all file names in the current directory:

     1: #!/bin/sh
     2: # lowercase any filenames with uppercase chars
     3: for file in $*
     4: do
     5:   if [ -f $file ]
     6:   then
     7:     lcfile=`echo $file | tr [:upper:] [:lower:]`
     8:     if [ $file != $lcfile ]
     9:     then
    10:       mv -i $file $lcfile
    11:     fi
    12:   fi
    13: done

Let's examine what this small script does. Line 1 contains the shebang path #!/bin/sh which instructs the operating system to run this program using the sh program (bash in the case of Linux). Line 2 is a comment. Comments in shell scripts are preceded with a hash mark (#). Line 3 starts a for loop that sequentially assigns the variable file each filename passed to the lcf script. The $* symbol holds a list of the filenames. The full extent of the for loop is contained between do (line 4) and done (line 13).
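One caveat about the $* in line 3: written without quotes, it is split on white space, so a name such as "My Page.html" reaches the loop as two separate words. The sketch below counts loop iterations both ways. The two function names are made up for illustration and are not part of lcf; "$@" is the standard shell form that keeps each argument intact.

```shell
#!/bin/sh
# Hypothetical helpers, not part of lcf: count how many times a
# for loop runs over the same arguments, unquoted vs. quoted.

count_unquoted() {
    n=0
    for f in $*; do        # split on white space, as in lcf line 3
        n=$((n + 1))
    done
    echo "$n"
}

count_quoted() {
    n=0
    for f in "$@"; do      # one iteration per original argument
        n=$((n + 1))
    done
    echo "$n"
}

count_unquoted "My Page.html" index.html   # prints 3
count_quoted   "My Page.html" index.html   # prints 2
```

If your users never put spaces in filenames, the original loop behaves the same as the quoted one.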
Line 5 begins the first if statement, ending with fi (if spelled backwards) on line 12. This forms the outer conditional. The brackets ([]) in line 5 indicate a test. What we're testing is whether or not the name we're examining is a file (-f). We only want to lowercase filenames, not directory names. If the condition is true, we go to the then statement, line 6. An explicit then is required in shell scripting.
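To see the -f test in isolation, here is a small sketch. The function name kind_of is invented for this example; -f and -d are the standard test operators for regular files and directories.

```shell
#!/bin/sh
# kind_of is a made-up helper that reports what a name refers to.
kind_of() {
    if [ -f "$1" ]
    then
        echo "file"
    elif [ -d "$1" ]
    then
        echo "directory"
    else
        echo "other"
    fi
}

kind_of /etc/passwd   # a regular file on most Linux systems
kind_of /tmp          # a directory on most Linux systems
```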
Line 7 (lcfile=...) forms the heart of the script. We pipe the current filename in the list ($file) through the Linux tr (translate) utility, translating any uppercase characters to lowercase ([:upper:] [:lower:]). This is assigned to the shell variable lcfile using the equal sign. Note that in shell scripting there is no white space allowed around the equal sign as there is in C or Perl. Note also that the script requires backticks (`) to execute the command `echo $file | tr [:upper:] [:lower:]`. Surrounding an executable command string with backticks makes it possible to assign the results of a command to a shell variable.
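Current shells, bash included, also accept the $(...) form of command substitution, which does the same job as backticks and is easier to nest. The sketch below uses it, and quotes the tr character classes so the shell cannot mistake them for filename patterns. The function name to_lower is an assumption made for this example.

```shell
#!/bin/sh
# to_lower is a hypothetical helper: print its argument in lower case.
to_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

name='MyPage.HTML'
lcname=$(to_lower "$name")   # $(...) captures the command's output
echo "$lcname"               # prints mypage.html
```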
The inner conditional begins on line 8 (if [ $file != $lcfile ]) and ends on line 11 (fi). This time we test to see if there is any difference between the original filename ($file) and the lowercased filename ($lcfile). Notice that shell scripts prepend the dollar sign ($) to variable names to access the contents of the variable, e.g., $lcfile to access the value of lcfile. (This can also be written, more conservatively, as ${lcfile}, using braces to mark exactly where the variable name ends so that adjacent characters are not read as part of the name.) If the two variables are not equal, it means that the original filename contained some uppercase characters and we should proceed to rename the file to its lowercased counterpart. This we do in line 10 with the mv command--"mv -i $file $lcfile". The "-i" flag is a protection to keep the script from clobbering a valid file with the same name but different case in Linux: mv asks for confirmation before overwriting an existing file. If you are certain you have no conflicting filenames, you can remove the flag and not be prompted for changes.
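Filenames that arrive from Windows and Mac users often contain spaces, which the unquoted variables in lcf would split apart. Here is one possible hardened variant, offered as a sketch rather than as the author's script. It quotes every expansion, uses "$@" in place of $*, and lowercases only the final path component so that directory names are left alone. The function name lcf_safe is invented here.

```shell
#!/bin/sh
# lcf_safe: a hypothetical, quoting-safe variant of lcf.
lcf_safe() {
    for file in "$@"
    do
        # Skip anything that is not a regular file (directories etc.)
        [ -f "$file" ] || continue

        dir=$(dirname -- "$file")
        base=$(basename -- "$file")
        lcbase=$(printf '%s\n' "$base" | tr '[:upper:]' '[:lower:]')

        # Rename only when lowercasing actually changed the name;
        # -i still asks before overwriting an existing file.
        if [ "$base" != "$lcbase" ]
        then
            mv -i -- "$file" "$dir/$lcbase"
        fi
    done
}

lcf_safe "$@"
```

Saved as a script, it would be invoked the same way as the original, for example with * as its argument.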
Script Two: rotate-apachelogs
Our second shell-scripting example is very typical of sysadmin scripts in which you want to run a sequence of steps at an appointed time every day. In this case we want to rotate our website log files on a daily basis, resolve IP addresses into names, datestamp each day's log files, move them to an archive directory, then compress them. In addition, in this scenario, we're using the excellent web stats program Webalizer, which we've already configured to work with incremental updates. It reads the file /var/log/httpd/access_log.res on a daily basis to update the stats with the previous day's hits. We run this script at midnight, briefly shutting down Apache, then restarting it a few seconds later once we've rotated the logs.
Although rotate-apachelogs is a longer script than lcf, it's a simpler script. There are no conditionals or loops--just a straight sequence of commands.

     1: #!/bin/sh
     2: # Rotate Apache access_log, error_log
     3: # and (optionally) run webalizer stats
     4: # Set today's date variable in yyyymmdd format
     5: tdy=`date +%Y%m%d`
     6: # Stop Apache momentarily
     7: /etc/rc.d/init.d/httpd stop
     8: # While Apache is stopped ...
     9: # Copy access_log to access_log.tmp
    10: cp /var/log/httpd/access_log \
          /var/log/httpd/access_log.tmp
    11: # Copy error_log to archive directory
    12: cp /var/log/httpd/error_log \
          /var/log/httpd/log-archive
    13: # Null out access_log and error_log
    14: cat /dev/null > /var/log/httpd/access_log
    15: cat /dev/null > /var/log/httpd/error_log
    16: # ... restart Apache
    17: /etc/rc.d/init.d/httpd start
    18: # Resolve IP addresses with logresolve
    19: /usr/sbin/logresolve \
          < /var/log/httpd/access_log.tmp \
          > /var/log/httpd/access_log.res
    20: # Datestamp and archive old access log
    21: mv /var/log/httpd/access_log.tmp \
          /var/log/httpd/log-archive/access_log.$tdy
    22: # Datestamp old error log
    23: mv /var/log/httpd/log-archive/error_log \
          /var/log/httpd/log-archive/error_log.$tdy
    24: # Compress the archived log files
    25: gzip /var/log/httpd/log-archive/*.$tdy
    26: # Run webalizer (values: /etc/webalizer.conf)
    27: webalizer

Line 1 is the shebang line, telling Linux to use sh (bash) to run this script. Lines 2-4 are comments. Line 5 uses the Linux date command to assign today's date in "yyyymmdd" format to the shell variable tdy. This variable gets used later to datestamp log files. Notice the backticks, just as we used in the previous script lcf. Line 7 (on Red Hat Linux systems) shuts down Apache so that no new entries are being logged while we're copying the log file.
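The date format string used for tdy can be tried on its own at the prompt. In this sketch the $(...) form stands in for the backticks, and stamped_name is a hypothetical helper that shows how the value is later appended to a log filename.

```shell
#!/bin/sh
# %Y = four-digit year, %m = two-digit month, %d = two-digit day
tdy=$(date +%Y%m%d)

# stamped_name is a made-up helper: append the datestamp to a name.
stamped_name() {
    printf '%s.%s\n' "$1" "$tdy"
}

stamped_name access_log   # e.g. access_log.20001015
```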
Line 10 copies the current log file, access_log, to a temporary file, access_log.tmp. Line 12 puts a copy of the current error log file, error_log, into the archive directory, /var/log/httpd/log-archive. Lines 14 and 15 then re-set the current log files to null (empties them out). At this point we're ready to re-start Apache, line 17.
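The copy-then-empty sequence can be rehearsed safely away from the real logs. The sketch below works in a scratch directory created with mktemp. The function name copy_and_empty and the sample log line are assumptions for illustration. It empties the file with ": > file", which has the same effect as the "cat /dev/null > file" used in the script.

```shell
#!/bin/sh
# copy_and_empty is a hypothetical helper: copy a log, then truncate
# the original in place so the file itself stays where it is.
copy_and_empty() {
    cp -- "$1" "$2" && : > "$1"
}

scratch=$(mktemp -d)
echo '192.0.2.1 - - "GET /index.html"' > "$scratch/access_log"

copy_and_empty "$scratch/access_log" "$scratch/access_log.tmp"

wc -c < "$scratch/access_log"    # the original is now 0 bytes
cat "$scratch/access_log.tmp"    # the copy holds the old entry

rm -rf "$scratch"
```

Emptying the file rather than deleting it means the log keeps its existing owner and permissions when Apache starts writing to it again.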
For efficiency, Apache is normally configured to log only IP addresses. Normal practice is to resolve the addresses into names later, in batch mode during non-peak hours. Apache provides the filter program logresolve to do the job of resolving names. We invoke this program in line 19, with access_log.tmp as input and access_log.res as output, using standard shell redirection. Webalizer has been configured to use the resolved log file (access_log.res) for statistics.
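Any program that reads standard input and writes standard output can be wired up in the same way. Since logresolve may not be installed on your system, this sketch substitutes a made-up stand-in filter, fake_resolve, which rewrites one sample address (192.0.2.1) to an invented hostname. Only the redirection pattern is the point.

```shell
#!/bin/sh
# fake_resolve is a stand-in for logresolve, NOT the real program:
# it replaces one sample address with an invented hostname.
fake_resolve() {
    sed 's/^192\.0\.2\.1 /host.example.com /'
}

scratch=$(mktemp -d)
echo '192.0.2.1 - - "GET /index.html"' > "$scratch/access_log.tmp"

# "<" feeds the file to the filter; ">" captures what it prints.
fake_resolve \
    < "$scratch/access_log.tmp" \
    > "$scratch/access_log.res"

cat "$scratch/access_log.res"   # prints host.example.com - - "GET /index.html"
rm -rf "$scratch"
```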
Now the majority of the work is done and we just need to finish things up. Lines 21 and 23 add the datestamp ($tdy) to the access and error logs we want to preserve and store in the archive directory. All that's left is to compress them with gzip, which we do in line 25. Our final files in the archive directory for the date of October 15, 2000, look like this:

    access_log.20001015.gz
    error_log.20001015.gz

Finally, in line 27, we run webalizer. If you use a different stats program, adjust this line accordingly. To have this script run at midnight, we add it to cron. Let's assume it resides in /root. The crontab entry is:

    0 0 * * * /root/rotate-apachelogs

Shell scripts such as rotate-apachelogs are sometimes referred to as "cron jobs". They are the backbone of Linux system administration.
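Before handing a script like this to cron, it is worth rehearsing the archive steps by hand. The sketch below repeats the datestamp-and-compress sequence of lines 21 to 25 in a scratch directory, so nothing under /var/log is touched. The function name archive_log and the scratch layout are assumptions for this example.

```shell
#!/bin/sh
# archive_log is a hypothetical helper: move a log to its
# datestamped archive name, then compress it with gzip.
archive_log() {
    mv -- "$1" "$2" && gzip -- "$2"
}

tdy=$(date +%Y%m%d)
scratch=$(mktemp -d)
mkdir "$scratch/log-archive"
echo 'sample entry' > "$scratch/access_log.tmp"

archive_log "$scratch/access_log.tmp" \
    "$scratch/log-archive/access_log.$tdy"

ls "$scratch/log-archive"   # e.g. access_log.20001015.gz
rm -rf "$scratch"
```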
Further Study
Shell scripts can be simple, like these two examples, or very sophisticated. If scripting catches your fancy, there are some excellent resources available for further study. One of the best places to look for models is in your Red Hat Linux /etc/rc.d/init.d directory. The start/stop/restart scripts in this directory are filled with excellent working examples of conditional and branching logic. These scripts provide many instances of for loops and case statements as well as extensive use of shell variables.
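As a taste of what those scripts look like, here is a stripped-down sketch of the start/stop/restart pattern built around a case statement. It is not copied from any Red Hat script. The function name demo_service and its messages are invented, and it only prints what a real script would do.

```shell
#!/bin/sh
# demo_service is a made-up example of init-script style branching.
demo_service() {
    case "$1" in
        start)
            echo "Starting demo service"
            ;;
        stop)
            echo "Stopping demo service"
            ;;
        restart)
            demo_service stop
            demo_service start
            ;;
        *)
            # Unknown argument: print usage and signal failure
            echo "Usage: demo_service {start|stop|restart}" >&2
            return 1
            ;;
    esac
}

demo_service restart
```

Each pattern ends with a double semicolon, and the * pattern catches anything that did not match an earlier branch.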
Many Linux books contain chapters on shell scripting and, as mentioned previously, there are specialized books devoted to the topic. Your search needn't be limited to books specifically on Linux. Any Unix-oriented book can be useful. Some popular books on scripting include Unix Shells by Example, 2nd ed., Ellie Quigley, Prentice Hall, 1999 (ISBN 0130212229 $75); Learning the Bash Shell, 2nd ed., Cameron Newham, Bill Rosenblatt, O'Reilly, 1998 (ISBN 1565923472 $42.95); Sams Teach Yourself Shell Programming in 24 Hours, Sriranga Veeraraghavan, Sams, 1999 (ISBN 0672314819 $28.95).
Through the grace of portable GNU software, you can even study shell scripting on a Windows computer. By installing the Cygwin user tools on Windows 95/98/NT, you can invoke the Bash shell interpreter and a large set of GNU utilities, including awk, grep, find, tr, and wc. If you also add a copy of vim for Windows, to give yourself a solid vi editor, you can create a Unix-like environment in Windows that is useful, comforting and familiar. Cygwin downloads can be located at sources.redhat.com/cygwin/. To obtain vim for Windows, visit www.vim.org.
Gene Wilburn (gene@wilburn.ca) is a Toronto-based IT specialist, musician and writer who operates a small farm of Linux servers.
-30-