
Linux Inside: Utilities, the heart and soul of Linux

by Gene Wilburn

(The Computer Paper, May 2000. Copyright © Wilburn Communications Ltd. All rights reserved)


From a certain point of view, Linux can be seen primarily as a set of utilities bonded together by a common kernel. These utilities form the "toolkit" model that has endeared Unix to programmers, system administrators, and advanced users for over three decades. It is frequently the utilities that attract new users to Unix in the first place. The tools are so rich in potential that entire books, such as Unix Power Tools (O'Reilly & Associates, ISBN 1-56592-260-3, $85.95), have been written about them.

Many of the key utilities that come with Linux were developed by the GNU Project, founded by Richard M. Stallman. Linux is so heavily dependent on and interwoven with GNU utilities that one Linux distribution, Debian, calls itself Debian GNU/Linux. The GNU utilities include everything from a directory lister to a C++ compiler. GNU utilities, most of which are modeled on older Unix utilities but rewritten from scratch, tend to be feature-rich and frequently improve on the originals.

Linux utilities come in two basic flavors: open source and commercial. Some commercial products, such as RealPlayer, are available for free, while others, such as the BRU tape backup software, are available for a fee. The majority of Linux utilities are free, open-source packages.

Traditional Unix utilities evolved in the command-line environment. Because of that, they can be joined together in powerful shell scripts and cron jobs. Some of the newer, graphical utilities are designed for pointing and clicking in the X Window System. In this month's column we'll take a brief look at some of the more popular Linux utilities.

The toolkit mindset

When the entire operating system is largely a collection of tools, it's difficult to find a starting place. Just about everything you do on a Linux system, from listing files (ls) to changing your password (passwd), is done with a utility. The output of one utility can be piped into the next in a chain to create a new, customized utility. As a simple example, let's assume you'd like to know how many files are in your /bin directory. To find out, you can pipe the output of ls to the word count utility, wc, using the "lines" flag ("-l"):
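The pipeline described above most likely looked like the following one-liner. The exact command from the printed column is not preserved here, so treat it as a reconstruction:

```shell
# Count the entries in /bin. When its output goes to a pipe, ls writes
# one name per line, and wc -l counts those lines.
ls /bin | wc -l
```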

It's a piece of cake to take something like this and create a new utility shell script. Here's one I created to monitor the activity of a Linux mailhub. It checks the sendmail processing queue to see how many email messages have been deferred due to delivery problems. I called the script "mqueue" and placed it in /usr/local/bin for system-wide access. It's a one-liner:
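A minimal sketch of such a script follows. It is not the original mqueue. It assumes the traditional sendmail queue directory, /var/spool/mqueue, and that each queued message has one control file whose name starts with "qf". The QUEUE_DIR variable and the mqueue_count function are hypothetical names added so the core one-liner can be reused.

```shell
#!/bin/sh
# mqueue: print the number of messages waiting in the sendmail queue.
# Assumption: one "qf" control file per queued message.
QUEUE_DIR=${QUEUE_DIR:-/var/spool/mqueue}

mqueue_count() {
    # $1 = queue directory. Errors from ls (an empty or unreadable
    # queue, for instance) are discarded, so the count falls back to 0.
    ls "$1"/qf* 2>/dev/null | wc -l
}

mqueue_count "$QUEUE_DIR"
```

The heart of it is still a one-liner: ls piped into wc -l, exactly as in the /bin example.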

At any time you can type mqueue on this system to find out how many potential "problem" messages are in the queue. Taking the concept one step further, I created a shell script called "mailalert.sh" that compares the number of messages in the queue against a threshold and, when the threshold is exceeded, emails the system administrator(s) that something might be amiss. A large backlog could be the result of a dropped link to the Internet or, worse, a spam-relay attempt by an outside party. Although the queued messages could also be the result of normal, heavy processing with a high incidence of undeliverable messages, it's something you, as sysadmin, should know about:
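The following is a sketch of how such an alert script might be written, not the original mailalert.sh. The threshold of 50, the recipient "postmaster", the queue directory, and every variable and function name are assumptions made for illustration.

```shell
#!/bin/sh
# mailalert.sh: mail a warning when the sendmail queue grows too large.
# Assumptions (not from the original script): the queue directory, the
# threshold of 50, the recipient, and all variable and function names.
QUEUE_DIR=${QUEUE_DIR:-/var/spool/mqueue}
THRESHOLD=${THRESHOLD:-50}
RECIPIENT=${RECIPIENT:-postmaster}
MAILER=${MAILER:-mail}

check_queue() {
    # Backticks run the enclosed command and substitute its output.
    count=`ls "$QUEUE_DIR"/qf* 2>/dev/null | wc -l`
    now=`date`
    if [ "$count" -gt "$THRESHOLD" ]; then
        echo "$now: $count messages queued in $QUEUE_DIR" |
            $MAILER -s "Mail queue alert" "$RECIPIENT"
    fi
}

check_queue
```

Keeping the mail command in a variable makes it possible to try the script with a harmless stand-in before letting it send real mail.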

Note the backticks or left-apostrophes, `, in the script. They are essential: the shell runs the command enclosed in backticks and substitutes its output in place. A resulting mailalert message looks like this:

(As an aside, you should always alias postmaster to a real user account or two on a production system so that a sysadmin regularly receives mail about anything out of the ordinary with the system's mail agents.)

The preceding shell script combines four basic utilities--ls, wc, date, mail--into one logical sequence. The next step is to have a scheduler run it at specified times during the day. This brings us to "cron", the traditional Unix scheduling daemon. Once you've created and tested a new utility script, you can schedule it to run at various times of day or night with the crontab utility. You invoke crontab as crontab -e for editing. A typical crontab entry looks like:

This "cron job" entry can be translated as "run the mailalert shell script located in /usr/local/bin every four hours, on the hour, every day of the week, every day of the month, every month". For a full explanation of crontab entries, type man 5 crontab.
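An entry matching that description could be written as follows. This is a reconstruction rather than the original line, and the "*/4" step notation assumes the Vixie-style cron shipped with most Linux distributions:

```
# minute  hour  day-of-month  month  day-of-week  command
0         */4   *             *      *            /usr/local/bin/mailalert.sh
```

The five time fields are read left to right. A "*" means "every", so only the minute (0) and the hour (every fourth) restrict when the job runs.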

Nearly every Linux utility is accompanied by a man (manual) page for quick reference. To see all the possible options you can use with any given utility, consult the man pages, e.g., man ls or man wc. Your progress with Linux will bear a direct correlation to the amount of time you spend reading man pages.

This simple example--having the system monitor the status of the sendmail processing queue for deferred messages and sending alerts when appropriate--gives you a glimpse of the Unix toolkit mindset. When the tools are at hand, and all the glue needed to join them together is also at hand, you have unlimited, open-ended opportunities to create anything you need. Linux is the ultimate tinkerer's dream.

Finding things in files

Not everything has to be scripted, of course. You run many utilities from the command line for one-off tasks and queries. Examples include the grep family: grep, fgrep, and egrep. The greps find pattern matches in files.

For example, you may know that some of the HTML pages on your Web site contain references to "Linus Torvalds" but you can't remember all the relevant pages. If the pages are in a single directory, you can type:

This one-liner does a case-insensitive (-i) search on "linus" through all the .html files in the directory /home/httpd/html/opensource. The -l flag requests grep to print a list of all the filenames that contain matches. Without the -l flag, grep would display every matching line in every matching file, which can also be highly useful.
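The search described above can be tried safely on a throwaway directory. The sketch below builds two made-up pages first; on the real site the last line would point at /home/httpd/html/opensource/*.html instead.

```shell
# Build a small throwaway directory so the command can be tried safely.
demo=$(mktemp -d)
echo '<p>Linus Torvalds started Linux in 1991.</p>' > "$demo/history.html"
echo '<p>Nothing relevant here.</p>' > "$demo/about.html"

# -i ignores case; -l prints only the names of files with a match.
grep -il linus "$demo"/*.html
```

Only history.html is listed, because about.html contains no match.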

Want to email this list to your webmaster? You can do it all in one step from the command line:
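One way to do this is to pipe the output of grep straight into mail. The sketch below wraps the pipeline in a hypothetical helper, mail_matches. The recipient "webmaster" is an assumed local alias, and MAILER is a variable introduced here so the mail command can be swapped out.

```shell
# Hypothetical helper: mail the list of matching pages to a recipient.
# MAILER defaults to the standard mail command.
MAILER=${MAILER:-mail}

mail_matches() {
    # $1 = word to search for, $2 = directory, $3 = recipient
    grep -il "$1" "$2"/*.html | $MAILER -s "Pages mentioning $1" "$3"
}

# Example call for the site discussed above:
# mail_matches linus /home/httpd/html/opensource webmaster
```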

Grep is frequently used to search for patterns in log files. It provides a quick way to find that needle in the haystack. For example, one of your users may complain that a message he or she sent to a colleague in Arizona was never sent (implying, of course, that your server is at fault). You can grep the system mail logs under the recipient's email address to see if the outside system accepted the mail or not. If the reply "User unknown" is in the logs as part of the status of the message, it's a sure bet that your own user has mistyped the address. Convincing the user that this might be the case is a little easier when you can provide the evidence.

Although the grep default is to display the line that contains the match, it is often useful to have some context for that line, such as the line preceding and the line following the matching line. GNU grep, which Linux systems offer, provides contextual grep by either using the -C context flag, which provides two lines before and after, or by specifying the number of context lines you want. Here's an example of looking through the mail log on a Debian GNU/Linux system with one line of context before and after the matching line:
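Here is a small sketch of contextual grep. It uses a made-up sample file rather than a real mail log (on a Debian system the log would typically be /var/log/mail.log, which is an assumption here). Current GNU grep releases expect an explicit count after -C, so the number is always spelled out.

```shell
# Made-up sample file standing in for a real log.
sample=$(mktemp)
printf '%s\n' 'first line' 'second line' 'the needle is here' \
    'fourth line' 'fifth line' > "$sample"

# -C 1 prints one line of context before and after each matching line.
grep -C 1 needle "$sample"
```

The command prints the matching line together with "second line" and "fourth line".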

Contextual grep is particularly useful when inspecting documents or source files.

Quite often it's desirable to grep right down through an entire directory tree. The traditional way to do this is by combining grep with another utility called find. Find is a powerful, complex tool that can traverse directory trees.

Out of curiosity I used find and grep to query the number of references there were to Linus Torvalds in the /usr/doc directory of a Red Hat 5.2 system:
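The original command is not preserved, but a find-and-grep combination along these lines would do the job. The sketch runs against a throwaway tree with made-up files; to repeat the experiment on a real system, the tree would be replaced with /usr/doc.

```shell
# Throwaway directory tree standing in for /usr/doc.
tree=$(mktemp -d)
mkdir -p "$tree/kernel" "$tree/faq/old"
echo 'Written by Linus Torvalds.' > "$tree/kernel/README"
echo 'Thanks to linus torvalds et al.' > "$tree/faq/old/credits.txt"
echo 'No mention here.' > "$tree/faq/intro.txt"

# find walks the tree and hands every regular file to grep.
# -i ignores case; -h drops the file names, so wc -l counts
# matching lines only.
find "$tree" -type f -exec grep -ih 'linus torvalds' {} + | wc -l
```

Swapping -h for -l would count the files that contain a match rather than the matching lines.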

The state of your system

Two Linux utilities you will find invaluable are df and du. The df utility summarizes free disk space. By default it lists sizes in blocks, whose size can vary from system to system, e.g.:

This display is the default on most Unix systems. GNU df, however, offers the excellent -h flag, meaning "human-readable", that produces a more graspable result:
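Both forms are easy to compare side by side. The output depends entirely on the disks of the machine at hand, so none is shown here:

```shell
# Default report: sizes in blocks (the block size varies by system).
df

# GNU df: sizes scaled to human-readable units such as M and G.
df -h
```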

The du (disk usage) utility offers more granularity. It can traverse a directory tree or subtree to determine how much space is being used. It can take the measure of an individual user or the extent of your corporate website.

The du default provides a directory-by-directory summary, with a grand total at the end. It, too, can be invoked with the -h flag for improved readability. On my Debian GNU/Linux portable, for instance (an elderly IBM ThinkPad), the /usr/doc directory occupies about 20 MB of disk space. Using another useful utility, tail, here are the last ten lines of its readout, including the summary:

With the summary flag, -s, I can get a snapshot of my personal disk usage:

It's easy to see that, with less than a megabyte, this is a new setup. It'll begin to fill up in no time.
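The two du invocations discussed above can be sketched as follows. A throwaway tree stands in for /usr/doc and for a home directory, so the commands finish instantly. On a real system the arguments would be those directories instead.

```shell
# Throwaway tree so the commands can be tried without scanning real data.
tree=$(mktemp -d)
mkdir -p "$tree/a/b"
echo 'some text' > "$tree/a/b/file.txt"

# One line per directory, deepest first, ending with the total for the
# whole tree. tail keeps only the last ten lines, which matters on
# large trees.
du -h "$tree" | tail

# -s prints just the summary line for the named directory.
du -sh "$tree"
```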

Another good state-of-the-system utility is free. Free gives you a snapshot of memory usage, including virtual memory (swap space):

Another essential system utility is top. Top shows you the top processes running on your system. It can be highly useful for identifying memory hogs or a runaway process. When you type top you see a display similar to this:

Backup and compression utilities

One of the most ubiquitous Unix utilities is tar, short for tape archive. Tar is used to create tar files and tar backup tapes. On a file level, tar works in a manner somewhat similar to the Windows ZIP programs, except that, by default, tar does not compress files. The most widely used compression utility is gzip (GNU zip), whose file format is not compatible with DOS ZIP archives.

By default tar recurses subdirectories, making it a good utility to use when moving large sets of directory structures with subdirectories and files from one place to another. If you have a SCSI tape drive attached to your Linux system, you can tar your whole system to tape with the following command:

Restoring everything from the same tape would require:

There is usually a bit more to a backup than this, however. There are certain directories you should exclude (/dev and /proc, for instance). The tar man pages will help you with the finer details.
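The create, list, and restore cycle, including an exclusion, can be rehearsed on a throwaway tree before it is pointed at a real tape. In the sketch below the archive is an ordinary file. For a tape backup the file argument would be a device such as /dev/st0, which is an assumed name for the first SCSI tape drive.

```shell
# Throwaway source tree and destination so nothing real is touched.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/etc" "$src/proc"
echo 'hostname=demo' > "$src/etc/config"
echo 'ignore me' > "$src/proc/status"

# Create (c) an archive file (f) of everything under $src,
# leaving out anything named proc.
tar cf "$dest/backup.tar" --exclude=proc -C "$src" .

# List (t) the archive, then extract (x) it into a restore directory.
tar tf "$dest/backup.tar"
mkdir "$dest/restore"
tar xf "$dest/backup.tar" -C "$dest/restore"
```

After the run, the restore directory contains etc/config but no proc directory.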

Tar is also used extensively as a way to transfer Unix and Linux programs. When you go to a site to get source files, you generally get a compressed tar file, which you reconstitute to a directory on your local system. These files are frequently referred to as "tarballs."

A tar file is often compressed with GNU zip and ends in the extension .tgz or .tar.gz. You can uncompress the file, then extract the tar files, or do it all in one:

or simply
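Both approaches can be sketched as follows. The example first builds a small tarball of its own (the name myarchive is only an illustration), then unpacks it the long way and the short way:

```shell
# Build a small compressed tarball to practise on.
work=$(mktemp -d)
mkdir "$work/myarchive"
echo 'hello' > "$work/myarchive/README"
tar cf - -C "$work" myarchive | gzip > "$work/myarchive.tar.gz"

# Long form: gunzip writes the uncompressed stream to stdout (-c)
# and tar extracts (x) from stdin (f -).
mkdir "$work/out1"
gunzip -c "$work/myarchive.tar.gz" | tar xf - -C "$work/out1"

# Short form: GNU tar's z flag runs the archive through gzip itself.
mkdir "$work/out2"
tar xzf "$work/myarchive.tar.gz" -C "$work/out2"
```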

If you obtain a file with the extension .Z, as in myarchive.tar.Z, it has usually been compressed with an older Unix utility called compress. To uncompress it, you type:

Other compressors include the more aggressive bzip2, which makes smaller files, and zip and unzip, utilities that can create or open DOS Zip files.

Tar is by no means the only way to back up your Linux system. Most veteran system administrators prefer dump and restore over tar for whole system backups. Others use a more complex program called cpio, which includes better error detection. Dump, restore and cpio are usually installed by default on Linux systems.

For backing up several servers across your network to a single tape device, many administrators turn to amanda, an open-source product from the University of Maryland. Amanda is available for all Linux distributions.

Backup is one area where commercial utilities have made some headway into the Linux world. Two of the popular backup programs are BRU (Backup and Restore Utility) from Enhanced Software Technologies (www.bru.com) and Knox Software's Arkeia backup software (www.arkeia.com). Specially licensed home versions of these are sometimes included in Linux distributions or are downloadable for personal use. Both products offer robust error checking, tape management, a graphical interface, and multi-server backup across the network.

Other noteworthy utilities

Unfortunately there are so many excellent utilities available for Linux that this survey barely scratches the surface. Is Perl a utility or a programming language? Perl scripts are easy to incorporate inside shell scripts. Two of the classic and powerful Unix utilities, awk and sed, are studies unto themselves. Awk can parse fields out of text files and sed, the Unix stream editor, can be used in search and replace scripts on text files such as HTML pages.
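To give a taste of the two, here is a minimal sketch using made-up sample data: awk pulls fields out of colon-separated lines, and sed performs a search and replace on a line of HTML.

```shell
# Made-up sample data: colon-separated fields, in the style of /etc/passwd.
sample=$(mktemp)
printf '%s\n' 'alice:x:1000:Alice Example' 'root:x:0:Superuser' > "$sample"

# awk: split each line on ":" and print the first and fourth fields.
awk -F: '{ print $1, $4 }' "$sample"

# sed: replace every occurrence of one string with another on each line.
echo '<h1>Unix utilities</h1>' | sed 's/Unix/Linux/g'
```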

There are several programs available for synchronizing your system's time with public time standards. Of these, rdate is the simplest to use.

Want to see a file in hexadecimal? Use hexdump or the octal dump utility od with the -h hex flag. Need version control over a set of files? Try RCS, the Revision Control System, a very sophisticated utility that is free on Linux (and costs upwards of $500 for Windows systems).
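As a quick sketch of a hex dump, the following feeds a short string to od. It uses -t x1 rather than -h, because -h groups bytes into two-byte words whose displayed order depends on the machine, while -t x1 shows one byte at a time.

```shell
# Dump the bytes of a short string in hexadecimal, one byte per column.
# -A n suppresses the offset column; -t x1 selects one-byte hex units.
printf 'Linux\n' | od -A n -t x1
```

The six bytes shown are the five letters of "Linux" followed by the newline character, 0a.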

Need to compare the differences between two files? Meet diff. Want to filter your incoming email into various mailboxes and filter out email from certain parties? Study procmail. Need to set up an email vacation notice while you're holidaying in Fiji? Use vacation.

Everyone needs security. Replace telnet with ssh. Need to download files from the net but don't like all the banner ads? Lynx, the champion character-based web browser, to the rescue.

If you're exchanging floppies with the DOS world, you'll love the mtools. Programs like mcopy, mdir, mdel make exchanging data on floppies very easy.

And then there's probably my personal favorite utility: screen. Screen allows me to have several virtual terminal sessions on a single connection. Perfect for the command-line junkie.

We haven't even touched on sound utilities, CD rippers, and graphical wonders such as The GIMP. It's all there. You can spend the better part of a lifetime just discovering all the useful utilities for Unix/Linux. One way to keep up with new offerings is to subscribe to the comp.os.linux.announce newsgroup.

And don't forget more and less, the essential Linux page viewers. Well, that's more or less all the space we have for grok'ing Linux utilities (to the best of my knowledge, Linux does not have a utility named grok). Happy grep'ing.

Gene Wilburn (gene@wilburn.ca) is a Toronto-based IT specialist, musician and writer who operates a small farm of Linux servers.

-30-