Linux for Newbies, pt. 24:
Filtering mail with procmail

by Gene Wilburn

(The Computer Paper, July 2001. Copyright © Wilburn Communications Ltd. All rights reserved)

Note: Due to a layout error, this column was not printed in all editions of The Computer Paper


Email is a twin-edged blade. Essential, and occasionally endearing, it permeates the fabric of modern life. But we all receive too much of it. Working through the volume of email in our inboxes drains away time that might be better spent doing other things. To recover some of this time, we need an email strategy--a way to filter email to make our inboxes more manageable.

Filtering allows us to set up rules or recipes that send mailing list postings to special folders, put high-priority correspondents into a "readmefirst" folder, and perhaps route all other mail into a "readlater" folder that can be scanned occasionally for anything important. Doing even this much can considerably reduce the amount of time it takes to manually process mail.

Proprietary email readers frequently offer built-in filtering rules--Eudora Pro for Windows and Macintosh has nice ones and even Pine has some filtering capabilities. These work well enough within a simple context, but if you switch operating systems regularly, work from different computers, or need advanced filtering, a proprietary solution may fall short of your needs.

The most effective method of filtering email is to intercept it before it is actually delivered and place your filtering rules there. Linux offers a superb mechanism for doing this--a program called procmail. Procmail, created by Stephen R. van den Berg, is the focus of this Newbies session.

But first, Fetchmail

If you're currently running Linux and using a mail reader such as Netscape Mail to fetch your mail directly from your ISP's POP server, you're using Linux like Windows. Besides being boring, it's not clever. Time to re-think your approach.

For serious mail processing power, the first thing to do is implement a specialty program to do all your POP fetching in the background, and let it forward mail to the MTA (mail transfer agent) on your Linux box. The MTA, in turn, delivers the mail to your login account. You can then use your mail reader of choice, be it Pine, Mutt, Elm, or an X-based reader.

The most popular mail fetcher is the aptly named Fetchmail program, an open-source utility created by Eric S. Raymond. It is installed by default on Red Hat systems. Fetchmail is rich in options, but can be implemented in a simple, straightforward way. It is initialized via a .fetchmailrc file in your home directory and run as a background daemon, polling your ISP accounts at regular intervals.

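Because your email password must be listed in your .fetchmailrc file, fetchmail will refuse to run unless the file is readable and writable by you alone. Typing "chmod 600 .fetchmailrc" in your home directory takes care of this (the file has no need to be executable). This protects your file from anyone else on the system (except root, of course).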

A basic .fetchmailrc file that fetches mail from two different Internet providers looks like this:
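
A minimal sketch of such a file, assuming two hypothetical POP3 servers (pop.isp-one.example and pop.isp-two.example), made-up account names and passwords, and a local login called yourlogin. The "set daemon" line is also an assumption; it asks fetchmail to stay in the background and poll every 600 seconds.

```fetchmailrc
# ~/.fetchmailrc -- illustrative only; every server, user, and password is a placeholder
set daemon 600

poll pop.isp-one.example protocol pop3
    username "isp1user" password "isp1secret" is "yourlogin" here

poll pop.isp-two.example protocol pop3
    username "isp2user" password "isp2secret" is "yourlogin" here
```

Each "poll" entry names a remote account and, after "is", the local account that should receive its mail.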

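Once this is in place, you launch fetchmail as a daemon by typing "fetchmail". This works when your .fetchmailrc contains a "set daemon" line; otherwise type "fetchmail -d" followed by the polling interval in seconds. The command "fetchmail -q" will stop the fetchmail daemon process.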

Set up fetchmail and test the results with a mail reader to make sure everything is working as you expect. This preps you for the next step. (You can find out more about fetchmail by typing "man fetchmail" and by studying the material in your /usr/doc/fetchmail-[version] directory.)

Procmail

Here's an overview of how email gets delivered on a stock Red Hat Linux system, using fetchmail as the forwarder.

  1. Fetchmail retrieves mail from one or more ISP POP or IMAP servers.
  2. Fetchmail forwards mail to your MTA (mail transfer agent). In the case of standard, unmodified Red Hat, this is the sendmail program.
  3. Sendmail does the initial processing, invoking any system-wide rules you have set, then hands the mail over to your LDA (local delivery agent) program. This program, on Red Hat systems, is procmail.
  4. Procmail checks the .procmailrc file in your home directory for any special processing "recipes". If .procmailrc does not exist, mail is delivered directly to your system inbox queue--/var/spool/mail/loginname.
  5. If .procmailrc is present, procmail filters mail through procmail recipes prior to delivery.
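
Recipes usually sit below a few variable assignments at the top of .procmailrc. A minimal sketch, assuming your mail folders live in $HOME/mail (the directory Pine uses by default) and that you want a log file for troubleshooting; both paths are assumptions, and the directory must already exist.

```procmail
# ~/.procmailrc preamble (paths are assumptions; adjust to your setup)
SHELL=/bin/sh
MAILDIR=$HOME/mail             # folder names in recipes are relative to this directory
LOGFILE=$MAILDIR/procmail.log  # procmail records deliveries and errors here
```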

Let's see how to set up a few procmail recipes to address some common, and not-so-common, requirements.

Procmail Recipes

Here is the basic syntax of a procmail recipe:
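
In outline, as the procmailrc man page presents it (square brackets mark optional parts and angle brackets are placeholders, so this is a template rather than a working recipe):

```procmail
:0 [flags] [ : [locallockfile] ]
<zero or more conditions (one per line)>
<exactly one action line>
```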

Translating this into a concrete example, let's say you're subscribed to the Red Hat mailing list and receive between 50 and 100 postings a day from this list alone. You don't want this mail intermixed with your personal email, so you set the following recipe in your .procmailrc file:
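
A minimal sketch of such a recipe, assuming the list's address contains the string "redhat-list" (substitute whatever appears in the To: or Cc: header of your own list mail):

```procmail
# File Red Hat list traffic in its own folder.
# "redhat-list" is an assumed fragment of the list address.
:0:
* ^TOredhat-list
redhatlist
```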

All procmail recipes begin with a colon ":" followed by a zero "0". The second colon that follows the zero is a lockfile instruction that tells procmail to use file locking--a good idea when you anticipate a lot of rapid delivery such as you might expect from an active mailing list. If you wish, you can specify a name for the lockfile.

The second line is a condition. Recipe lines that begin with an asterisk ("*") are sent to a regular expression processor. The "^TO" is a procmail regular expression macro that expands to patterns for the To, Cc, Bcc, X-Envelope-To, Original-, Resent-, and even Apparently(-Resent)-To: headers, which should catch all destination specifications containing a specific word. That's some macro!

The third line is an action line. By default, the action is to deliver mail to a particular mailbox file. By simply specifying the file "redhatlist", you can now pick up all your Red Hat list postings by switching to the "redhatlist" mail folder. They no longer clutter your inbox.

The next example forwards a copy of all messages from Rick about guitars to Glen and keeps a copy locally in a "guitars" folder.
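
A sketch of how such a recipe might look. The addresses rick@example.org and glen@example.org are placeholders, and the Subject test on "guitar" is an assumption about how these messages can be recognized.

```procmail
# Outer recipe: both conditions must match before the block is entered.
:0
* ^From:.*rick@example\.org
* ^Subject:.*guitar
{
  # "c" works on a copy, so processing continues after the forward.
  :0 c
  ! glen@example.org

  # Deliver the message itself to the local folder, with a lockfile.
  :0:
  guitars
}
```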

This recipe demonstrates that you can have more than one condition line (conditions are ANDed together). To process more than a single action you can nest additional recipes within braces. This example uses the "c" flag in the second recipe to instruct procmail to make a copy of the message and then continue processing the next recipe after mailing a copy of the message to Glen. The final recipe instructs procmail to deliver the copy to the local "guitars" folder.

As with shell scripts, you can use variables within procmail recipes. Here is an example, from the procmailex man page, that demonstrates how to scan incoming subject lines for the word "meeting" and rotate monthly meeting folders for incoming mail:
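
A sketch following the procmailex example, with the date format set to a four-digit year to match the "yyyy-mm" notation described here. Compare it with "man procmailex" on your own system, since the wording there may differ slightly.

```procmail
# Name of this month's directory, e.g. 2001-07 (relative to MAILDIR).
MONTHFOLDER=`date +%Y-%m`

# The assignment exists only to run the command: create the directory if missing.
DUMMY=`test -d $MONTHFOLDER || mkdir $MONTHFOLDER`

# File anything with "meeting" in the subject under this month's directory.
:0:
* ^Subject:.*meeting
${MONTHFOLDER}/meeting
```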

In this example, the variable MONTHFOLDER is set to "yyyy-mm" notation, e.g., 2001-07, and the meeting correspondence will be found in 2001-07/meeting. The DUMMY variable is used to test for the existence of the MONTHFOLDER directory. If the directory does not exist (e.g. on the first instance of a new month), it is created. The "meeting" message is then slotted to the correct month's meeting folder. By extension, you could filter all your mail into appropriate monthly folders should you prefer chronological sorting of your mail.

Procmail combined with Formail

There is another useful mail utility on Linux systems called formail (mail formatter) that can be used in conjunction with procmail. For recipes that require advanced processing, calls to formail can add, delete, split, and rewrite parts of the message, including the headers.

For example, let's say you subscribe to the digest version of a mailing list on ebooks. You would like to have the digest version copied to your Hotmail account but would like a local copy of the digest to be split into individual messages for easier reply, and stored in a special folder.
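
One way to sketch this, assuming the digest can be recognized by the string "ebook-list" in its destination headers and that yourname@hotmail.com stands in for your Hotmail address (both are placeholders). The "formail +1 -ds" idiom for splitting digests comes from the formail man page.

```procmail
:0
* ^TOebook-list
{
  # Send an untouched copy of the digest to the Hotmail account.
  :0 c
  ! yourname@hotmail.com

  # Skip the digest's own header (+1), split the rest into separate
  # messages (-ds), and append them to the local "ebooks" folder.
  :0:
  | formail +1 -ds >> ebooks
}
```

The trailing colon on the last recipe lets procmail derive a lockfile from the folder named after ">>".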

Auto-reply Recipes

One excellent use of the procmail/formail combination is creating recipes for auto-replying to messages. Here is a basic auto-reply recipe:
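
A sketch modeled on the auto-reply example in the procmailex man page, with me@myisp.com as a placeholder for your own address and "Mail received." as a stand-in reply text:

```procmail
# Reply to the sender, but never to daemons, lists, or our own replies.
:0 h c
* !^FROM_DAEMON
* !^X-Loop: me@myisp.com
| (formail -r -I"Precedence: junk" \
     -A"X-Loop: me@myisp.com" ; \
   echo "Mail received.") | $SENDMAIL -t
```

Here "formail -r" builds the reply header from the incoming one, and $SENDMAIL is a variable procmail sets to the system's mail-sending program.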

This recipe uses the "h c" flags. The "h" feeds the mail header to a pipe while the "c" makes a copy of the entire message. There are two safety-net conditions. The first condition--"!^FROM_DAEMON"--is a regular expression macro that eliminates most instances of mail from a mailing list. You will become highly unpopular if you send automated replies back to a list server.

The second condition--!X-Loop--guards against mail loops, in which your own auto-reply comes back to you and triggers yet another reply. How does the recipe recognize its own mail? It adds a customized mail header field, e.g. -A"X-Loop: me@myisp.com", to every reply it sends. If an incoming message already carries that header, the recipe doesn't reply again.

The following, more complex, auto-reply recipe is equivalent to the Unix vacation program. It tells folks you're away, but it sends the message only once to each correspondent. You can achieve this with the following example (taken directly from the procmailex man page):
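
The version below follows the procmailex example, with me@myisp.com as a placeholder address. It assumes you have a $HOME/.signature file, and it should be checked against "man procmailex" on your own system, since details vary between procmail releases.

```procmail
SHELL=/bin/sh    # the reply pipeline below assumes a Bourne-style shell

# Reply only to mail addressed to us, never to daemons or lists,
# and never to a message that already carries our X-Loop header.
# formail -rD records each sender in vacation.cache (up to 8192 bytes).
:0 Whc: vacation.lock
* $^To:.*\<$\LOGNAME\>
* !^FROM_DAEMON
* !^X-Loop: me@myisp.com
| formail -rD 8192 vacation.cache

# The "e" flag runs this recipe only if the previous action failed,
# i.e. the sender was not yet in the cache.
  :0 ehc
  | (formail -rA"Precedence: junk" \
       -A"X-Loop: me@myisp.com" ; \
     echo "I received your mail,"; \
     echo "but I won't be back until Monday."; \
     echo "-- "; cat $HOME/.signature \
    ) | $SENDMAIL -oi -t
```

Delete vacation.cache when you return, so that the next absence starts with a clean list of correspondents.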

If you use two Linux systems in different locations, say one at work and one at home, you can cross-deliver your email to both systems by installing an appropriate variant of the following recipe on each system:
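
A sketch based on the two-account example in procmailex. The addresses are placeholders: me@home.example is the system the recipe runs on and me@work.example is the other one. Swap them when you install the recipe on the second system.

```procmail
# Forward a copy to the other system unless the message has already
# passed through this recipe (marked by our X-Loop header).
:0 c
* !^X-Loop: me@home.example
| formail -A "X-Loop: me@home.example" | \
    $SENDMAIL -oi me@work.example
```

The X-Loop check is what keeps the two systems from bouncing the same message back and forth indefinitely.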

One last example. Let's say you have a select number of correspondents that you always want to read first. In this example you put a uniquely identifiable portion of their email names, separated by vertical bars, into a file called .hiprio and you send all their messages to the folder "readmefirst":
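
One way this might be written, assuming $HOME/.hiprio holds a single line such as jane\.doe|bob@example\.org|carol (hypothetical names). The variable name HIPRIO is also an assumption.

```procmail
# Load the alternation list, e.g. jane\.doe|bob@example\.org|carol
HIPRIO=`cat $HOME/.hiprio`

# The "$" after the asterisk tells procmail to expand variables
# in the condition before treating it as a regular expression.
:0:
* $ ^From:.*(${HIPRIO})
readmefirst
```

Because the file's contents become part of a regular expression, dots in addresses are best escaped with a backslash.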

By pointing your mail reader to the "readmefirst" folder as your default mail view, you effectively eliminate everything else from view until you have time to deal with it.

References

If this has whetted your appetite for more, here's the good news. All the basic documentation for creating recipes and using fetchmail, procmail, and formail is already on your Linux system. The relevant man pages are fetchmail, procmail, procmailrc, procmailex, and formail. Just type "man procmailex", for instance, to look at examples of procmail recipes.

After looking at the man pages, a good starting point is the excellent Internet procmail tutorial maintained by Nancy McGough at www.faqs.org/faqs/mail/filtering-faq/. Another highly useful resource is the procmail tips site maintained by Finnish professor Timo Salmi at www.uwasa.fi/~ts/info/proctips.html. Both offer extensive links to other procmail sites.

One of the most effective ways to use procmail for the whole family is to set up a Linux server as the primary mail server for everyone. Put fetchmail and procmail scripts into each account and enable IMAP on the local server. Family members can then access the procmail-processed files and folders from their personal PCs with their mail clients of choice, be it Eudora Pro, Outlook Express, Netscape Mail, or any mailer that supports IMAP.

Gene Wilburn (gene@wilburn.ca) is a Toronto-based IT specialist, musician and writer who operates a small farm of Linux servers.

-30-