Linux for Newbies, pt. 20:
Document Processing
by Gene Wilburn
(The Computer Paper, Mar 2001. Copyright © Wilburn Communications Ltd. All rights reserved)
Contrary to popular belief, word processors go about their business in the wrong way. They're interlopers in the land of Unix--violating four time-proven Unix principles:
- You should be able to create all source files (including word processing documents) with your preferred text editor (vi and emacs being the prevailing standards).
- All files should be portable and accessible to the Unix toolkit; that is, they should exist as ASCII text. You should have no trouble running your source files through grep, sed, awk, wc or whatever you choose.
- Structure is more important than appearance.
- Look and feel (appearance) processing should be handled by programs external to your source files.
This stands the world of WYSIWYG word processing on its head. You cannot work directly on Word or WordPerfect files due to their binary nature. You must use the program that created them to manipulate the contents--a dangerous strategy for long-term use. Eventually a new release of the product will no longer willingly load older files. Cross-platform support among word processors is an ongoing issue.
Another serious problem with word processors is that they are weak in several key areas: long document creation (theses, books and long reports), mathematical equations, and indexing. While support for these features is often present, it's not particularly sophisticated or robust. Word processors are also weak in the area of typesetting--the output is passable, but not up to serious typesetting standards.
So what's the pro-Unix answer to this? You guessed it: text processing, better known these days as document processing. Document processing works in much the same way as the Web. HTML documents are text files containing markup code, and you can use any simple editor to create a web page. A web browser turns the source file into an attractive screen display. Depending on your skill as a web author, the final HTML display can be plain or sophisticated, but the tools needed to create it are dead simple.
The original document processing tools for Unix were troff and nroff--programs that took the contents of marked-up source files and formatted them for printers and typesetting machines. The troff program still exists in Linux in the form of groff, but it is now used mainly for creating man pages.
The primary document formatting engine for Linux is TeX (pronounced TECH--the X being the Greek letter chi). TeX was developed by Donald Knuth, the godfather of algorithms, to create beautiful documents, especially documents that contained mathematical equations. He succeeded admirably, and TeX has proven popular for general document creation, even for non-mathematical documents.
To make TeX easier to use, Leslie Lamport developed LaTeX, a coherent, simplified set of TeX macros. LaTeX has been highly popular for over a decade and may be one of the most mature, thoroughly debugged programs available on any computing platform. TeX and LaTeX are available for Linux, Unix, VMS, Windows, Macintosh, and OS/2. The TeX distribution most often used on Linux is teTeX, a modern, ready-to-run package of TeX, LaTeX and their companion tools.
So, let's say you don't find this document processing concept totally retro and you're willing to try it out. If you've already created HTML documents by hand, you're well on your way. You just need to change the nature of the markup tags.
Most LaTeX commands begin with a backslash ("\") and have fairly intuitive names. As with an HTML page, a LaTeX document has begin and end tags, a header (called the preamble), and a body, as well as additional structural elements that go beyond HTML. Here's an example of a simple LaTeX letter source file, which we'll call myletter.tex:
\documentclass{letter}
\address{586 Linux Drive \\ Port Debian, ON \\ L5G 9X9}
\signature{Ima Texhead, Jr.}
\begin{document}
\begin{letter}{Ima Texhead, Sr. \\ 486 BSD Way \\ Berkeley, CA 95587}
\opening{Dear Dad,}
Hope you're proud to see I'm using \LaTeX, just like you, Dad. Now that I've arrived, I could use some cash for a new Linux system. Sorry I didn't use email but I love the look of \LaTeX\ output.
\closing{Thanks,}
\end{letter}
\end{document}
Notice that the document begins with a "\documentclass{letter}" command. Older LaTeX 2.09 files use "\documentstyle" instead, which current LaTeX still accepts in a compatibility mode. This is similar to using HTML CSS (cascading style sheets): it applies a predefined format for the letter, which you can override should you wish. The address and signature structural elements are located near the top of the document. The double backslashes ("\\") tell TeX to insert a line break between these elements (like the <br> tag in HTML). The backslash preceding LaTeX ("\LaTeX") turns it into a macro that typesets the word LaTeX in its distinctive logo style. The trailing backslash in "\LaTeX\ output" preserves the space after the macro, which TeX would otherwise swallow.
There are two nested "\begin" statements, each matched later by its corresponding "\end" statement. The outer one is "\begin{document}". Inside it is "\begin{letter}", which opens the letter itself (the type of document). Because this document is a letter, it supports the standard structural elements of a letter, such as "\opening" and "\closing". Paragraphing is utterly simple: just insert a blank line between paragraphs.
We now have a source document that is ASCII text. How does it get turned into printer output? The next step is to run the LaTeX source file through TeX, which we do at the command line by typing:
$ latex myletter.tex
Assuming you have LaTeX on your Linux system (it is usually installed by default), you will witness some screen activity. When it's over, you will see a few new files with the same base name but different extensions, such as myletter.aux and myletter.log. The main one you're looking for is myletter.dvi.
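DVI stands for "device independent". By default, LaTeX produces a device-independent file that describes the typeset pages without committing to any particular screen or printer. Separate programs then display it on screen or translate it for a specific printer.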
Viewing? Yup, and unlike the so-called WYSIWYG word processors, viewing a dvi file is WYSIWYRG ("what you see is what you Really get"). If you're in an X Window session, type the following to view your letter:
$ xdvi myletter.dvi &
Fig. 1 Xdvi Preview:
Xdvi works somewhat like a combination web browser/PDF viewer in showing how the output looks. You can flip from page to page in a long document and you can magnify the page to see small details. Printing the file requires a dvi-aware printer driver. Here's what I type for an HP Laser Printer:
$ dvi2lj myletter.dvi ; lp myletter.lj
Another approach is to turn the LaTeX dvi file into a PostScript file (using dvips) and then run gv (Ghostview) to view the contents. LaTeX is PostScript friendly. In fact, most serious typesetting work with LaTeX is done with PostScript fonts and files.
Philosophy
The philosophy behind LaTeX is that it's better to concentrate on structure than on looks. Don't worry: LaTeX will make the output look highly professional, and you can tweak the looks considerably once you gain experience. The emphasis, though, is on getting the structure right.
This is particularly important for long documents, with chapter, section and subsection headings, footnotes, bibliographic references, captions and illustrations. LaTeX lets you break a report or book into component chapters or sections and tie them all together with a master document. This keeps each chapter file to a manageable size.
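Here's a minimal sketch of such a master document. It assumes the book's chapters live in two files named chap1.tex and chap2.tex; those file names and the book title are hypothetical. Each \include command pulls in one chapter file. The optional \includeonly line lets you reformat a single chapter while LaTeX keeps the page numbers and cross-references it recorded for the other chapters on the previous full run.

```latex
% book.tex: a hypothetical master document for a two-chapter book.
\documentclass{book}

% Optional: uncomment to typeset only chapter 2, reusing the page
% numbers and cross-references saved from the last full run.
% \includeonly{chap2}

\title{A Hypothetical Book}
\author{Ima Texhead, Jr.}

\begin{document}
\maketitle
\tableofcontents

% Each \include reads chapN.tex (hypothetical file names) and starts
% it on a fresh page.
\include{chap1}
\include{chap2}

\end{document}
```

Each chapter file contains ordinary LaTeX, such as \chapter, \section and \footnote commands. It has no preamble and no \begin{document} of its own.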
The indexing capabilities of LaTeX are particularly strong: the companion MakeIndex program sorts and formats index entries. A well-developed companion program called BibTeX, with its own bibliographic database format, lets you cite bibliographic materials in a scholarly fashion. LaTeX has no peers when it comes to displaying mathematical formulae--it's simply the best.
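The fragment below is a minimal sketch of what citations, index entries and a displayed equation look like in the source. It assumes a BibTeX database file named refs.bib that contains an entry with the key knuth84. Both names are hypothetical, made up for this example.

```latex
\documentclass{article}
\usepackage{makeidx}  % provides \printindex
\makeindex            % write \index entries to the .idx file

\begin{document}

\TeX{} was designed by Donald Knuth\index{Knuth, Donald}
for high-quality typesetting~\cite{knuth84}.

% A displayed, automatically numbered equation:
\begin{equation}
  \sum_{k=1}^{n} k = \frac{n(n+1)}{2}
\end{equation}

% The reference list is generated by BibTeX from refs.bib (hypothetical).
\bibliographystyle{plain}
\bibliography{refs}

% The index is generated by MakeIndex from the .idx file.
\printindex

\end{document}
```

Suppose the file were saved as report.tex (another hypothetical name). The usual sequence would be latex report, bibtex report, makeindex report, and then latex report twice more. The extra runs let the citations, reference list and index settle into place.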
Inserting figures and illustrations into LaTeX is not particularly difficult. It's akin to using IMG tags in HTML: you create the illustrations in a separate program, then use LaTeX commands to place them in the text.
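Here's a minimal sketch of placing an illustration. It assumes the drawing has been saved as an Encapsulated PostScript file named diagram.eps (a hypothetical name). It uses the standard graphicx package, which works with the latex-plus-dvips route.

```latex
\documentclass{article}
\usepackage{graphicx}  % provides \includegraphics

\begin{document}

Figure~\ref{fig:diagram} shows the layout of the network.

\begin{figure}[htbp]
  \centering
  % diagram.eps is a hypothetical file; the latex + dvips route expects EPS
  \includegraphics[width=0.8\textwidth]{diagram.eps}
  \caption{Layout of the network.}
  \label{fig:diagram}  % place the label after \caption so \ref gets the figure number
\end{figure}

\end{document}
```

As with an IMG tag, the image stays in its own file. LaTeX numbers the figure and floats it to a suitable spot on the page. The \ref in the text picks up the correct figure number on the second latex run.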
One of the payoffs of using LaTeX is that you can use a single source document for multiple outputs. Let's say you've written a book in LaTeX. You can typeset the book by creating PostScript files to be sent to a printing house, or you can print the book on a laser printer. By using a program called pdflatex you can turn your book into a PDF document. And with latex2html you can turn your book into an entire linked website, including linked index, footnotes, and table of contents elements.
LyX
Let's say you agree with all this in principle but it sounds hard and your time is short. You need "LaTeX with training wheels" in the form of LyX, an open-source document processor that has the look and feel of a GUI word processor but which outputs LaTeX files. If you know how to use any modern word processor, you can start using LyX immediately.
LyX has a lovely interface and gets you into LaTeX without having to learn a single LaTeX tag. It also takes care of your previewing and printing needs (a click of the mouse on a pull-down menu) without having to type anything at the command line. The product has an excellent built-in tutorial and user guide and, in short, removes the barrier to doing document processing the right way. LyX can be easily adapted for general office use. LyX is available as source code or in binary format at www.lyx.org.
Fig. 2 LyX Screenshot:
Resources
If you would like to explore LaTeX and LyX beyond this simple introduction, there are several resources available to you. It's essential to become acquainted with CTAN (Comprehensive TeX Archive Network) at www.ctan.org. If you want to do anything special in LaTeX, check CTAN first. Because LaTeX is such a mature product, libraries and special macros abound, in many variations. In addition to all the normal things you might want to use for books, articles and reports, you can find LaTeX macros and style files (.sty files) for creating musical scores, booklets, pamphlets, barcodes, chess diagrams, and even for typesetting crossword puzzles.
There is an excellent Internet newsgroup devoted to LaTeX discussions: comp.text.tex.
A key reference to LaTeX is, not surprisingly, LaTeX: A Document Preparation System, by Leslie Lamport, published by Addison Wesley Longman, 1994 (ISBN 0201529831 Cdn$55.50). Note that Lamport's book, although a very good introduction, is somewhat out of date. There are two excellent supplementary books by Michel Goossens, et al., both published by Addison Wesley Longman: The LaTeX Companion, 1994 (ISBN 0201541998 Cdn$56.95) and The LaTeX Graphics Companion, 1997 (ISBN 0201854694 Cdn$59.95). A search on "LaTeX" at the online sites for Chapters, Indigo and Amazon will list more than a dozen other books on the subject.
So, Contrarians of the world, feel relief. You have at your fingertips one of the most sophisticated and refined document preparation systems on the planet--and it's free.
Gene Wilburn (gene@wilburn.ca) is a Toronto-based IT specialist, musician and writer who operates a small farm of Linux servers.
-30-