
Using programs to summarize a format

The second new file format is the ``Abstract'' type: a file that contains only the text of a paper abstract (a format that is common in technical report FTP archives). To recognize files written in this format, we'll use the naming convention that the filename of an ``Abstract'' file ends in ``.abs''. We add that type recognition customization to the lib/ file as a regular expression:

        Abstract                ^.*\.abs$
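To check which filenames the pattern accepts before adding it to the Gatherer configuration, a quick test with grep can help. This is only a sketch: the helper name is_abstract and the sample filenames are made up for illustration, and it assumes the Gatherer's regular expression dialect agrees with grep -E for this simple pattern.

```shell
#!/bin/sh
# Hypothetical helper (not part of Harvest): succeeds if the filename
# given as $1 matches the Abstract type pattern from the example above.
is_abstract() {
    printf '%s\n' "$1" | grep -Eq '^.*\.abs$'
}

# Made-up sample filenames; only the first one ends in ".abs".
for f in tr-95-01.abs paper.abs.gz notes.txt; do
    if is_abstract "$f"; then
        echo "$f: Abstract"
    else
        echo "$f: no match"
    fi
done
```

Because the pattern is anchored with ``$'', a name such as paper.abs.gz is not treated as an ``Abstract'' file.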

Another way to write a summarizer is to write a program or script that takes a filename as the first argument on the command line, extracts the structured information, then outputs the results as a list of SOIF attribute-value pairs.

Summarizer programs are named TypeName.sum, so we call our new summarizer Abstract.sum. Remember to place the summarizer program in a directory that is in your PATH so that the Gatherer can run it. You'll see below that Abstract.sum is a Bourne shell script that takes the first 50 lines of the file, wraps them as the ``Abstract'' attribute, and outputs the result as a SOIF attribute-value pair.

        #!/bin/sh
        #  Usage: Abstract.sum filename
        #  Emit the first 50 lines of the file as the "Abstract" attribute.
        head -50 "$1" | wrapit "Abstract"
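To make the wrapping step concrete, here is a rough stand-in for what a wrapper like wrapit does. It is a sketch, not the Harvest program: the function name wrap_attr is hypothetical, and it assumes that a SOIF attribute-value pair is written as the attribute name, the size of the value in bytes in braces, a colon, a tab, and then the value. The wrapit shipped with Harvest may differ in details.

```shell
#!/bin/sh
# Hypothetical stand-in for wrapit (assumed SOIF layout, see above):
# reads the value from stdin and prints  Name{size}:<TAB>value
wrap_attr() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                          # buffer stdin so it can be measured
    size=$(wc -c < "$tmp" | tr -d ' ')    # value size in bytes
    printf '%s{%s}:\t' "$1" "$size"       # attribute header
    cat "$tmp"                            # the value itself, unchanged
    rm -f "$tmp"
}

# Example: wrap two short lines as the "Abstract" attribute.
printf 'line one\nline two\n' | wrap_attr "Abstract"
```

The value is buffered in a temporary file because its size has to be known before the header can be printed.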

Duane Wessels
Wed Jan 31 23:46:21 PST 1996