
Customizing the summarizing step


Essence supports two mechanisms for defining the type-specific extraction algorithms (called Summarizers) that generate content summaries: a UNIX program that takes the filename of the data to summarize as its only command-line argument, and line-based regular expressions specified in lib/. See Appendix C.4 for detailed examples of how to define both types of Summarizers.
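The idea behind a line-based regular-expression Summarizer can be sketched in a few lines of Python. This is only an illustration, not Essence's configuration syntax (see Appendix C.4 for that): the RULES table, the attribute names, and the function summarize_lines are all hypothetical.

```python
import re

# Hypothetical rules: each attribute name is paired with a pattern that is
# tried against every line of the input. These are NOT Essence's own rules.
RULES = [
    ("Title", re.compile(r"^Title:\s*(.+)$")),
    ("Author", re.compile(r"^Author:\s*(.+)$")),
]


def summarize_lines(lines):
    """Return (attribute, value) pairs for every line that matches a rule."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        for attribute, pattern in RULES:
            match = pattern.match(line)
            if match:
                # The first capture group becomes the attribute's value.
                pairs.append((attribute, match.group(1)))
    return pairs


def summarize_file(filename):
    """Apply the rules to a file, given its name as the only input."""
    with open(filename, encoding="utf-8", errors="replace") as handle:
        return summarize_lines(handle)
```

Calling summarize_file with a path mirrors the UNIX Summarizer convention of receiving the filename as the only argument; the pairs it returns would still need to be written out as a SOIF attribute-value list.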

The UNIX Summarizers are named using the convention TypeName.sum (e.g., PostScript.sum). These Summarizers output their content summary as a SOIF attribute-value list (see Appendix B). You can use the wrapit command to wrap raw output into the SOIF format (i.e., to provide byte-count delimiters on the individual attribute-value pairs).
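To make the byte-count delimiters concrete, here is a minimal Python sketch of how one attribute-value pair might be wrapped. The exact SOIF layout is defined in Appendix B; the form used here, Name{byte-count}: followed by a tab, the value, and a newline, is an assumption, and the helper names soif_attribute and wrap_pairs are hypothetical. It is not a substitute for the wrapit command.

```python
def soif_attribute(name, value):
    """Format one attribute-value pair with a byte-count delimiter.

    Assumed layout (check Appendix B): Name{byte-count}:<TAB>value<NEWLINE>
    """
    data = value.encode("utf-8")
    # The count is the number of bytes in the value, not the number of
    # characters, so a reader can skip over values of any content.
    header = "{}{{{}}}:\t".format(name, len(data)).encode("ascii")
    return header + data + b"\n"


def wrap_pairs(pairs):
    """Concatenate the wrapped form of each (attribute, value) pair."""
    return b"".join(soif_attribute(name, value) for name, value in pairs)
```

Because the count is taken in bytes, a value may itself contain newlines or other delimiter-like characters without confusing a parser that reads exactly that many bytes.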

A Summarizer called FullText.sum performs full-text indexing of selected file types. To use it, set up the lib/ and lib/ configuration files to recognize the desired file types as FullText (i.e., put "FullText" in column 1 next to the matching regular expression).

Duane Wessels
Wed Jan 31 23:46:21 PST 1996