
Customizing the presentation unnesting step


Some types are declared as "nested" types. Essence treats these differently from other types: instead of running a Summarizer on the data, it runs a presentation unnesting algorithm, or "Exploder". At present Essence can handle files nested in the following formats:

  1. uuencoded
  2. tape archive ("tar")
  3. shell archive ("shar")
  4. compressed
  5. GNU compressed ("gzip")
  6. binhex.

To customize the presentation unnesting step, modify the Essence source file harvest/src/gatherer/essence/unnest.c. This file lists the available presentation encodings and also specifies the unnesting algorithm. Typically, an external program (e.g., gunzip, uudecode, or tar) is used to unravel a file into one or more component files.
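As a rough illustration of how a nested type might be paired with an external unnesting program, the following C sketch uses a small lookup table. Everything in it is hypothetical: the struct, the type names (Uuencoded, Tar, ShellArchive, Compressed, GNUCompressed), and the functions unnest_command, next_step and build_unnest_command are invented for this example and do not reproduce the contents of unnest.c. It only shows the general idea that a nested type is mapped to an external unravelling program, while any other type falls through to a Summarizer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL table: type names and commands are illustrative only
   and do not reproduce the actual layout of unnest.c. */
struct unnester {
    const char *type;     /* nested type name */
    const char *command;  /* external program that unravels the file */
};

static const struct unnester unnesters[] = {
    { "Uuencoded",     "uudecode" },
    { "Tar",           "tar -xf" },
    { "ShellArchive",  "sh" },
    { "Compressed",    "uncompress" },
    { "GNUCompressed", "gunzip" },
    /* binhex omitted: the decoder program varies between sites */
};

/* Return the unnesting command for a type, or NULL if it is not nested. */
const char *unnest_command(const char *type)
{
    assert(type != NULL);
    for (size_t i = 0; i < sizeof unnesters / sizeof unnesters[0]; i++)
        if (strcmp(unnesters[i].type, type) == 0)
            return unnesters[i].command;
    return NULL;
}

/* Nested types are unnested; every other type goes to a Summarizer. */
const char *next_step(const char *type)
{
    return unnest_command(type) != NULL ? "unnest" : "summarize";
}

/* Build "command path" in buf. Returns 0 on success, -1 if the type is
   not nested or buf is too small. A real implementation should also
   quote path before handing it to a shell. */
int build_unnest_command(char *buf, size_t n, const char *type, const char *path)
{
    const char *cmd = unnest_command(type);
    if (cmd == NULL)
        return -1;
    int len = snprintf(buf, n, "%s %s", cmd, path);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```

For example, build_unnest_command with type "Tar" and path "bundle.tar" produces the command line "tar -xf bundle.tar", while a non-nested type such as "HTML" is routed to the summarizing step instead.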

An Exploder may also be used to explode a file into a stream of SOIF objects. An Exploder program takes a URL as its first command-line argument and a file containing the data as its second, and generates one or more SOIF objects as output. For convenience, the Exploder type is already defined as a nested type, so rather than modifying the Essence code you can save time by using this type and its corresponding Exploder.unnest program.
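The Exploder calling convention described above (URL first, data file second, SOIF objects as output) can be sketched in C. This is a minimal, hypothetical Exploder that assumes each non-empty line of the input file should become its own SOIF object; the child URLs of the form URL#N, the Description attribute, and the functions soif_attr and explode are illustrative choices, not part of Harvest. Each attribute is written as Name{size}: followed by a tab and the value, where size is the length of the value in bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one SOIF attribute, "Name{size}:\tvalue\n", into buf.
   Returns the number of characters written, or -1 on truncation. */
int soif_attr(char *buf, size_t n, const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    int len = snprintf(buf, n, "%s{%zu}:\t%s\n", name, strlen(value), value);
    return (len < 0 || (size_t)len >= n) ? -1 : len;
}

/* Emit one SOIF object per non-empty line of `in` and return the count.
   ASSUMPTIONS: line-per-record splitting, "url#N" child URLs, and the
   "Description" attribute are illustrative choices only. */
int explode(const char *url, FILE *in, FILE *out)
{
    char line[1024];
    char attr[1200];  /* large enough for any attribute built from line */
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';  /* drop trailing newline */
        if (line[0] == '\0')
            continue;                      /* skip blank lines */
        if (soif_attr(attr, sizeof attr, "Description", line) < 0)
            continue;                      /* defensive: never expected */
        count++;
        fprintf(out, "@FILE { %s#%d\n%s}\n", url, count, attr);
    }
    return count;
}

#ifdef EXPLODER_MAIN
/* Command-line entry point: exploder URL FILE, SOIF on standard output. */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s URL FILE\n", argv[0]);
        return 1;
    }
    FILE *in = fopen(argv[2], "r");
    if (in == NULL) {
        perror(argv[2]);
        return 1;
    }
    explode(argv[1], in, stdout);
    fclose(in);
    return 0;
}
#endif
```

When compiled with -DEXPLODER_MAIN, the sketch runs as "exploder URL FILE". Keeping the logic in explode(), separate from main(), makes it easy to feed the function test input from any stream.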

See Appendix C.2 for a detailed example on writing an Exploder. The unnest.c file also contains further information on defining the unnesting algorithms.

Duane Wessels
Wed Jan 31 23:46:21 PST 1996