Harvest User's Manual: Introduction to Harvest

1. Introduction to Harvest

Harvest is an integrated set of tools to gather, extract, organize, and search information across the Internet. With modest effort, users can tailor Harvest to digest information in many different formats and offer custom search services on the Internet.

A key goal of Harvest is to provide a flexible system that can be configured in various ways to create many types of indexes.

Harvest also allows users to extract structured (attribute-value pair) information from many different information formats and build indexes that allow these attributes to be referenced during queries (e.g., searching for all documents with a certain regular expression in the title field).
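As a rough illustration of attribute-based querying, the following Python sketch reduces each document to a set of attribute-value pairs and filters on a single attribute with a regular expression. It is not Harvest's record format or query interface: the record layout, the attribute names, the example URLs, and the search_attribute helper are all assumptions made for this example.

```python
import re

# Hypothetical records: each document is reduced to attribute-value pairs.
# The attribute names and URLs are illustrative, not Harvest's actual schema.
records = [
    {"url": "http://example.org/a.html", "title": "Harvest User's Manual", "type": "HTML"},
    {"url": "http://example.org/b.ps", "title": "Indexing large collections", "type": "PostScript"},
    {"url": "http://example.org/c.txt", "title": "Notes on harvesting metadata", "type": "Text"},
]


def search_attribute(records, attribute, pattern):
    """Return the records whose given attribute matches the regular expression.

    Records that lack the attribute are treated as having an empty value.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [r for r in records if regex.search(r.get(attribute, ""))]


# Find every document whose title field mentions "harvest" or "harvesting".
matches = search_attribute(records, "title", r"harvest(ing)?\b")
for record in matches:
    print(record["url"], "-", record["title"])
```

Because the query names the attribute it applies to, the same pattern could be run against the type or url field without touching the rest of the record.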

An important advantage of Harvest is that it allows users to build indexes from manually constructed templates (for maximum control over index content), from templates constructed automatically from extracted data (for easy coverage of large collections), or from a hybrid of the two methods.

Harvest is designed to make it easy to distribute the search system across a pool of networked machines to handle higher load.

1.1 Copyright

The core of Harvest is licensed under the GPL. Additional components distributed with Harvest are also under the GPL or a similar license. Glimpse, the current default full-text indexer, has a different license. A clarification of Glimpse's copyright status was kindly posted by Golda Velez to comp.infosystems.harvest.

1.2 Online Harvest Resources

This manual is available at harvest.sourceforge.net/harvest/doc/html/manual.html.

More information about Harvest is available at harvest.sourceforge.net.
