5.9 Collector interface description: Collection.conf


The Broker retrieves indexing information from Gatherers or other Brokers through its Collector interface. A list of collection points is specified in the admin/Collection.conf configuration file. This file contains one collection point per line, each with four fields. The first field is the host of the remote Gatherer or Broker, the second field is the port number on that host, the third field is the collection type, and the fourth field is the query filter, or -- if there is no filter.
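The four-field layout described above can be illustrated with a short Python sketch. This is not part of the Broker; it assumes the fields are separated by whitespace and that everything after the third field belongs to the query filter. The function name and the host names in the usage lines are hypothetical, and the sketch does not handle blank or comment lines.

```python
from typing import NamedTuple, Optional


class CollectionPoint(NamedTuple):
    host: str                    # host of the remote Gatherer or Broker
    port: int                    # port number on that host
    ctype: int                   # collection type
    query_filter: Optional[str]  # None when the field is "--"


def parse_collection_line(line: str) -> CollectionPoint:
    """Split one Collection.conf line into its four fields (illustrative only)."""
    # Assumption: fields are whitespace-separated; the remainder of the
    # line after the third field is treated as the query filter.
    parts = line.strip().split(None, 3)
    if len(parts) != 4:
        raise ValueError("expected 4 fields: %r" % line)
    host, port, ctype, filt = parts
    filt = filt.strip()
    return CollectionPoint(host, int(port), int(ctype),
                           None if filt == "--" else filt)


# Hypothetical host names, for illustration only.
print(parse_collection_line("gatherer.example.com 8500 3 --"))
print(parse_collection_line(
    "broker.example.com 8501 6 --QUERY (URL : document) AND gnu"))
```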

The Broker supports various types of collections as described below:

  Type  Remote Process  Description                   Compression?
    0     Gatherer      Full collection each time     No
    1     Gatherer      Incremental collections       No
    2     Gatherer      Full collection each time     Yes
    3     Gatherer      Incremental collections       Yes
    4     Broker        Full collection each time     No
    5     Broker        Incremental collections       No
    6     Broker        Collection based on a query   No
    7     Broker        Incremental based on a query  No
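
As a reading aid, the collection types table can be written out as a small lookup in Python. This is only a sketch of the table's contents; the names CollectionType, COLLECTION_TYPES and needs_query_filter are hypothetical and do not correspond to anything in the Broker itself.

```python
from typing import NamedTuple


class CollectionType(NamedTuple):
    remote: str        # "Gatherer" or "Broker"
    incremental: bool  # True for incremental collections
    compressed: bool   # True when the transfer is compressed
    query_based: bool  # True when the collection is based on a query


# One entry per row of the collection types table.
COLLECTION_TYPES = {
    0: CollectionType("Gatherer", False, False, False),
    1: CollectionType("Gatherer", True, False, False),
    2: CollectionType("Gatherer", False, True, False),
    3: CollectionType("Gatherer", True, True, False),
    4: CollectionType("Broker", False, False, False),
    5: CollectionType("Broker", True, False, False),
    6: CollectionType("Broker", False, False, True),
    7: CollectionType("Broker", True, False, True),
}


def needs_query_filter(type_code: int) -> bool:
    """Return True for the query-based collection types (6 and 7)."""
    # An unknown type code raises KeyError.
    return COLLECTION_TYPES[type_code].query_based


print(COLLECTION_TYPES[3])
print(needs_query_filter(6))
```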


The query filter specification for collection types 6 and 7 contains two parts: the --QUERY keywords portion and an optional --FLAGS flags portion. The --QUERY portion is passed on to the Broker as the keywords for the query (the keywords can be any Boolean and/or structured query); the --FLAGS portion is passed on to the Broker as the indexer-specific flags to the query. The following table shows the valid indexer-specific flags for the supported indexers:

Indexer         Flag                            Description
All:            #desc                           Show Description Lines

Glimpse:        #index case insensitive         Case Insensitive
                #index case sensitive           Case sensitive
                #index error number             Allow "number" errors
                #index matchword                Matches on word boundaries
                #index maxresult number         Allow max of "number" results
                #opaque                         Show matched lines

Wais:           #index maxresult number         Allow max of "number" results
                #opaque                         Show scores and rankings
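
The two-part structure of a query filter can be illustrated with a minimal Python sketch. It assumes the filter always begins with --QUERY and that the literal text --FLAGS does not occur inside the keywords portion. The function name is hypothetical, and the sketch does not check the flags against the table above.

```python
from typing import Optional, Tuple


def split_query_filter(spec: str) -> Tuple[str, Optional[str]]:
    """Split a filter into (keywords, flags); flags is None when absent."""
    spec = spec.strip()
    if not spec.startswith("--QUERY"):
        raise ValueError("query filter must begin with --QUERY: %r" % spec)
    body = spec[len("--QUERY"):]
    # Assumption: "--FLAGS" never appears inside the keywords themselves.
    keywords, sep, flags = body.partition("--FLAGS")
    return keywords.strip(), (flags.strip() if sep else None)


print(split_query_filter("--QUERY Harvest --FLAGS #index case sensitive"))
print(split_query_filter("--QUERY (URL : document) AND gnu"))
```

The first call separates the keywords Harvest from the flags #index case sensitive; the second has no --FLAGS portion, so the flags value is None.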

The following is an example Collection.conf, which collects information from 2 Gatherers (one using compressed incremental transfers and the other uncompressed full transfers) and from 3 Brokers (one incrementally based on a timestamp, and the others using query filters):

  8500 3 --
  8500 0 --
  8501 5 --
  8501 6 --QUERY (URL : document) AND gnu
  8501 7 --QUERY Harvest --FLAGS #index case sensitive

