
5. Broker

5.1 How can I start a Broker at boot time?

Some user-contributed startup scripts are located in the contrib/etc/ directory of the Harvest source distribution. Modify the appropriate files and copy them to your startup script directory.

5.2 How can I start a Broker without starting a collection?

When a Broker starts, it starts collecting data, which can take some time. To avoid this, use the -nocol option when invoking RunBroker.

If you have installed Harvest in /usr/local/harvest/, put the following line into your startup file, e.g. /etc/rc.local:

        /usr/local/harvest/brokers/YOUR_BROKER/RunBroker -nocol

Replace /usr/local/harvest/ with the directory where you have installed Harvest.
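
A startup file may run before every filesystem is available, so it can help to check that RunBroker exists before calling it. The following is only a sketch: the function name start_broker_nocol is made up for this example, and the installation directory and broker name are placeholders you must adjust.

```shell
#!/bin/sh
# Hypothetical helper for a startup file such as /etc/rc.local.
# Usage: start_broker_nocol HARVEST_DIR BROKER_NAME

start_broker_nocol() {
    run_script="$1/brokers/$2/RunBroker"
    # Refuse to continue if the script is missing or not executable.
    if [ ! -x "$run_script" ]; then
        echo "RunBroker not found or not executable: $run_script" >&2
        return 1
    fi
    # -nocol starts the Broker without starting a collection.
    "$run_script" -nocol
}

# Example call (uncomment and adjust the path and broker name):
# start_broker_nocol /usr/local/harvest YOUR_BROKER
```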

5.3 Why don't the documents which I have gathered right now show up in the Broker?

The Broker imports data from the Gatherer once every 24 hours. If you want to import the data immediately after gathering, either restart the Broker or signal it to import the data.

You can signal the Broker with the command line client brkclient, located in $HARVEST_HOME/lib/broker/, by typing:

        # brkclient localhost 8501 '#ADMIN #Password secret #collection'

Replace hostname, port and password if necessary.
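
If you signal the Broker regularly, for example right after a gathering run, the command above can be wrapped in a small script. This is a minimal sketch: the function names build_collection_command and signal_broker are invented for this example, and it assumes HARVEST_HOME is set in the environment.

```shell
#!/bin/sh
# Hypothetical wrapper around the brkclient call shown above.

# Build the admin command string for a given password.
build_collection_command() {
    printf '#ADMIN #Password %s #collection' "$1"
}

# Usage: signal_broker HOST PORT PASSWORD
signal_broker() {
    # Aborts with a message if HARVEST_HOME is not set.
    "${HARVEST_HOME:?HARVEST_HOME is not set}/lib/broker/brkclient" \
        "$1" "$2" "$(build_collection_command "$3")"
}

# Example call (uncomment and adjust host, port and password):
# signal_broker localhost 8501 secret
```

Note that the password is passed on the command line and may be visible to other users of the machine through the process list.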

An easier method is to use the WWW-based admin interface at: "http://www.YOUR_SERVER.com/Harvest/brokers/YOUR_BROKER/admin/admin.html".

5.4 Why do I get error messages when I try to access "http://some.host/Harvest/brokers/your-broker-path/" after running $HARVEST_HOME/RunHarvest?

Check the error log of your http daemon. The http daemon must be able to follow symbolic links. For Apache httpd you can do this by adding:

        <Location /Harvest/brokers/your-broker-path/>
                Options FollowSymLinks
        </Location>

to your httpd.conf.

If you don't want symbolic links, delete the symbolic link and copy the file to the new name.
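
Replacing a link with a copy can be scripted as follows. This is a sketch for regular files only; the function name replace_symlink_with_copy is made up for this example, and the path in the example call is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: replace a symbolic link with a copy of its target.
# Usage: replace_symlink_with_copy LINK

replace_symlink_with_copy() {
    link="$1"
    if [ ! -L "$link" ]; then
        echo "not a symbolic link: $link" >&2
        return 1
    fi
    tmp="$link.tmp.$$"
    # Without -R, cp follows the link and copies the target's contents.
    cp "$link" "$tmp" || return 1
    # Remove the link and move the copy into its place.
    rm "$link" && mv "$tmp" "$link"
}

# Example call (uncomment and adjust the path):
# replace_symlink_with_copy /path/to/your-broker-path/some-link.html
```

The copy does not follow later changes to the original file, so it has to be refreshed by hand when the original is updated.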

5.5 Why are NEWS URLs broken? Where are the hostnames in NEWS URLs? How can I follow NEWS URLs?

Harvest's Gatherer doesn't put hostnames into NEWS URLs. If your web browser complains about a missing news server, configure it to use the news server of your provider, company or organization as the default news server.

For more information on why Harvest doesn't put hostnames into NEWS URLs, see RFC 1738, sections 3.6 and 3.7.

5.6 Why don't I get any results if I use a long or complex query string?

The length of a query string is limited to 30 characters when using regular expressions (wildcards), excluding the escape characters.
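
To check a query against this limit before sending it, you can count its characters without the escape characters. The sketch below assumes that the backslash is the only escape character and that the query is plain ASCII; the function names effective_query_length and query_within_limit are invented for this example.

```shell
#!/bin/sh
# Hypothetical check of a query against the 30 character limit.
# Assumption: the backslash is the escape character.

# Print the query length with all backslashes removed.
effective_query_length() {
    stripped=$(printf '%s' "$1" | tr -d '\\')
    printf '%s' "${#stripped}"
}

# Succeed if the query is at most 30 characters long.
query_within_limit() {
    [ "$(effective_query_length "$1")" -le 30 ]
}

# Example call (uncomment to try):
# query_within_limit 'foo\.bar.*' && echo "query is short enough"
```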

5.7 Can I use wildcards in attribute values for structured queries?

No, regular expressions for attribute names and attribute values in structured queries aren't supported. So, queries like "Author: Smi.*" or "Auth.*: Smith" won't do what you might expect.

5.8 Are the attribute names case sensitive?

No, the attribute names are not case sensitive. So, "Time-To-Live" is the same as "Time-to-Live", "Time-to-live", "time-to-live", etc.

5.9 Why doesn't collecting from a Broker work?

This is due to a bug introduced in Harvest 1.5.18. The bug was fixed in 1.7.8. To make it work again, update to 1.7.8 or higher.

5.10 How can I customize the Harvest user interface?

The query pages are located in $HARVEST_HOME/brokers/YOUR_BROKER/query-*. Most likely, you don't want to make all the variables visible to users who want to query your broker. Edit query-* and use the hidden type to set suitable defaults for variables you want to hide.

The result set presentation can be customized by choosing or modifying the configuration files located in the $HARVEST_HOME/cgi-bin/lib/ directory. The configuration files Sample.cf, classic.cf, modern.cf and some LANGUAGE.cf files are installed there by default. You can either create a new configuration file or modify one of the existing ones to get the result set presentation you want. See the Harvest User's Manual for the options available in the configuration file.

If you want to customize the result presentation even further, then edit $HARVEST_HOME/cgi-bin/search.cgi.

5.11 How do I localize/translate the user interface?

To localize the user interface:

  1. Create src/broker/example/brokers/skeleton/query-glimpse-modern.html.xx.in, where xx is a two-letter abbreviation for your language/country, by translating either query-glimpse-modern.html.in or another translation, query-glimpse-modern.html.yy.in. This is the localized query page.
  2. Create components/broker/standard/WWW/language.cf by translating modern.cf or another translated configuration file such as spanish.cf, german.cf, etc. This will localize the result pages and error messages.
  3. Create src/broker/example/brokers/skeleton/query-glimpse.html.xx.in by translating query-glimpse.html.in or query-glimpse.html.yy.in. This is the advanced query page.
  4. Translate src/broker/example/brokers/*.html to get localized additional help pages.

5.12 How can I replace the bundled Glimpse with another version of Glimpse?

Edit $HARVEST_HOME/brokers/YOUR_BROKER/admin/broker.conf to let Harvest know the location of your glimpse, glimpseindex, and glimpseserver.

