Harvest frequently asked questions (FAQ) with answers
1.1 What is Harvest?
1.2 Where can I get more information about Harvest?
1.3 Where can I download Harvest?
1.4 Is there any information about Harvest in Russian?
1.5 What is Harvest-ng?
1.6 What is the copyright status of Harvest?
1.7 Which Operating System do I need to run Harvest?
1.8 Does Harvest run under Windows NT/2000/XP?
1.9 What hardware do I need to use Harvest?
1.10 Which version of Harvest should I use?
1.11 What are "harvest-modified-by-RL-Stajsic", "harvest-MathNet", and "harvest-1.5.20-kj"?
1.12 What are the limits of Harvest?
1.13 Do I need root access to install and run Harvest?
1.14 How do I block Harvest from my site? How do I identify Harvest?
1.15 What can I do to help?
2.1 How do I uninstall Harvest?
2.2 Where can I get bison and flex?
2.3 How can I install Harvest in "/my/directory/harvest" instead of "/usr/local/harvest"?
2.4 How can I avoid "syntax error before `regoff_t'" error message when compiling Harvest?
2.5 Where can I get more information for building Harvest on FreeBSD?
3.1 Does the Gatherer support cookies?
3.2 Why doesn't Local-Mapping work?
3.3 Does the Gatherer gather the Root- and LeafNode-URLs periodically?
3.4 Can Harvest gather https URLs?
3.5 When will Harvest be able to gather https URLs?
3.7 Why does the gatherer stop after gathering only a few pages?
3.8 How can I index local newsgroups? How can I put the hostname into News URLs?
3.9 What do the gatherer options "Search=Breadth" and "Search=Depth" do and which keywords are available for "Search=" option?
3.10 How can I index HTML pages generated by CGI scripts? How can I index URLs that have a "?" (question mark) in them?
3.11 Why is the gatherer so slow? How can I make it faster?
3.12 Why is the gatherer still so slow?
3.13 How do I request "304 Not Modified" answers from HTTP servers?
3.14 Why does Harvest gather different URLs between gatherings?
3.15 Why has the Gatherer's database vanished after gathering?
3.16 How can I keep GDBM files from growing very large during gathering?
3.17 Can I use Htdig as Gatherer? Can the Broker import data from Htdig?
3.18 How can I control access to Gatherer's database?
3.19 Does Harvest's Gatherer support WAP/WML, Gnutella, Napster?
3.20 How do I gather ftp URLs from wu-ftp daemons?
3.21 Why don't file URLs in LeafNodes work as expected?
3.22 Why does gathering from a site fail completely or for parts of the site?
4.1 Why doesn't Post-Summarizing work?
4.2 How can I summarize meta tags in HTML documents?
4.3 Why do raw HTML tags appear in some query results?
4.4 How can I summarize DVI files?
4.5 How can I summarize PDF files?
4.6 Where can I get pdftotext?
4.7 How can I improve the summarizer for Microsoft Word files?
4.8 Where can I get wvWare?
4.9 How can I add support for a new file type?
4.10 How can I use nsgmls instead of sgmls to summarize documents?
5.1 How can I start a Broker at boot time?
5.2 How can I start a Broker without starting a collection?
5.3 Why don't the documents which I have gathered right now show up in the Broker?
5.4 Why do I get error messages when I try to access "http://some.host/Harvest/brokers/your-broker-path/" after running $HARVEST_HOME/RunHarvest?
5.5 Why are NEWS URLs broken? Where are the hostnames in NEWS URLs? How can I follow NEWS URLs?
5.6 Why don't I get any results if I use a long or complex query string?
5.7 Can I use wildcards in attribute values for structured queries?
5.8 Are the attribute names case sensitive?
5.9 Why doesn't collecting from a broker work?
5.10 How can I customize the Harvest user interface?
5.11 How do I localize/translate the user interface?
5.12 How can I replace the bundled Glimpse with another version of Glimpse?
6.1 What is a Gatherer?
6.2 What is Local-Mapping?
6.3 What is a Summarizer?
6.4 What is a Broker?
7.1 Who are the maintainers of Harvest?
7.2 I have found a bug. What should I do?
7.3 Is there a mailing list for Harvest? What about a newsgroup?