Harvest and Search-related Links

Home | Sites using Harvest | Download | Contributed Code | Todo List | Links
Contributors | User's Manual | FAQ | Installation | ChangeLog | NEWS

Harvest

Harvest Homepage at SourceForge
Russian Translation of Harvest User's Manual on Andrei Malashevich's Site
Russian Translation of Harvest User's Manual
Harvest Logos and Icons
Harvest-NG - Reimplementation of Harvest Gatherer in Perl
SOIF Parser written in Tcl
Harvest Page on FreshPorts (FreeBSD Ports)

Components used by Harvest

Catdoc and Xls2csv - Microsoft Word and Excel to Text Converters
dvi2tty - DVI to Text Converter
SP - SGML Utility
xlhtml - Microsoft Excel 95/97 and PowerPoint 95/97 to HTML Converter
Curl - Multiprotocol Transfer Utility
QDBM - Quick Database Manager
Index Data's Zebra Full-Text Engine

Harvest and Search-related Documents

RFC 2655: CIP Index Object Format for SOIF Objects
RFC 2731: Encoding Dublin Core Metadata in HTML
Dublin Core Metadata Initiative
Mike Taylor's Z39.50 Pages
Search Tools - Information, Guides and News

Wide Scale Deployment Efforts of Harvest

Math-Net
MathNet Harvest Page
Physnet
SozioNet
REDIRIS
SINN
TERENA

Other Search Systems

ASPSeek
Greenstone
HTDig
mnoGoSearch
Perlfect Search
PhpDig
Senga

Robots, Crawlers and Gatherers

Combine - open system for harvesting
Crawling in Perl - A Quick Tutorial
HarvestMan - A multithreaded Web-Crawler
Larbin - Multi-Purpose Web Crawler
Parallel URL Fetcher
SuckMT - Multiconnection NNTP Downloader

Full-Text Index Engines

Estraier
Managing Gigabytes
Namazu
Swish-e
Swish++
Xapian

Summarizers

Exuberant Ctags
Macromedia Flash OBJECT and EMBED tag syntax
Macromedia Flash Search Engine SDK
pdftohtml - PDF to HTML Converter
Perltidy - Indent and Reformat Utility for Perl
UUDeview - Decoding Utility for UUEncoded Data
GNU Unrtf - RTF to Text Converter
Wv (Wordview) - Microsoft Word to Text Converter
soffice2html - StarOffice/OpenOffice to HTML Converter
catdvi - DVI to Text Converter
Rich Text Format (RTF) Version 1.5 Specification
Highlight - Source Code to HTML Converter
pstotext - PostScript and PDF to Text Converter
libExtractor - Library for Keyword Extraction for Miscellaneous File Formats
xpdf - PDF to HTML Converter

Non-free Systems and Tools

Glimpse

Miscellaneous Tools

SQLite - Embeddable SQL Database Engine
Squid Web Proxy Cache

Text Classifiers

dbacl - a digramic Bayesian classifier for text recognition
Spamfilter and Text Classifier

Language and Encoding Issues

Aspell - Spell Checker
Ispell - Spell Checker
Hspell - Hebrew Spell Checker
Snowball - Multilanguage Stemmer
International Components for Unicode
Metaphone
Multilanguage Stemming Procedures and Stopwords
QuickTranslate

List compiled by Kang-Jin Lee on 1 February 2004
