The Harvest Broker can handle many types of queries. Which queries a particular Broker supports depends on the index/search engine it uses (e.g., WAIS does not support some of the queries that Glimpse does). This section describes the full syntax. If a particular Broker does not support a certain type of query, it returns an error when the user requests that type of query.
The simplest query is a single keyword, such as:
Searching for common words (like "computer" or "html") may take a lot of time. Please be considerate of other users.
Particularly for large Brokers, it is often helpful to use more powerful queries. Harvest supports many different index/search engines, with varying capabilities. At present, our most powerful (and commonly used) search engine is Glimpse, which supports:
The different types of queries (and how to use them) are discussed below. Note that you use the same syntax regardless of what index/search engine is running in a particular Broker, but that not all engines support all of the above features. In particular, some of the Brokers use WAIS, which sometimes searches faster than Glimpse but supports only Boolean keyword queries and the ability to specify result set limits.
The different options -- case-sensitivity, approximate matching, the ability to show matched lines vs. entire matching records, and the ability to specify match count limits -- can all be specified with buttons and menus in the Broker query forms.
A structured query has the form:
tag-name : value
where tag-name is a Content Summary attribute name, and value is the search value within the attribute. If you click on a Content Summary, you will see what attributes are available for a particular Broker. A list of common attributes is shown in Appendix B.2.
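As a rough illustration of the tag-name : value form, the following Python sketch splits a query term into its attribute name and search value. The function name and the attribute names used in the example (Author, Title) are assumptions made for illustration; they are not part of Harvest itself, and the attributes actually available depend on the Broker.

```python
def parse_structured_term(term):
    """Split 'tag-name : value' into (tag_name, value).

    A term without a colon is treated as a plain keyword and
    returned as (None, keyword). Only the first colon separates
    the tag name from the value, so values may contain colons.
    This hypothetical helper does not handle quoted keywords
    that themselves contain a colon.
    """
    tag_name, separator, value = term.partition(":")
    if not separator:
        return None, term.strip()
    return tag_name.strip(), value.strip()


if __name__ == "__main__":
    # Illustrative attribute names; check the Broker's Content
    # Summaries for the attributes it really provides.
    print(parse_structured_term("Author : Smith"))
    print(parse_structured_term("Title:harvest"))
    print(parse_structured_term("lightbulb"))
```

Whitespace around the colon is ignored in this sketch, so "Author : Smith" and "Author:Smith" are treated the same way; whether a given Broker is equally lenient is not something the sketch can confirm.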
Keyword searches and structured queries can be combined using Boolean operators (AND and OR) to form complex queries. Without parentheses, operators are evaluated left to right. Multi-word phrases and regular expressions must be enclosed in double quotes, e.g.,
"internet resource discovery"
Double quotes should also be used when searching for non-alphanumeric characters.
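The quoting rule and the left-to-right evaluation order can be sketched in Python as follows. This is only a model of the behavior described above, not the Broker's implementation: the function names are hypothetical, a query is represented as a flat list of alternating terms and operators, and parentheses are not handled.

```python
import re


def quote_term(term):
    """Return the term in double quotes unless it is a single
    alphanumeric word, following the rule that phrases and
    non-alphanumeric characters need double quotes."""
    if re.fullmatch(r"[A-Za-z0-9]+", term):
        return term
    return f'"{term}"'


def evaluate_left_to_right(tokens, matches):
    """Evaluate [term, op, term, op, term, ...] strictly left to right.

    tokens  -- alternating terms and the operators 'AND' / 'OR'
    matches -- function returning True if a term matches a record
    AND and OR get equal precedence here, as in an unparenthesized
    query: each operator is applied to the result so far.
    """
    result = matches(tokens[0])
    for i in range(1, len(tokens), 2):
        operator, term = tokens[i], tokens[i + 1]
        if operator == "AND":
            result = result and matches(term)
        elif operator == "OR":
            result = result or matches(term)
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


if __name__ == "__main__":
    record_words = {"a"}  # toy record containing only the word "a"
    query = ["a", "OR", "b", "AND", "c"]
    # Left to right this is (a OR b) AND c, which is False here;
    # conventional precedence, a OR (b AND c), would give True.
    print(evaluate_left_to_right(query, lambda t: t in record_words))
    print(quote_term("internet resource discovery"))
```

The toy query shows why the evaluation order matters: a record containing only "a" does not satisfy "a OR b AND c" when it is read left to right, so parentheses are worth adding whenever a different grouping is intended.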