4.3.1 RootNode filters

Filter files use the standard UNIX regular expression syntax (as defined by the POSIX standard), not the csh ``globbing'' syntax. For example, you would use ``.*abc'' to indicate any string ending with ``abc'', not ``*abc''. A filter file has the following syntax:

        Deny  regex
        Allow regex
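As a rough illustration of this syntax, the following Python sketch reads a filter file into an ordered list of rules. The function name ``parse_filter'' is hypothetical, and Python's ``re'' module is used as a stand-in for a POSIX regular expression library; the two agree on simple patterns like the ones in this section, but they are not identical in general.

```python
import re

def parse_filter(text):
    """Turn 'Allow regex' / 'Deny regex' lines into (action, compiled_regex) pairs.

    Hypothetical helper for illustration; it is not part of the Gatherer.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) != 2 or parts[0] not in ("Allow", "Deny"):
            raise ValueError("bad filter line: %r" % line)
        rules.append((parts[0], re.compile(parts[1])))
    return rules

rules = parse_filter("Deny  /gatherers/\nAllow .\n")
print([(action, regex.pattern) for action, regex in rules])
```

The rules are kept in file order, since the order of the entries affects the result.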

The URL-Filter regular expressions are matched only against the URL-path portion of each URL (the scheme, hostname and port are excluded). For example, the following URL-Filter file would allow all URLs except those whose path matches the regular expression ``/gatherers/'':

        Deny  /gatherers/
        Allow .
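The effect of this file can be sketched in Python as follows. This is only an approximation of the behaviour described above: the function ``url_allowed'' is hypothetical, Python's ``re'' stands in for POSIX regular expressions, and rejecting a URL that matches no entry is an assumption rather than documented behaviour.

```python
import re
from urllib.parse import urlsplit

# The two entries from the example file, in file order.
RULES = [("Deny", r"/gatherers/"), ("Allow", r".")]

def url_allowed(url, rules, default=False):
    """Apply the first entry whose regex matches the URL-path.

    default is what to return when no entry matches (an assumption here).
    """
    path = urlsplit(url).path or "/"   # scheme, hostname and port are ignored
    for action, pattern in rules:
        if re.search(pattern, path):   # unanchored match, like grep
            return action == "Allow"
    return default

print(url_allowed("http://www.example.com/gatherers/index.html", RULES))
print(url_allowed("http://gatherers.example.com/docs/index.html", RULES))
```

The second URL is allowed even though its hostname contains ``gatherers'', because only the path ``/docs/index.html'' is tested.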

Another common use of URL-filters is to prevent the Gatherer from travelling ``up'' a directory. Automatically generated HTML pages for HTTP and FTP directories often contain a link to the parent directory ``..''. To keep the Gatherer below a specific directory, use a URL-filter file such as:

        Allow  ^/my/cool/stuff/
        Deny   .
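To see why such a filter stops the upward walk, consider what a ``..'' link resolves to. The sketch below uses a hypothetical directory ``/pub/docs/'' and host ``www.example.com'' in place of your own, and assumes that a URL matching no entry is rejected.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical filter: stay below /pub/docs/ (a stand-in for your own directory).
RULES = [("Allow", r"^/pub/docs/"), ("Deny", r".")]

def path_allowed(path, rules):
    """Return True if the first matching entry is an Allow."""
    for action, pattern in rules:
        if re.search(pattern, path):
            return action == "Allow"
    return False  # assumption: reject when nothing matches

base = "http://www.example.com/pub/docs/"
for href in ("..", "guide/intro.html"):
    target = urljoin(base, href)       # resolve the link as a browser would
    path = urlsplit(target).path
    print(path, path_allowed(path, RULES))
```

The parent link resolves to ``/pub/'', which fails the anchored Allow entry and falls through to ``Deny .'', while links deeper in the tree still match.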

Host-Filter regular expressions are matched on the ``hostname:port'' portion of each URL. Because the port is included, you cannot use ``$'' to anchor the end of a hostname. Beginning with version 1.3, IP addresses may be specified in place of hostnames. A class B address such as 128.138.0.0 would be written as ``^128\.138\..*'' in regular expression syntax. For example:

        Deny   bcn.boulder.co.us:8080
        Deny   bvsd.k12.co.us
        Allow  ^128\.138\..*
        Deny   .
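The following sketch applies the example entries to a ``hostname:port'' string built from a URL. The helper names and the default port table are assumptions made for illustration, as is rejecting a host that matches no entry; how the Gatherer itself derives the port or maps hostnames to IP addresses is not shown here.

```python
import re
from urllib.parse import urlsplit

# Assumed default ports, used when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}

# The entries from the example file, in file order.
HOST_RULES = [
    ("Deny", r"bcn.boulder.co.us:8080"),
    ("Deny", r"bvsd.k12.co.us"),
    ("Allow", r"^128\.138\..*"),
    ("Deny", r"."),
]

def host_key(url):
    """Build the 'hostname:port' string the regexes are matched against."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 80)
    return "%s:%d" % (parts.hostname, port)

def host_allowed(url, rules=HOST_RULES):
    key = host_key(url)
    for action, pattern in rules:
        if re.search(pattern, key):
            return action == "Allow"
    return False  # assumption: reject when nothing matches

print(host_key("http://128.138.243.151/"), host_allowed("http://128.138.243.151/"))
print(host_allowed("http://bcn.boulder.co.us:8080/"))
# A trailing "$" never matches, because ":port" follows the hostname:
print(re.search(r"bvsd\.k12\.co\.us$", host_key("http://bvsd.k12.co.us/")))
```

Note that the unescaped dots in the two hostname entries match any character; writing them as ``\.'' makes the match exact.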

The order of the Allow and Deny entries is important, since the filters are applied sequentially from first to last and the first entry that matches decides the outcome. So, for example, if you list ``Allow .*'' first, no subsequent Deny expression will ever be used, since this Allow filter matches every entry.
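A small sketch makes the ordering effect concrete. The function ``first_match'' is hypothetical and simply returns the action of the first entry whose expression matches, using Python's ``re'' in place of a POSIX regular expression library.

```python
import re

def first_match(rules, text):
    """Return the action of the first entry that matches, or None."""
    for action, pattern in rules:
        if re.search(pattern, text):
            return action
    return None

allow_first = [("Allow", r".*"), ("Deny", r"/gatherers/")]
deny_first = [("Deny", r"/gatherers/"), ("Allow", r".*")]

print(first_match(allow_first, "/gatherers/index.html"))  # Allow
print(first_match(deny_first, "/gatherers/index.html"))   # Deny
```

The same two entries give opposite results for the same path; only their order differs.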



Duane Wessels
Wed Jan 31 23:46:21 PST 1996