The lib/stoplist.cf configuration file lists the types that Essence rejects. You can add types to or delete types from lib/stoplist.cf to control the candidate selection step.
To direct Essence to index only certain types, you can list the types to
index in lib/allowlist.cf. Then, supply Essence with the
The file and URL naming heuristics used by the type recognition step (described in Section 4.5.4) are particularly useful for candidate selection when gathering remote data. They allow the Gatherer to avoid retrieving files that you don't want to index. In contrast, recognizing types by locating identifying data within a file requires that the file be retrieved first. Selecting candidates by name can therefore reduce network traffic considerably, particularly when used in combination with enumerated RootNode URLs. For example, many sites provide each of their files in both a compressed and an uncompressed form. By building a lib/allowlist.cf containing only the Compressed types, you can avoid retrieving the uncompressed versions of the files.
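The idea of selecting candidates by name, before anything is retrieved, can be sketched in a few lines of Python. This is only an illustration and not Essence's implementation: the suffix table, the type names other than Compressed, the function names, and the example URLs are all hypothetical, and the rule that an allowlist takes precedence over the stoplist is an assumption made for the sketch.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical suffix-to-type table; Essence's real naming heuristics
# live in its own configuration files and are not reproduced here.
SUFFIX_TYPES = {
    ".z": "Compressed",
    ".gz": "Compressed",
    ".ps": "PostScript",
    ".txt": "Text",
}


def guess_type(url):
    """Guess a type name from the URL's file suffix alone (no retrieval)."""
    path = urlparse(url).path
    suffix = posixpath.splitext(path)[1].lower()
    return SUFFIX_TYPES.get(suffix, "Unknown")


def select_candidates(urls, stoplist=(), allowlist=None):
    """Return the URLs worth retrieving.

    Assumption for this sketch: if an allowlist is given, only the types
    it names are kept; otherwise every type not in the stoplist is kept.
    """
    selected = []
    for url in urls:
        type_name = guess_type(url)
        if allowlist is not None:
            if type_name in allowlist:
                selected.append(url)
        elif type_name not in stoplist:
            selected.append(url)
    return selected


if __name__ == "__main__":
    urls = [
        "ftp://ftp.example.org/pub/paper.ps",
        "ftp://ftp.example.org/pub/paper.ps.Z",
        "ftp://ftp.example.org/pub/README.txt",
    ]
    # Keep only the compressed copy of each file.
    for url in select_candidates(urls, allowlist={"Compressed"}):
        print(url)
```

Because the decision uses only the URL, the uncompressed copies in the example are filtered out without a single byte being fetched, which is the saving the paragraph above describes.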