Onix Text Retrieval Toolkit
API Reference

If you are using Onix to build an internet or intranet webcrawler and indexer, Onix can parse the robots.txt file. This file is the standard way for a site to tell webcrawlers which files and directories they may crawl and index.
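As a rough illustration of what parsing involves, here is a minimal Python sketch that splits robots.txt text into records. It is not the Onix API. `parse_robots_txt` is a hypothetical helper, and it assumes the original robots exclusion conventions: `User-agent` and `Disallow` fields, `#` comments, and blank lines separating records. Every other field (such as `Allow`) is ignored.

```python
def parse_robots_txt(text):
    """Split robots.txt text into (agents, disallow_prefixes) records.

    Hypothetical helper: follows the original robots exclusion conventions
    and ignores every field other than User-agent and Disallow.
    """
    records = []
    current = None  # the record currently being filled, or None
    for raw in text.splitlines():
        if not raw.strip():
            current = None  # a truly blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue  # comment-only lines are dropped and do not end a record
        field, sep, value = line.partition(":")
        if not sep:
            continue  # not a "field: value" line; skip it
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # A User-agent line that follows Disallow lines starts a new record.
            if current is None or current[1]:
                current = ([], [])
                records.append(current)
            current[0].append(value)
        elif field == "disallow" and current is not None:
            current[1].append(value)  # an empty value means nothing is disallowed
    return records
```

For example, `parse_robots_txt("User-agent: *\nDisallow: /cgi-bin/\n")` yields `[(['*'], ['/cgi-bin/'])]`.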

The specification for the robots.txt parser is located at:


It details how to set up both a robots.txt parser and the robots.txt file itself.

Onix can also output a "compact" form of robots.txt using ixOutputCompactRobotsTxt(). This lets you save a shortened copy of the robots.txt file that contains only the portions that apply to your webcrawler.
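The idea behind a compact form can be sketched in Python as a filter over already-parsed records: keep only the records whose `User-agent` names your robot or is `*`, and write them back out. This is not a description of how ixOutputCompactRobotsTxt() works internally. `compact_robots_txt` and `agent_matches` are hypothetical, records are assumed to be `(agents, disallow_prefixes)` pairs, and the name test is an assumed liberal policy: a case-insensitive substring match that ignores any `/version` suffix.

```python
def agent_matches(agent, robot_name):
    """Assumed policy: case-insensitive substring match, ignoring "/version"."""
    token = agent.split("/", 1)[0].strip().lower()
    return bool(token) and token in robot_name.split("/", 1)[0].lower()


def compact_robots_txt(records, robot_name):
    """Re-emit only the records that name this robot or the '*' default.

    Hypothetical helper; records are (agents, disallow_prefixes) pairs.
    """
    out = []
    for agents, disallows in records:
        if any(a.strip() == "*" or agent_matches(a, robot_name) for a in agents):
            out.extend(f"User-agent: {a}" for a in agents)
            # rstrip() turns an empty prefix into a bare "Disallow:" line
            out.extend(f"Disallow: {d}".rstrip() for d in disallows)
            out.append("")  # blank line separates records
    return "\n".join(out)
```

Keeping the `*` record alongside your robot's own record is a deliberately conservative choice: whichever record a matcher would select from the full file is still present in the compact copy.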

After creating the robots.txt parser with a call to ixCreateRobotsTxtParser(), set your webcrawler's name by calling ixSetRobotName(). The matcher uses this name to separate the instructions the robots.txt file gives to your webcrawler from the instructions it gives to other webcrawlers.
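Separating your robot's instructions from everyone else's amounts to choosing one record. The Python sketch below assumes that a record naming the robot takes precedence over the `*` record, and that when no record matches, nothing is disallowed. `select_disallows` is a hypothetical illustration, not an Onix function, and it uses the same assumed liberal, case-insensitive substring match on the name.

```python
def select_disallows(records, robot_name):
    """Return the Disallow prefixes that govern robot_name.

    Hypothetical helper; records are (agents, disallow_prefixes) pairs.
    A record naming the robot wins; otherwise the '*' record applies;
    if neither exists, nothing is disallowed.
    """
    name = robot_name.split("/", 1)[0].lower()  # ignore any version suffix
    default = []
    for agents, disallows in records:
        for agent in agents:
            token = agent.split("/", 1)[0].strip().lower()
            if token == "*":
                default = disallows  # remember the catch-all record
            elif token and token in name:
                return disallows  # first record naming this robot wins
    return default
```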

You can test whether a directory or URL is eligible for crawling and indexing by calling ixRobotsPermissionGranted() or ixRobotsPermissionGrantedFullURL().
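Under the original robots exclusion rules, the eligibility test is a prefix comparison: a path is disallowed if it begins with any non-empty `Disallow` value. The hypothetical Python helpers `path_allowed` and `url_allowed` below sketch that rule for a bare path and for a full URL. They are not the signatures of ixRobotsPermissionGranted() or ixRobotsPermissionGrantedFullURL(), and they do no percent-decoding or case normalization.

```python
from urllib.parse import urlsplit


def path_allowed(disallows, path):
    """True unless path starts with a non-empty Disallow prefix."""
    return not any(prefix and path.startswith(prefix) for prefix in disallows)


def url_allowed(disallows, url):
    """Same test for a full URL: only the path (plus any query) is compared."""
    parts = urlsplit(url)
    path = parts.path or "/"  # "http://host" is treated as "/"
    if parts.query:
        path += "?" + parts.query
    return path_allowed(disallows, path)
```

Because the match is by prefix, `Disallow: /tmp` also blocks `/tmp.html`, while an empty `Disallow:` value blocks nothing.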

When you are finished with the robots.txt parser, you can delete it by calling ixDeleteRobotsTxtParser().

See Also

ixProcessRecordID, ixRetrieveRecordID, ixFindRecordID