Onix Text Retrieval Toolkit
API Reference

ixCreateRobotsTxtParser

Name

ixCreateRobotsTxtParser

Synopsis

RobotsTxtParserT ixCreateRobotsTxtParser(StatusCodeT *Status)

Arguments

Status: A pointer to a value of type StatusCodeT that receives any error condition.

Returns

A parser of type RobotsTxtParserT that parses robots.txt files and reports whether permission is granted for individual URLs.

If an error occurs, Status is set to the error number.
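
For example, a minimal creation call might look like the following (a sketch only; it assumes a Status of 0 indicates success, so consult Error Handling for the actual codes):

   StatusCodeT Status;
   RobotsTxtParserT Parser = ixCreateRobotsTxtParser(&Status);
   if (Status != 0) {
       /* creation failed; Status holds the error number */
   }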

Description

Robots.txt is a standard file that webmasters use to tell web crawlers (the robots behind web "search engines") which files and directories to exclude from crawling.  A full description of the standard may be found at http://info.webcrawler.com/mak/projects/robots/norobots.html.  By providing the parser with the robots.txt file from a given site, you can then test URLs from that site against the parser to see whether you have permission to download and index them.  The location of the robots.txt file is itself standard: it is always sitename/robots.txt, so, for example, the robots.txt file for Webcrawler may be found at http://www.webcrawler.com/robots.txt.
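
Building on the creation call above, the fragment below sketches the overall calling sequence: identify the crawler, feed the parser a site's downloaded robots.txt, test a URL, and clean up.  The function names come from the See Also list, but their exact signatures are assumptions here; see their individual reference pages.

   /* Identify our crawler so the matching User-agent section applies
      (assumed signature). */
   ixSetRobotName(Parser, "MyCrawler");

   /* Hand the parser the contents of http://www.example.com/robots.txt,
      downloaded beforehand into the buffer RobotsTxt (assumed signature). */
   ixParseRobotsTxt(Parser, RobotsTxt, RobotsTxtLen);

   /* Test whether a URL from the same site may be downloaded and indexed
      (assumed signature). */
   if (ixRobotsPermissionGranted(Parser, "http://www.example.com/private/report.html")) {
       /* permission granted: fetch and index the page */
   }

   ixDeleteRobotsTxtParser(Parser);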

See Also

Robots.txt, Robots Spec
ixDeleteRobotsTxtParser, ixSetRobotName, ixParseRobotsTxt, ixRobotsPermissionGranted, ixRobotsTxtLength