Onix Text Retrieval Toolkit
API Reference

DISTRIBUTED INDEXING

With large text indexes it is often practical to have several machines working on the same index. This is the case, for instance, with the kind of large indexes used to search the web. Onix enables you to do this with distributed indexing. Using distributed indexing, you can have several machines, processes, or threads all building different parts of the overall index. This can increase both the speed of indexing and the flexibility of the index itself.

It is important to keep in mind that Onix is capable of generating indexes extremely rapidly. Its indexing speed typically surpasses the speed of the network connection over which the data may be arriving. Before choosing distributed indexing as your application solution, you should run some speed tests on the machines you are planning to use. You may find that Onix's indexing speed is already fast enough for your application.

A sample call sequence for performing distributed indexing is:

ixStartDistributedIndexingSession()

// Set the name of the file where the new index data is going to be stored.
ixSetFinalIndexDataFileNameAndPosition()

// Now we commence with the indexing.
while(Not Done) {
   for(Each Word In Document) {
      ixIndexWord()
   }

   if(More Data To Index) {
      ixIncrementRecord()
   }
}

ixEndDistributedIndexingSession()

// Now copy the index to wherever it is going to be used
// if it isn't already there.
ixAppendDistributedDataToIndex()
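
To make the idea behind the sequence above more concrete, the following is a minimal, self-contained toy model in C. It does not call Onix at all: ToyIndex, toy_init, toy_index_document, toy_append, and toy_find are hypothetical names invented for this illustration, and the in-memory posting array only stands in for real index data. The sketch assumes that each worker numbers its records locally from zero and that appending a partial index shifts those record numbers by the number of records already in the final index. Whether Onix renumbers records in this way is an assumption, not something stated on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_POSTINGS 64
#define TOY_MAX_WORD 32

/* One posting: a word plus the number of the record it occurred in. */
typedef struct {
    char word[TOY_MAX_WORD];
    unsigned record;
} ToyPosting;

/* Hypothetical in-memory index; NOT an Onix data structure. */
typedef struct {
    ToyPosting postings[TOY_MAX_POSTINGS];
    size_t count;      /* postings stored so far */
    unsigned records;  /* records indexed so far */
} ToyIndex;

void toy_init(ToyIndex *ix) {
    ix->count = 0;
    ix->records = 0;
}

/* Index every word of one document as the next local record.
   Returns 0 on success, -1 if the toy index has no room left. */
int toy_index_document(ToyIndex *ix, const char *const *words, size_t n) {
    if (n > TOY_MAX_POSTINGS - ix->count) return -1;
    for (size_t i = 0; i < n; i++) {
        ToyPosting *p = &ix->postings[ix->count++];
        strncpy(p->word, words[i], TOY_MAX_WORD - 1);
        p->word[TOY_MAX_WORD - 1] = '\0';
        p->record = ix->records;
    }
    ix->records++;
    return 0;
}

/* Append one worker's partial index to the merged index. Local record
   numbers are shifted by the number of records already merged (an
   assumption of this sketch). Returns 0 on success, -1 if full. */
int toy_append(ToyIndex *merged, const ToyIndex *part) {
    if (part->count > TOY_MAX_POSTINGS - merged->count) return -1;
    unsigned base = merged->records;
    for (size_t i = 0; i < part->count; i++) {
        ToyPosting p = part->postings[i];
        p.record += base;
        merged->postings[merged->count++] = p;
    }
    merged->records += part->records;
    return 0;
}

/* Copy the record numbers of postings matching `word` into `out`
   (at most `max` of them) and return how many were written. */
size_t toy_find(const ToyIndex *ix, const char *word,
                unsigned *out, size_t max) {
    size_t n = 0;
    for (size_t i = 0; i < ix->count && n < max; i++) {
        if (strcmp(ix->postings[i].word, word) == 0) {
            out[n++] = ix->postings[i].record;
        }
    }
    return n;
}
```

In this model, if one worker indexes two documents and a second worker indexes one, appending both partial indexes in that order gives a merged index with three records, and the second worker's local record 0 becomes record 2.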


Be careful not to confuse distributed indexing with distributed indexes. The similar names are unfortunate. Distributed indexing means having multiple machines or threads indexing at the same time. A distributed index splits the index into multiple segments to get around the 2 gigabyte file size limit that some file systems have. Thus distributed indexing improves indexing speed, while distributed indexes increase the amount of data an index can hold.
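
As a rough illustration of the storage side, the arithmetic below estimates how many segments an index of a given size would need under a per-file ceiling. This is only a sketch: segments_needed and SEGMENT_LIMIT_BYTES are hypothetical names, the ceiling is assumed to be exactly 2^31 bytes, and the calculation says nothing about how Onix itself chooses segment boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-file ceiling of 2 GiB (2^31 bytes). The real limit
   depends on the file system in use. */
#define SEGMENT_LIMIT_BYTES ((uint64_t)1 << 31)

/* Smallest number of segments of at most limit_bytes each that can
   hold index_bytes of data (ceiling division, written so that the
   intermediate value cannot overflow). */
uint64_t segments_needed(uint64_t index_bytes, uint64_t limit_bytes) {
    assert(limit_bytes > 0);
    if (index_bytes == 0) return 0;
    return (index_bytes - 1) / limit_bytes + 1;
}
```

Under these assumptions, an index of 5 GiB would need three segments, since two segments hold at most 4 GiB.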


										

See Also

ixStartDistributedIndexingSession, ixEndDistributedIndexingSession, ixAppendDistributedDataToIndex, ixStartIndexingSession