DISTRIBUTED INDEXING
With large text indexes it is often practical to have several machines working on the same index. This is the case, for instance, with the kind of large indexes used to search the web. Onix enables you to do this with distributed indexing. Using distributed indexing, you can have several machines, processes, or threads each building a different part of the overall index. This can increase both the speed of indexing and the flexibility of the index itself.
It is important to keep in mind that Onix is capable of generating indexes extremely rapidly. Its speed typically exceeds that of the network connection the data is arriving over. Before adopting distributed indexing for your application, you should run some speed tests on the machines you are planning to use. You may find that Onix's indexing speed is already fast enough for your application.
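One way to run such a speed test is sketched below in Python. It is only an illustration and does not call Onix: measure_throughput, indexing_is_bottleneck, and the stand-in indexing function are hypothetical names introduced here. The idea is to time your own indexing routine on sample data and compare the result with the rate at which data can arrive.

```python
import time


def measure_throughput(index_fn, documents):
    """Return bytes per second processed by index_fn over the sample documents."""
    total_bytes = sum(len(doc.encode("utf-8")) for doc in documents)
    start = time.perf_counter()
    for doc in documents:
        index_fn(doc)
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed if elapsed > 0 else float("inf")


def indexing_is_bottleneck(index_bytes_per_s, network_bytes_per_s):
    """True when indexing is slower than the rate at which data arrives."""
    return index_bytes_per_s < network_bytes_per_s


# Stand-in for a real indexing call (hypothetical): collect distinct words.
seen_words = set()
sample = ["the quick brown fox jumps over the lazy dog"] * 1000
rate = measure_throughput(lambda doc: seen_words.update(doc.split()), sample)
```

If indexing_is_bottleneck reports False for your measured rate and your network rate, a single indexing process is probably sufficient.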
A sample call sequence for performing distributed indexing is:
ixStartDistributedIndexingSession()
// Set the name of the file where the
// new index data is going to be stored.
ixSetFinalIndexDataFileNameAndPosition()
// Now we commence with the indexing.
while(Not Done) {
    for(Each Word In Document) {
        ixIndexWord()
    }
    if(More Data To Index)
        ixIncrementRecord()
}
ixEndDistributedIndexingSession()
// Now copy the index to wherever it is going
// to be used if it isn't already there.
ixAppendDistributedDataToIndex()
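The overall pattern in the sequence above (build partial index data independently, then append it to the main index) can be illustrated with a small Python sketch. This is not the Onix API. build_partial_index and append_partial are hypothetical helpers, and renumbering records by an offset at append time is an assumption made for the illustration, not a statement of how Onix stores its data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_partial_index(documents):
    """Index one batch of documents; record numbers are local to the batch."""
    postings = defaultdict(list)
    for record, text in enumerate(documents):
        for word in text.lower().split():
            # Store each word at most once per record.
            if not postings[word] or postings[word][-1] != record:
                postings[word].append(record)
    return dict(postings), len(documents)


def append_partial(main_index, main_count, partial, partial_count):
    """Append a partial index, shifting its record numbers past existing ones."""
    for word, records in partial.items():
        main_index.setdefault(word, []).extend(r + main_count for r in records)
    return main_count + partial_count


batches = [
    ["the quick brown fox", "the lazy dog"],
    ["a quick test", "dog days"],
]

# Each worker indexes its own batch independently of the others.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(build_partial_index, batches))

# Append the partial results to the main index one at a time.
main_index, record_count = {}, 0
for partial, count in partials:
    record_count = append_partial(main_index, record_count, partial, count)
```

Because each worker touches only its own batch, the workers need no coordination until the append step, which is the only point where the main index is modified.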
Be careful not to confuse distributed indexing with distributed indexes. The similar names are unfortunate. Distributed indexing means having multiple machines, processes, or threads indexing at the same time. A distributed index splits the index into multiple segments to get around the 2 gigabyte file size limit that some file systems have. Thus distributed indexing improves indexing speed, while distributed indexes increase the size an index can reach.
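To make the second idea concrete, the following Python sketch shows one simple way to plan segments so that no single file exceeds a size limit. It is an illustration only and says nothing about how Onix lays out its segments. plan_segments is a hypothetical helper, and the limit is taken here to be 2**31 - 1 bytes.

```python
MIB = 1024 ** 2
LIMIT = 2 ** 31 - 1  # assumed 2 gigabyte file size limit, in bytes


def plan_segments(block_sizes, max_segment_bytes):
    """Group index blocks, in order, into segments that each fit under the limit."""
    segments = [[]]
    used = 0
    for size in block_sizes:
        if size > max_segment_bytes:
            raise ValueError("a single block exceeds the segment limit")
        if used + size > max_segment_bytes:
            # The current segment is full; start a new one.
            segments.append([])
            used = 0
        segments[-1].append(size)
        used += size
    return segments


# Block sizes in bytes for a hypothetical 3.4 GB of index data.
blocks = [1536 * MIB, 1024 * MIB, 400 * MIB, 500 * MIB]
segments = plan_segments(blocks, LIMIT)
```

In this example the first block fills one segment on its own and the remaining three share a second, so each file stays under the limit while the index as a whole exceeds it.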