Onix Text Retrieval Toolkit
API Reference



With some file systems, the largest single index file that can be created is 2 gigabytes. Many indexes need to be much larger than this. To get around this limit, Onix supports what are called distributed indexes: a single index broken into several parts, each of which can be up to 2 gigabytes in size. In this way you can make indexes as large as you wish, even under 32-bit file systems.
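As a rough sizing aid, the number of parts a distributed index needs is a ceiling division of the total index size by the per-part limit. The sketch below assumes a cap of exactly 2 GiB (2^31 bytes) per part; the limit Onix actually enforces may differ slightly, and `parts_needed` is a hypothetical helper, not part of the Onix API.

```c
#include <stdint.h>

/* Assumed per-part cap of 2 GiB (2^31 bytes), taken from the file-size
   limit described above; the limit Onix actually enforces may differ. */
#define PART_LIMIT_BYTES ((uint64_t)1 << 31)

/* Hypothetical helper, not part of the Onix API: the minimum number of
   parts needed so that no part exceeds PART_LIMIT_BYTES. Uses ceiling
   division; an empty index still counts as one part. */
uint64_t parts_needed(uint64_t total_bytes)
{
    if (total_bytes == 0)
        return 1;
    return (total_bytes + PART_LIMIT_BYTES - 1) / PART_LIMIT_BYTES;
}
```

For example, under this assumption a 5-gigabyte index needs at least three parts: `parts_needed((uint64_t)5 << 30)` returns 3.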

To create a distributed index you must make two calls. One tells Onix the base directory for the index; all of your sub-indexes will be found in this directory. The other tells Onix where the index you are opening or creating is found.
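Because the directory strings in the example in this section are passed with a trailing path separator, it can help to normalize the directory once and build part file names consistently. This is a minimal sketch: `normalize_index_dir` and `make_part_name` are hypothetical helpers, not Onix functions, it assumes Windows-style backslash separators, and the `partN.idx` naming only mirrors the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers, not part of the Onix API. */

/* Copies dir into out, appending a backslash if dir does not already end
   in one. Returns 0 on success, -1 if out is too small. */
int normalize_index_dir(char *out, size_t out_size, const char *dir)
{
    size_t len = strlen(dir);
    int needs_sep = (len == 0 || dir[len - 1] != '\\');
    int n = snprintf(out, out_size, "%s%s", dir, needs_sep ? "\\" : "");
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Builds a part file name such as "part1.idx". The naming scheme is only
   an illustration taken from the example. Returns 0 on success, -1 if
   out is too small. */
int make_part_name(char *out, size_t out_size, int part_no)
{
    int n = snprintf(out, out_size, "part%d.idx", part_no);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The normalized directory would then be the string handed to ixSetBaseDistributedIndexDirectory and used as the directory argument of ixSetFinalIndexDataFileNameAndPosition, with the generated name as the file name.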

Here is some partial code showing how to create and search distributed indexes. Some parts of the code have been omitted for space and clarity; in particular, these fragments don't include error checking. In your applications you should always check Status after every call to an Onix function.

// Create the index the way you normally would.  The
// index name will be the master index.  After you
// have created it you can open it normally.
// The master index might be 
//    "C:\\indexes\\master.idx" 
// while the distributed parts might be
//    "C:\\indexes\\part1.idx"
//    "C:\\indexes\\part2.idx"

indexManager = ixCreateIndexManager( PASS1, PASS2, Status);

indexCreationParams = ixCreateIndexCreationParams( Status );

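// Mark the index being created as distributed
ixSetIndexCreationParams( indexCreationParams,
     ixSetDistributedIndex, NULL );

// Give the master index file name (the parameter
// identifier for the file name is not shown here)
ixSetIndexCreationParams( indexCreationParams,
     (void*) "c:\\indexes\\master.idx" );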

ixCreateIndexEx( indexManager, indexCreationParams,
     Status );

ixDeleteIndexCreationParams( indexCreationParams );

// The following is called just prior to opening the
// index but after creating an indexManager 

ixSetBaseDistributedIndexDirectory( indexManager,
    "c:\\indexes\\", Status );

ixOpenIndex( indexManager, "c:\\indexes\\master.idx",
     Status );

indexingEngine = ixStartIndexingSession( indexManager, 
     Status );

// Tell Onix where the distributed index file is
// Note the trailing path separator

ixSetFinalIndexDataFileNameAndPosition( indexingEngine,
     "c:\\indexes\\", "part1.idx", Status );

// Now continue indexing as normal

// Searching is very similar

indexManager = ixCreateIndexManager( PASS1, PASS2, Status);

// Note that you pass in the *master* index, not any part.
// If your search code is in a different program, or
// ixSetBaseDistributedIndexDirectory hasn't been called yet,
// you *must* call it first, as shown above.

ixOpenIndex( indexManager, "c:\\indexes\\master.idx", Status );

ixStartRetrievalSession( indexManager, Status );

// At this stage we'd do our searching

ixEndRetrievalSession( indexManager, Status );

ixCloseIndex( indexManager, Status );

ixDeleteIndexManager( indexManager, Status );

Be careful not to confuse distributed indexing with distributed indexes; the similar names are unfortunate. Distributed indexing means having multiple machines or threads index at the same time. A distributed index splits the index into multiple segments to get around the 2-gigabyte file limit that some file systems have. Thus distributed indexing improves speed, while distributed indexes increase the maximum index size.


See Also

ixStartIndexingSession, ixSetFinalIndexDataFileNameAndPosition, ixOpenIndex, ixSetBaseDistributedIndexDirectory