Onix Text Retrieval Toolkit
API Reference

ixGetTempDiskSpaceUsage

Name

ixGetTempDiskSpaceUsage

Synopsis

void ixGetTempDiskSpaceUsage(OnixIndexingEngineT IndexingEngine, ULongT *DiskSpaceUsage, StatusCodeT *Status)

Arguments

IndexingEngine: An indexing engine that was returned by a call to ixStartIndexingSession().

DiskSpaceUsage: A pointer to an unsigned long integer where the current temporary disk space usage will be returned.

Status: A pointer to a value of type StatusCodeT representing any error conditions.

Returns

Nothing.

If an error occurred, Status will be set to the error number.

Description

Onix builds a series of temporary files during the indexing process which store intermediate index information.  These files can be fairly large.  Their size depends on whether the text is being indexed in Record Mode, Word Mode, or IDF Mode, on whether you are storing data inside the index, and on the nature of the text itself.  On average, you can expect these files to range from approximately the size of the original data for a Record Level index to a little over 2 times the size of the original data for a Word Level index.
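As a rough planning aid, the averages above can be turned into a small estimate. This is only a sketch: the enum and function names are hypothetical and not part of the Onix API, the Word Level figure uses a flat factor of 2 even though the text says "a little over 2 times", and IDF Mode is left out because no figure is given for it.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names for illustration; these are not Onix identifiers. */
typedef enum { INDEX_RECORD_LEVEL, INDEX_WORD_LEVEL } IndexLevel;

/*
 * Rough estimate of temporary file size from the size of the source data.
 * Record Level: about 1x the data. Word Level: 2x is used here as a floor,
 * since the documented average is "a little over" 2x.
 */
unsigned long long estimate_temp_bytes(unsigned long long data_bytes,
                                       IndexLevel level)
{
    if (level == INDEX_WORD_LEVEL) {
        assert(data_bytes <= ULLONG_MAX / 2);  /* guard the doubling */
        return data_bytes * 2;
    }
    return data_bytes;
}
```

Because these are averages, the result is best treated as a starting point and confirmed during the build with ixGetTempDiskSpaceUsage.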

If you are generating a Word Level index and want to cut down on the size of the temporary index files or of the final index, consider not indexing common words that carry no information a user is likely to search for.  (These are called "Stop Words.")  Examples include "the", "is", "and", "or", "there", and "was".  Omitting them can reduce your temporary index and final index space usage considerably.
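One way to apply this in your own code is to filter words before handing them to the indexer. The sketch below is an assumption about how an application might do that, not a description of any Onix feature: the function names are hypothetical, and the list holds only the example words quoted above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Example stop words taken from the text; a real list would be longer. */
static const char *const STOP_WORDS[] = {
    "the", "is", "and", "or", "there", "was"
};

/* Case-insensitive comparison of two NUL-terminated ASCII strings. */
static bool equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return false;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns true if the word is in the stop list and can be skipped. */
bool is_stop_word(const char *word)
{
    size_t i;

    assert(word != NULL);
    for (i = 0; i < sizeof STOP_WORDS / sizeof STOP_WORDS[0]; i++) {
        if (equals_ignore_case(word, STOP_WORDS[i])) {
            return true;
        }
    }
    return false;
}
```

An application would call is_stop_word on each word and skip the indexing call when it returns true.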

When building your index, watch the size of your temporary files to ensure that they grow no larger than 1/2 the available disk space and no larger than 2GB.  Onix needs disk space equal to the size of the temporary files to do the final processing of the index, including index compression.  The 2GB limit is due primarily to limitations in the C/C++ languages which limit the size of files accessed via the stdio routines.  (It is also a limitation inherent to many file systems.)  While you do not need to call ixGetTempDiskSpaceUsage after indexing every word, it is a good idea to check your temporary disk space usage after every few megabytes of indexed text.
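The two limits and the "every few megabytes" cadence can be sketched as a pair of small helpers. All names below are hypothetical and not part of the Onix API. The sketch assumes that the usage figure is in bytes, which the reference does not state, and it deliberately treats exactly 2GB as already unsafe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 2GB ceiling from the text, taken here as 2^31 bytes. */
#define TEMP_FILE_LIMIT_BYTES 2147483648ULL

/* Tracks how much text has been indexed since the last usage check. */
typedef struct {
    unsigned long long bytes_since_check;
    unsigned long long interval;          /* e.g. a few megabytes */
} CheckTimer;

/* Records n more indexed bytes; returns true when a check is due. */
bool note_bytes_indexed(CheckTimer *t, unsigned long long n)
{
    assert(t != NULL);
    t->bytes_since_check += n;
    if (t->bytes_since_check >= t->interval) {
        t->bytes_since_check = 0;
        return true;
    }
    return false;
}

/*
 * True while temporary usage is below the 2GB ceiling and no more than
 * half of the free space that was measured before indexing began.
 */
bool temp_usage_is_safe(unsigned long long temp_usage,
                        unsigned long long free_at_start)
{
    return temp_usage < TEMP_FILE_LIMIT_BYTES
        && temp_usage <= free_at_start / 2;
}
```

In an indexing loop, one might call note_bytes_indexed after each chunk of text and, when it returns true, call ixGetTempDiskSpaceUsage(IndexingEngine, &DiskSpaceUsage, &Status), check Status, and pass DiskSpaceUsage to temp_usage_is_safe.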

Furthermore, keep in mind that some operating systems (most notably Windows) do not update the amount of disk space available or used until a file is closed.  Thus, asking the operating system how much disk space is available during an index build is likely to return inaccurate figures.  To get around this problem, find out how much disk space is available from your OS before you begin indexing.
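On POSIX systems, the free space can be read once before indexing begins with statvfs(). The sketch below assumes a POSIX platform; the function name free_bytes_at is hypothetical, and Windows would need a different call, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/*
 * Stores in *out the number of bytes available to unprivileged processes
 * on the file system that contains path. Returns 0 on success and -1 on
 * failure. Intended to be called once, before indexing starts.
 */
int free_bytes_at(const char *path, unsigned long long *out)
{
    struct statvfs fs;

    assert(out != NULL);
    if (path == NULL || statvfs(path, &fs) != 0) {
        return -1;
    }
    /* f_bavail counts free blocks in units of f_frsize bytes. */
    *out = (unsigned long long)fs.f_bavail * (unsigned long long)fs.f_frsize;
    return 0;
}
```

The path passed in should be the directory where the temporary files will be written, so that the figure reflects the correct file system.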

ixGetTempDiskSpaceUsage should not be used after calling ixEndIndexingSession.

See Also

Temporary Files
ixIndexWord, ixStartIndexingSession, ixSetLocationForTemporaryFiles, ixGetLocationForTemporaryFiles