Language Identifier SDK

liEndDocument

Name

liEndDocument -- Finish analyzing a document and return results.

Synopsis

void liEndDocument(LextekLanguageIdentiferT LanguageIdentifer, LanguageIDListT* IDList, StatusCodeT *Status)

Arguments

LanguageIdentifer: A Language Identifier object that was allocated by liCreateLanguageIdentifier.

IDList: A pointer to a LanguageIDListT structure. The results of the analysis of your document are placed in this structure.

Status: A pointer to a StatusCodeT object (a signed long integer).

Returns

Nothing.

Description

liEndDocument ends the analysis of your document and returns the results of the analysis.

Before calling liEndDocument, use the function liAnalyzeDocumentText to analyze your document's text and begin determining the character set. Make sure liAnalyzeDocumentText has analyzed enough text for the language identifier to work with. We typically recommend 200 characters or more, though you can sometimes get by with less. The more text, the more accurate the analysis; however, text beyond about 15,000 characters has little impact on the results.
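The length guidance above can be captured in two small checks. This is only a sketch: the macro and function names are hypothetical and not part of the SDK, and the thresholds are simply the 200 and 15,000 character figures quoted above.

```c
#include <stddef.h>

/* Thresholds taken from the guidance above. These names are hypothetical
   and are not defined by the Language Identifier SDK. */
#define MIN_RECOMMENDED_CHARS 200u
#define MAX_USEFUL_CHARS      15000u

/* Returns nonzero when the sample meets the recommended minimum length. */
static int SampleIsLongEnough(size_t CharCount)
{
    return CharCount >= MIN_RECOMMENDED_CHARS;
}

/* Returns how many characters are worth analyzing: the full sample,
   capped at the point where more text has little further impact. */
static size_t UsefulSampleLength(size_t CharCount)
{
    return CharCount > MAX_USEFUL_CHARS ? MAX_USEFUL_CHARS : CharCount;
}
```

A caller might use SampleIsLongEnough to decide whether to treat the results with extra caution, and UsefulSampleLength to avoid analyzing more text than is useful. Shorter samples can still be analyzed; they are just less reliable.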

The results of the analysis of your document will be returned in the structure IDList (of type LanguageIDListT) which has the following format:

struct LanguageIdentificationT {
    char LanguageIDString[80];
    int LanguageIDNum;
    float Weight;
};

struct LanguageIDListT {
    BooleanT LanguageFound;
    LanguageIdentificationT Language[4];
    int LanguageIDCount;
};

In the structure, LanguageIDCount tells you how many of the Language structures have been filled out. The Language array is sorted in descending likelihood of a match to your document's language (i.e., Language[0] is the most likely match to the language your document is written in, and Language[3] is the least likely). The flag LanguageFound tells you whether the language identifier believes the closest match is in fact the language of your document's text.

To examine the results, check LanguageIDCount to see how many close matches there were, and check the LanguageFound flag to see whether the closest match is believed to be the language of your document. You can then walk the Language array and extract the LanguageIDString, LanguageIDNum, and Weight of each entry as you see fit.

Note that the LanguageIDString returned for each matching language is filled out as a zero-terminated (or "C" style) string.
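The following sketch shows one way to read the results as described above. So that it compiles on its own, it declares local stand-ins for the two structures using typedefs; with the real SDK you would include its header instead. BooleanT is assumed here to be an int-like type, since its definition is not given above, and the two helper functions are hypothetical names, not SDK functions.

```c
#include <stdio.h>

/* Local stand-ins mirroring the structures documented above.
   ASSUMPTION: BooleanT is an int-like type (not defined on this page). */
typedef int BooleanT;

typedef struct {
    char  LanguageIDString[80];
    int   LanguageIDNum;
    float Weight;
} LanguageIdentificationT;

typedef struct {
    BooleanT                LanguageFound;
    LanguageIdentificationT Language[4];
    int                     LanguageIDCount;
} LanguageIDListT;

/* Hypothetical helper: returns the name of the closest match, or NULL when
   no entries were filled out or LanguageFound is not set. */
static const char *BestLanguageName(const LanguageIDListT *IDList)
{
    if (IDList == NULL || IDList->LanguageIDCount <= 0 || !IDList->LanguageFound)
        return NULL;
    return IDList->Language[0].LanguageIDString;
}

/* Hypothetical helper: prints each filled-out entry, most likely first,
   and returns the number of entries printed. */
static int PrintLanguageCandidates(const LanguageIDListT *IDList, FILE *Out)
{
    int Count, i;

    if (IDList == NULL || Out == NULL)
        return 0;

    Count = IDList->LanguageIDCount;
    if (Count > 4) Count = 4;   /* never read past the 4-entry array */
    if (Count < 0) Count = 0;

    for (i = 0; i < Count; i++) {
        /* LanguageIDString is zero-terminated, so %s is safe here. */
        fprintf(Out, "%d. %s (id %d, weight %.3f)\n",
                i + 1,
                IDList->Language[i].LanguageIDString,
                IDList->Language[i].LanguageIDNum,
                IDList->Language[i].Weight);
    }
    return Count;
}
```

In real use, the list would be filled in by a call such as liEndDocument(LanguageIdentifer, &IDList, &Status) before either helper is called. Returning NULL when LanguageFound is not set is a design choice of this sketch; an application may prefer to report Language[0] as a tentative guess instead.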

See Also

liStartDocument, liAnalyzeDocumentText