Language Identifier SDK

Language Identifier SDK Introduction

Thank you for choosing the Lextek Language Identifier. We believe you will find that it is not only one of the fastest language identifiers available but also one of the most accurate. As you use this toolkit, we believe you will find the API simple and straightforward. In addition, the wide variety of supported languages and character encodings will assist your development efforts, whatever languages you work with.

 

About The Language Identification Modules

Your application can draw on a wide variety of language identification modules (around 260 at the last official count; there may be more by the time you read this). If you need additional modules, please contact support@lextek.com for information on having a custom language identification module built for your needs. This can typically be done as a free service.

Each language identification module's name includes its character encoding. For example, the module bulgarian_cyrillic_koi8_r is for Bulgarian, written in Cyrillic script, using the KOI8-R character set. Many of the modules whose names carry no character set information simply use the standard ISO Latin character set (ISO 8859-1): its lower half is identical to ASCII, and its upper half contains accented characters. This is essentially the same character set used by Microsoft Windows. Because a large share of the supported languages are written in Latin script, this is the default.
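The naming convention above can be illustrated with a short Python sketch. It assumes every module name follows the pattern language_script_encoding with single-word language and script tokens, and that a name with no encoding suffix falls back to Latin script and ISO 8859-1 (Python's `latin_1` codec). The helper `parse_module_name` is hypothetical and is not part of the SDK. The sketch also shows that the lower half of ISO 8859-1 matches ASCII byte for byte.

```python
# Minimal sketch, assuming module names follow <language>_<script>_<encoding>
# with single-word language and script tokens. Names without an encoding
# suffix are assumed to default to Latin script and ISO 8859-1.

DEFAULT_SCRIPT = "latin"
DEFAULT_ENCODING = "latin_1"  # Python's codec name for ISO 8859-1


def parse_module_name(name: str) -> tuple[str, str, str]:
    """Split a module name into (language, script, encoding).

    Hypothetical helper for illustration only; it is not part of the SDK.
    """
    parts = name.split("_")
    language = parts[0]
    script = parts[1] if len(parts) > 1 else DEFAULT_SCRIPT
    # Everything after the script token is treated as the encoding name.
    encoding = "_".join(parts[2:]) or DEFAULT_ENCODING
    return language, script, encoding


if __name__ == "__main__":
    lang, script, enc = parse_module_name("bulgarian_cyrillic_koi8_r")
    print(lang, script, enc)

    # "koi8_r" is also a valid Python codec name, so text stored in that
    # encoding can be encoded and decoded directly.
    raw = "България".encode(enc)
    print(raw.decode(enc))

    # The lower half (0x00-0x7F) of ISO 8859-1 is identical to ASCII.
    low = bytes(range(128))
    assert low.decode("latin_1") == low.decode("ascii")

    # The upper half holds accented Latin letters; byte 0xE9 is "é".
    print(b"\xe9".decode(DEFAULT_ENCODING))
```

The split relies on the fixed token order, so a language whose name itself contains an underscore would need a different rule.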

 

Function List

liCreateLanguageIdentifier

liDeleteLanguageIdentifier

liOpenLanguage

liCloseLanguage

liStartDocument

liAnalyzeDocumentText

liEndDocument
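The list gives no signatures or return values, but the names suggest paired calls (create/delete, open/close, start/end), with text analysis happening between the start and end of a document. The Python mock below models that assumed call order only. It does not call the Lextek SDK, the class and method names are hypothetical stand-ins, and both the repeated use of liAnalyzeDocumentText within one document and the character count returned by the mock's end_document are assumptions made for illustration.

```python
# Hypothetical mock of the call order implied by the SDK function names.
# It does NOT call the Lextek SDK; every signature below is invented.


class CallOrderError(RuntimeError):
    """Raised when the mock receives calls in an order it does not allow."""


class MockLanguageIdentifier:
    def __init__(self) -> None:
        # Stands in for liCreateLanguageIdentifier.
        self.alive = True
        self.open_languages: set[str] = set()
        self.in_document = False
        self.chars_analyzed = 0

    @staticmethod
    def _require(condition: bool, message: str) -> None:
        if not condition:
            raise CallOrderError(message)

    def open_language(self, module_name: str) -> None:
        # Stands in for liOpenLanguage.
        self._require(self.alive, "identifier already deleted")
        self.open_languages.add(module_name)

    def close_language(self, module_name: str) -> None:
        # Stands in for liCloseLanguage.
        self._require(module_name in self.open_languages, "language not open")
        self.open_languages.remove(module_name)

    def start_document(self) -> None:
        # Stands in for liStartDocument.
        self._require(self.alive, "identifier already deleted")
        self._require(not self.in_document, "document already started")
        self._require(bool(self.open_languages), "no language module open")
        self.in_document = True
        self.chars_analyzed = 0

    def analyze_document_text(self, text: str) -> None:
        # Stands in for liAnalyzeDocumentText; assumed callable repeatedly.
        self._require(self.in_document, "start_document not called")
        self.chars_analyzed += len(text)

    def end_document(self) -> int:
        # Stands in for liEndDocument. Returns a character count as a
        # placeholder; the real function's result is not documented here.
        self._require(self.in_document, "no document in progress")
        self.in_document = False
        return self.chars_analyzed

    def delete(self) -> None:
        # Stands in for liDeleteLanguageIdentifier.
        self._require(not self.in_document, "end the document first")
        self.alive = False


if __name__ == "__main__":
    li = MockLanguageIdentifier()
    li.open_language("bulgarian_cyrillic_koi8_r")
    li.start_document()
    li.analyze_document_text("First part. ")
    li.analyze_document_text("Second part.")
    print(li.end_document())
    li.close_language("bulgarian_cyrillic_koi8_r")
    li.delete()
```

Raising an exception on an out-of-order call makes a sequencing mistake visible immediately during testing, before any real SDK calls are wired in.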