Language Identifier SDK
Introduction

Thank you for choosing the Lextek Language Identifier. We believe you will find it to be not only one of the fastest language identifiers available but also one of the most accurate. The API is simple and straightforward, and the wide variety of languages and character encodings we support will assist your development efforts no matter which languages you work with.
About The Language Identification Modules

A wide variety of language identification modules is available for use in your application. (There were around 260 at the last official count, and there may be more by the time you read this.) If you need additional language identification modules, please contact support@lextek.com for information on how we can build a custom module for your needs. This can typically be done as a free service.

Each language identification module is named so that its character encoding is part of the name. For example, the name bulgarian_cyrillic_koi8_r tells you that the module is for Bulgarian, that it uses Cyrillic characters, and that it uses the KOI8-R character set.

Many modules whose names carry no extended character set information simply use the standard ISO Latin character set (ISO 8859-1). Its lower half is identical to ASCII, and its upper half contains accented characters. This is also essentially the character set used by Microsoft Windows for Western European languages. A good percentage of the languages use Latin characters, so this is the default.
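The naming convention can be illustrated with a short Python sketch. It is not part of the Lextek SDK. It assumes, purely for illustration, that the first underscore-separated token of a module name is the language and that everything after it describes the script and character encoding. The helper `describe_module`, the `ModuleInfo` type, and the module name `french` are hypothetical; only `bulgarian_cyrillic_koi8_r` is taken from the text above.

```python
from typing import NamedTuple


class ModuleInfo(NamedTuple):
    language: str       # first token of the module name
    encoding_hint: str  # rest of the name, or a default marker


# Assumption: names with no suffix fall back to the default ISO Latin
# character set described above. This label is made up for the sketch.
DEFAULT_HINT = "iso_latin (assumed default)"


def describe_module(name: str) -> ModuleInfo:
    """Split a module name into a language and an encoding hint.

    Hypothetical helper: treats the first underscore-separated token
    as the language and the remainder as script/encoding information.
    """
    language, _, rest = name.partition("_")
    return ModuleInfo(language=language, encoding_hint=rest or DEFAULT_HINT)


# Name taken from the documentation above.
print(describe_module("bulgarian_cyrillic_koi8_r"))
# Hypothetical name with no encoding suffix.
print(describe_module("french"))
```

A real module list may contain names that do not follow this simple split (for example, languages whose names themselves contain an underscore), so check any such helper against the actual module names shipped with the SDK.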