Onix Text Retrieval Toolkit
API Reference

ucNormalizeChar

Name

ucNormalizeChar

Synopsis

void ucNormalizeChar( UnicodeCharT *UnicodeChar, BooleanT Lower )

Arguments

UnicodeChar: A pointer to the Unicode character to be normalized. The character is modified in place.

Lower: A boolean indicating whether the character should also be converted to lowercase.

Returns

Nothing.

Description

Many characters in European languages can have accents or other variations. This function changes such characters to a normalized form, typically the unaccented form. It can optionally convert the character to lowercase as well.

Normalization is useful because texts are often inconsistent in whether they apply accents. Thus when you search for, say, Godel, you would also want to find Gödel. By normalizing text both before indexing and before processing your query, you can easily find all the typical representations. If you do use this technique in your indexing, you might also wish to normalize spellings (for example, Goedel), which this function does not do.

The character pointed to by UnicodeChar is modified. If you need to keep your original text, make sure you pass ucNormalizeChar a pointer into a copy of it and not a pointer into the original text.
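
The copy-then-normalize pattern described above might look roughly like the following sketch. It is not taken from the toolkit: the typedefs for UnicodeCharT and BooleanT are assumptions (the real definitions come from the Onix headers), the ucNormalizeChar body is a small stand-in covering only a few letters so that the sketch compiles on its own, and NormalizeCopy and TermsMatch are hypothetical helper names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: stand-in typedefs. Use the definitions from the Onix headers. */
typedef unsigned short UnicodeCharT;
typedef int BooleanT;

#define MAX_TERM_LEN 64

/* ASSUMPTION: stand-in for the toolkit's ucNormalizeChar, present only so
   this sketch is self-contained. It maps a handful of accented letters to
   their unaccented forms; the real function covers far more characters. */
void ucNormalizeChar(UnicodeCharT *UnicodeChar, BooleanT Lower)
{
    UnicodeCharT c = *UnicodeChar;

    switch (c) {
        case 0x00D6: c = 'O'; break;  /* O with diaeresis */
        case 0x00F6: c = 'o'; break;  /* o with diaeresis */
        case 0x00C9: c = 'E'; break;  /* E with acute */
        case 0x00E9: c = 'e'; break;  /* e with acute */
        default: break;
    }
    if (Lower && c >= 'A' && c <= 'Z')
        c = (UnicodeCharT)(c - 'A' + 'a');

    *UnicodeChar = c;  /* the caller's character is overwritten */
}

/* Hypothetical helper: copy src into dst, then normalize the copy one
   character at a time. The original text in src is left untouched. */
void NormalizeCopy(UnicodeCharT *dst, const UnicodeCharT *src,
                   size_t len, BooleanT lower)
{
    size_t i;

    memcpy(dst, src, len * sizeof *dst);
    for (i = 0; i < len; i++)
        ucNormalizeChar(&dst[i], lower);
}

/* Hypothetical helper: returns 1 when two equal-length terms are identical
   after accent and case normalization, 0 otherwise. */
int TermsMatch(const UnicodeCharT *a, const UnicodeCharT *b, size_t len)
{
    UnicodeCharT na[MAX_TERM_LEN], nb[MAX_TERM_LEN];

    assert(len <= MAX_TERM_LEN);
    NormalizeCopy(na, a, len, 1);
    NormalizeCopy(nb, b, len, 1);
    return memcmp(na, nb, len * sizeof *na) == 0;
}
```

With these helpers, an indexed term holding the code points for Gödel and a query term holding godel would compare as equal through TermsMatch, while the indexed buffer itself keeps its accented character because only the copies are passed to ucNormalizeChar.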

See Also

Unicode
ucTableNormalizeChar, ucInitializeNormalizationTable, ixUnicodeCharToHex