Tuesday, November 4, 2008

How do you handle UTF-8?

Grapeshot - Developer - FAQs
Grapeshot handles a wide range of character sets. Its indexing routines identify the character set in use within a document and apply appropriate stemming routines as part of tokenising the words or phrases in the incoming text. Tokenisation includes word splitting or character separation, as well as dealing with the idiosyncrasies of punctuation within each language.
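To make the idea concrete, here is a minimal Python sketch of that kind of pipeline. It is not Grapeshot's code, and the FAQ does not describe its internals. The function names (`decode_text`, `tokenise`, `is_cjk`), the list of candidate encodings, and the choice to split CJK text one character at a time are all assumptions made for illustration. Stemming is left out.

```python
import re
import unicodedata

# Hypothetical helper: try a short list of encodings in order.
# Real charset detection is statistical; this is only a stand-in.
def decode_text(raw, candidates=("utf-8", "cp1252")):
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so it never fails.
    return raw.decode("latin-1"), "latin-1"

# Assumption: treat Han ideographs, Hiragana and Katakana as scripts
# that need character separation rather than word splitting.
def is_cjk(ch):
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF) or (0x3040 <= cp <= 0x30FF)

_WORD = re.compile(r"\w+")  # Unicode-aware for str patterns in Python 3

def tokenise(text):
    # Normalise so composed and decomposed accents compare equal.
    text = unicodedata.normalize("NFC", text)
    tokens = []
    for run in _WORD.findall(text):  # punctuation is dropped here
        buf = []
        for ch in run:
            if is_cjk(ch):
                if buf:
                    tokens.append("".join(buf))
                    buf = []
                tokens.append(ch)  # one token per CJK character
            else:
                buf.append(ch)
        if buf:
            tokens.append("".join(buf))
    return [t.lower() for t in tokens]
```

For example, `decode_text(b"caf\xe9")` falls through from UTF-8 to cp1252, and `tokenise("Hello, world! 日本語")` splits the English on word boundaries and the Japanese into single characters. A production system would likely use a proper detector and a language-specific segmenter instead of these shortcuts.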
