And the following timely post on the PostgreSQL hackers list from Oleg Bartunov, one of the developers of tsearch, which underpins OpenFTS:
> OK, attached is an example of the problem. Notice how trademarks and
> copyright symbols are being indexed along with the word. This means that if
> someone searches for 'balance' in the above data set, they won't find
> anything.
>
> I'm not sure how this would be handled. In the English language, it'd
> probably be safe to say that high ascii characters would be stripped from
> the index? But you'd want to leave accents and stuff in I guess. Tricky.
Rather tricky. The problem is that we don't know how to get flex to work
with locales. The parser recognizes Latin words ([a-zA-Z]), non-Latin words ([\200-\377])
and mixed words ([a-zA-Z\200-\377]). Your case (Balance®) is a mixed word.
The right way is to have a locale-aware parser that recognizes words properly. We are inclined to abandon flex.
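
To make the classification concrete, here is a minimal Python sketch of the behaviour described in the reply. It is not the tsearch parser: the names `classify` and `unicode_words` are hypothetical, the input is assumed to be Latin-1 encoded, and the non-Latin class is assumed to mean the high-bit bytes 0x80-0xFF. The second function swaps in Unicode character properties as a stand-in for a locale-aware parser, which is a different technique from a C locale lookup.

```python
import re

# Byte-oriented character classes, as a flex-style scanner would see them.
# Assumption: "non-Latin" means any high-bit byte (0x80-0xFF).
LATIN = re.compile(rb"[a-zA-Z]+\Z")
NONLATIN = re.compile(rb"[\x80-\xff]+\Z")
MIXED = re.compile(rb"[a-zA-Z\x80-\xff]+\Z")


def classify(token: str, encoding: str = "latin-1") -> str:
    """Classify a token by its raw bytes (hypothetical helper)."""
    raw = token.encode(encoding)
    if LATIN.match(raw):
        return "latin"
    if NONLATIN.match(raw):
        return "nonlatin"
    if MIXED.match(raw):
        return "mixed"
    return "other"


def unicode_words(text: str) -> list[str]:
    """Extract runs of letters using Unicode properties.

    [^\\W\\d_] keeps word characters but excludes digits and underscore,
    so symbols such as the registered-trademark sign are left out while
    accented letters are kept.
    """
    return re.findall(r"[^\W\d_]+", text)


if __name__ == "__main__":
    print(classify("balance"))
    print(classify("Balance\u00ae"))
    print(unicode_words("Balance\u00ae caf\u00e9"))
```

With byte ranges alone, the trademark sign is indistinguishable from an accented letter, so it is kept as part of the word and a search for 'balance' misses it. A parser that knows which characters are letters can drop the symbol and still keep the accents.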