Forum OpenACS Q&A: Response to Localized searching

Posted by Dan Wickstrom on
Also note the following timely post to the PostgreSQL hackers list from Oleg Bartunov, one of the developers of tsearch, which forms the underpinnings of OpenFTS:
> OK, attached is an example of the problem.  Notice how trademarks and
> copyright symbols are being indexed along with the word.  This means that if
> someone searches for 'balance' in the above data set, they won't find
> anything.
>
> I'm not sure how this would be handled.  In the English language, it'd
> probably be safe to say that high ascii characters would be stripped from
> the index?  But you'd want to leave accents and stuff in I guess.  Tricky.

Rather tricky. The problem is that we don't know how to get flex to work
with locales. The parser recognizes Latin words ([a-zA-Z]), non-Latin words ([\200-\377])
and mixed words ([a-zA-Z\200-\377]). Your case (Balance®) is a mixed word.
The right way is to have a locale-aware parser that properly recognizes words. We are inclined to abandon flex.
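To make the problem concrete, here is a minimal Python sketch (not tsearch's actual flex parser) that mimics the three word classes described above on raw bytes. It assumes "non-Latin" means any high-bit byte (0x80-0xFF), and contrasts that with a hypothetical Unicode-letter tokenizer of the kind a locale-aware parser might use. The function names `classify_bytes` and `unicode_words` are illustrative only.

```python
import re

# Byte-oriented classes modelled on the three classes in the post.
# Assumption: "non-Latin" = any byte in 0x80-0xFF (octal \200-\377).
LATIN = re.compile(rb"^[a-zA-Z]+$")
NON_LATIN = re.compile(rb"^[\x80-\xff]+$")
MIXED = re.compile(rb"^[a-zA-Z\x80-\xff]+$")


def classify_bytes(token: bytes) -> str:
    """Classify a token the way a byte-class (flex-style) rule set would."""
    if LATIN.match(token):
        return "latin"
    if NON_LATIN.match(token):
        return "nonlatin"
    if MIXED.match(token):
        return "mixed"
    return "other"


def unicode_words(text: str) -> list[str]:
    """Hypothetical locale-aware alternative: a word is a run of letters
    (word characters minus digits and underscore), so symbols such as
    the registered or trademark signs act as separators, while accented
    letters such as 'é' stay inside the word."""
    return re.findall(r"[^\W\d_]+", text)


# "Balance®" is one mixed token at the byte level, in Latin-1 or UTF-8,
# so a search for "balance" will not match it.
print(classify_bytes("Balance®".encode("latin-1")))
print(classify_bytes("Balance®".encode("utf-8")))
# A letter-based tokenizer splits the symbol off and keeps accents.
print(unicode_words("Balance® café™"))
```

The byte-level rules cannot tell a letter such as é (0xE9 in Latin-1) from a symbol such as ® (0xAE), because both fall in the same high-bit range; a character-aware tokenizer can, which is the distinction a locale-aware parser would need to draw.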