Forum OpenACS Q&A: Re: OpenFTS 0.3.1 Problems

Posted by Dan Wickstrom on
'if' is a stop-word in English, so it doesn't get indexed. You can also restrict indexing of other lexem types by specifying the types in the admin screen when the openfts-driver is created. This configuration is a little obscure, but I've put a script in the examples sub-directory which lists all of the lexem types supported by the current version of openfts. Running it, you will get the following output:

403 eusdawi@edgedsp6:/home/unix/wickstrom/web/openfts/tcl/examples>./types.tcl 
  1 => Latin word
  2 => Cyrillic word
  3 => Word
  4 => Email
  5 => URL
  6 => Host
  7 => Scientific notation
  8 => VERSION
  9 => Part of hyphenated word
 10 => Cyrillic part of hyphenated word
 11 => Latin part of hyphenated word
 12 => Space symbols
 13 => HTML Tag
 14 => HTTP head
 15 => Hyphenated word
 16 => Latin hyphenated word
 17 => Cyrillic hyphenated word
 18 => URI
 19 => File or path name
 20 => Decimal notation
 21 => Signed integer
 22 => Unsigned integer
 23 => HTML Entity


In my current setup, I have it configured not to index space symbols, HTML tags, or HTTP head. The openfts driver also allows you to restrict what is shown in a headline display.
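
To make the effect of that configuration concrete, here is a minimal sketch in Python. It is not OpenFTS code: the function name `indexable`, the tiny stop-word list, and the token tuples are all hypothetical, and only the type numbers (12, 13, 14) come from the types.tcl listing above.

```python
# Hypothetical illustration only; this is not how OpenFTS is implemented.
# Type ids are taken from the types.tcl output: 12 = Space symbols,
# 13 = HTML Tag, 14 = HTTP head.
EXCLUDED_TYPES = {12, 13, 14}

# A tiny stand-in stop-word list (an assumption, not the real English list).
STOP_WORDS = {"if", "the", "a", "and", "of"}


def indexable(tokens):
    """Return the (type_id, lexem) pairs that would survive both filters."""
    kept = []
    for type_id, lexem in tokens:
        if type_id in EXCLUDED_TYPES:
            continue  # lexem type is excluded in the driver configuration
        if lexem.lower() in STOP_WORDS:
            continue  # stop-words are never indexed
        kept.append((type_id, lexem))
    return kept


# Made-up parser output for the text "<b>if search dan@example.com"
tokens = [(13, "<b>"), (1, "if"), (12, " "), (1, "search"), (4, "dan@example.com")]
result = indexable(tokens)
print(result)
```

With these sample tokens, only the Latin word "search" and the email address are left, which is why a query for 'if' alone finds nothing.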

I don't recall the search package ever giving a warning for using stopwords for search terms.

Posted by Simon at TCB on
Dan,

I took the inclusion of stuff like this:



      </if>
        <if @nstopwords@ eq 1>
        <font color=6f6f6f>
          "<b>@stopwords@</b>" is a very common word and was not included in your search.
          [<a href=help/basics#stopwords>details</a>]<br>
        </font>
        </if>

to mean the search package should be displaying it, but as I said, I'm not convinced the 'opt' variable from which nstopwords is created is ever populated.
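
For reference, this is roughly what I'd expect to happen before the template is rendered, sketched in Python rather than the package's own Tcl. The function `split_query` and the stop-word list are hypothetical; only the names `nstopwords` and `stopwords` come from the template snippet, and I'm not claiming the search package does it this way.

```python
# Hypothetical sketch: split a query into searchable terms and stop-words,
# then derive the two values the template snippet expects.
STOP_WORDS = {"if", "the", "a", "and", "of"}  # stand-in list, an assumption


def split_query(query):
    """Return (searchable_terms, stop_words_found) for a raw query string."""
    terms = query.lower().split()
    found = [t for t in terms if t in STOP_WORDS]
    searchable = [t for t in terms if t not in STOP_WORDS]
    return searchable, found


searchable, found = split_query("if openfts")
nstopwords = len(found)      # would drive <if @nstopwords@ eq 1>
stopwords = " ".join(found)  # would fill @stopwords@
print(nstopwords, stopwords, searchable)
```

If nothing equivalent to this ever runs, nstopwords stays unset and the warning block can never be shown.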

Maybe it never has worked, but as the code's always been there, this really means it's never *worked* ;o)

Anyway, so as far as we know there is a bug, it's just never been addressed?

I'd also like to suggest that, as the default behaviour is to ignore such words, it's probably better to say so in advance on the search pages than after you've put one in a search term?