Forum OpenACS Development: Response to Full Text Search

Posted by Neophytos Demetriou on
OpenFTS stores each document as an array of integers (lexeme IDs). The index over these integer arrays is an RD-Tree, implemented through the GiST interface available in PostgreSQL.
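To make the "array of lexeme IDs" idea concrete, here is a minimal sketch in Python. It is not OpenFTS code: the function name `to_lexeme_ids`, the in-memory `dictionary`, and the whitespace tokenization are all assumptions for illustration, and whether OpenFTS keeps duplicate IDs in the stored array is not something this post states.

```python
def to_lexeme_ids(text, dictionary):
    """Map each word to an integer ID, assigning a new ID on first sight.

    `dictionary` is a hypothetical word -> ID mapping; a real system
    would keep this in a database table rather than in memory.
    """
    ids = []
    for word in text.lower().split():
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1  # IDs start at 1 (assumption)
        ids.append(dictionary[word])
    return ids


dictionary = {}
doc = to_lexeme_ids("the cat saw the other cat", dictionary)
print(doc)
```

An array like this is what the RD-Tree index would then be built over, so that a query for a given lexeme ID can skip documents whose arrays cannot contain it.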

Additionally, OpenFTS maintains 10 (this is configurable) indexing tables where it stores the frequency and the position of each of the lexemes in a document. The frequency of each lexeme is calculated as:

         occurrences of the lexeme in the document
freq = ---------------------------------------------
          number of all lexemes in the document
The position is counted in lexemes from the beginning of the document. The position field can hold negative values, which weigh more heavily when the results are ranked.
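The frequency formula and the position count can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the OpenFTS implementation: the function name `lexeme_stats` is made up, positions are zero-based here by assumption, and the negative-position weighting is not modelled.

```python
from collections import Counter


def lexeme_stats(lexemes):
    """Return (freq, positions) for a list of lexemes in document order.

    freq[lex]      = occurrences of lex / number of all lexemes
    positions[lex] = offsets of lex, counted in lexemes from the start
                     (zero-based here, which is an assumption)
    """
    total = len(lexemes)
    counts = Counter(lexemes)
    freq = {lex: n / total for lex, n in counts.items()}

    positions = {}
    for pos, lex in enumerate(lexemes):
        positions.setdefault(lex, []).append(pos)
    return freq, positions


freq, positions = lexeme_stats(["cat", "saw", "cat", "dog"])
print(freq["cat"], positions["cat"])
```

Because every lexeme contributes to the denominator, the frequencies of all distinct lexemes in a document sum to 1.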

OpenFTS supports stopwords and stemming. The default method for stemming uses Porter's algorithm.
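As a rough picture of where stopwords and stemming sit in the pipeline, here is a small Python sketch. The stopword list is hypothetical, and `toy_stem` is a crude suffix stripper used only as a stand-in; it is not Porter's algorithm, which applies a much longer set of ordered rewrite rules.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and"}


def toy_stem(word):
    """Crude suffix stripping; a stand-in, NOT Porter's algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of stem to avoid over-stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text, stem=toy_stem):
    """Lowercase, drop stopwords, then stem what remains."""
    return [stem(w) for w in text.lower().split() if w not in STOPWORDS]


lexemes = normalize("The cats jumped and the dog barked")
print(lexemes)
```

The `stem` parameter is there so that a real Porter stemmer could be passed in instead of the toy one.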