Forum OpenACS Development: Re: String matching

4: Re: String matching (response to 1)
Posted by Gustaf Neumann on
First of all, when working with tags, using regexp and/or string match is crude and not very efficient. Look, e.g., at how tags are implemented in the xowiki package.

Secondly, you can use the PostgreSQL full text search not only via the OpenACS tsearch2 package, but also for arbitrary tables. See e.g. [1] or [2] for tutorials on tsvector and tsquery.
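
As a rough sketch of what full text search on an arbitrary table can look like (the table "notes" and its columns are made up for illustration, and the 'english' configuration is an assumption; to_tsvector, to_tsquery, ts_rank and the @@ operator are standard PostgreSQL):

```sql
-- Hypothetical table; name and columns are assumptions for illustration only.
CREATE TABLE notes (
    note_id serial PRIMARY KEY,
    title   text NOT NULL,
    body    text NOT NULL
);

-- Expression index, so the tsvector does not have to be recomputed
-- for every row at query time.
CREATE INDEX notes_fts_idx ON notes
    USING gin (to_tsvector('english', title || ' ' || body));

-- Notes containing both words (after stemming), best matches first.
-- The WHERE expression must match the indexed expression for the index to be used.
SELECT note_id, title,
       ts_rank(to_tsvector('english', title || ' ' || body), query) AS rank
  FROM notes, to_tsquery('english', 'string & match') AS query
 WHERE to_tsvector('english', title || ' ' || body) @@ query
 ORDER BY rank DESC;
```

For larger tables, a stored tsvector column may be preferable to the expression index; which one fits better depends on the update pattern.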

Thirdly, there is a long history of research on how to implement "near matches": soundex, Levenshtein distance [3], and trigrams [4] are supported directly by PostgreSQL (through the bundled fuzzystrmatch and pg_trgm modules), and many more measures, e.g. Jaro-Winkler, cosine distance, and Smith-Waterman-Gotoh (sequence alignment), are available via the pg_similarity [5] extension.
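
A minimal sketch of such near-match queries, assuming a hypothetical table "tags" with a text column "name" and the search term 'postgres' (both invented for this example); the functions and operators come from the fuzzystrmatch [3] and pg_trgm [4] modules:

```sql
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical table; name and columns are assumptions for illustration only.
CREATE TABLE tags (
    tag_id serial PRIMARY KEY,
    name   text NOT NULL UNIQUE
);

-- Levenshtein: tags within an edit distance of 2 from the search term.
-- This evaluates the function for every row, so it suits small tables
-- or an already narrowed-down candidate set.
SELECT name, levenshtein(lower(name), 'postgres') AS distance
  FROM tags
 WHERE levenshtein(lower(name), 'postgres') <= 2
 ORDER BY distance;

-- Trigrams: a GIN index lets the % (similarity) operator use an index scan.
CREATE INDEX tags_name_trgm_idx ON tags USING gin (name gin_trgm_ops);

-- Tags whose trigram similarity to the search term exceeds the current
-- threshold (pg_trgm.similarity_threshold, 0.3 by default), best first.
SELECT name, similarity(name, 'postgres') AS sml
  FROM tags
 WHERE name % 'postgres'
 ORDER BY sml DESC;
```

The distance limit of 2 and the default similarity threshold are arbitrary starting points and would need tuning against real tag data.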

PostgreSQL has a great array of capabilities in this area. It is typically much better to select data from the database with such filters than to load everything into memory and filter in the application layer.

-g

[1] https://www.compose.com/articles/mastering-postgresql-tools-full-text-search-and-phrase-search/
[2] https://linuxhint.com/postgresql-full-text-search-tutorial/
[3] https://www.postgresql.org/docs/11/fuzzystrmatch.html
[4] https://www.postgresql.org/docs/11/pgtrgm.html
[5] https://github.com/eulerto/pg_similarity