Forum OpenACS Development: Updating Search integration with the Content Repository

I am testing some changes to the content repository implementation of
the Search service contracts.

To allow applications to specify if an item should be indexed or not,
I have added a field to cr_revisions called searchable_p. This field
is set to default to TRUE. By setting the default to TRUE,
applications that assume all content will be indexed will work exactly
the same as before.

If an application needs to disallow indexing on certain items, it can
set the searchable_p field to FALSE.

For example, in edit-this-page we decided that only live revisions
would be indexed and searchable. So I am writing code to update the
searchable_p field whenever an item's live_revision is set or
changed.
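
A rough sketch of the idea, in Python with SQLite purely for illustration. The two-table schema is a heavily simplified assumption (the real cr_items and cr_revisions tables have many more columns), and set_live_revision and searchable_revisions are hypothetical helpers, not part of the CR API or of the patch:

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cr_items (
        item_id       INTEGER PRIMARY KEY,
        live_revision INTEGER
    );
    CREATE TABLE cr_revisions (
        revision_id  INTEGER PRIMARY KEY,
        item_id      INTEGER NOT NULL REFERENCES cr_items(item_id),
        content      TEXT,
        searchable_p INTEGER NOT NULL DEFAULT 1  -- defaults to TRUE
    );
""")


def set_live_revision(conn, item_id, revision_id):
    """Publish a revision and mark only that revision as searchable."""
    conn.execute(
        "UPDATE cr_items SET live_revision = ? WHERE item_id = ?",
        (revision_id, item_id),
    )
    # The comparison evaluates to 1 for the live revision and 0 for the rest.
    conn.execute(
        "UPDATE cr_revisions SET searchable_p = (revision_id = ?) WHERE item_id = ?",
        (revision_id, item_id),
    )


def searchable_revisions(conn):
    """Return the revision ids an indexer would pick up."""
    rows = conn.execute(
        "SELECT revision_id FROM cr_revisions WHERE searchable_p = 1 ORDER BY revision_id"
    )
    return [r[0] for r in rows]


conn.execute("INSERT INTO cr_items (item_id) VALUES (1)")
conn.execute("INSERT INTO cr_revisions (revision_id, item_id, content) VALUES (10, 1, 'draft')")
conn.execute("INSERT INTO cr_revisions (revision_id, item_id, content) VALUES (11, 1, 'final')")

print(searchable_revisions(conn))  # both revisions: the default is TRUE
set_live_revision(conn, 1, 11)
print(searchable_revisions(conn))  # only the live revision remains searchable
```

A package that wants everything indexed never touches searchable_p and keeps the old behaviour; a package like edit-this-page would call something like set_live_revision when it publishes.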

I could build this feature into the CR, but I think it is better if
each package that uses the CR can decide for itself how to implement
the indexing of items.

Packages that require search but do not use the content repository
would have to add this functionality in their own way.

I think that letting packages that are built on the CR modify the
searchable_p field if they need to is a better solution than assuming
in the CR triggers that only the cr_revisions row set as live_revision
can be indexed.

Comments or questions on this plan are welcome!

I uploaded a patch to implement this. If anyone has any better ideas, please let me know.

https://openacs.org/sdm/one-baf.tcl?baf_id=1578

I imagine that in most cases when a user performs a search it makes sense to only search in live revisions.

E.g. when searching a web site's content that is maintained with edit-this-page, as you pointed out above ... this applies for most types of applications I guess. Even more so when there is no UI for the non-admin user to see the old revision - then it makes absolutely no sense to return a search result for that.

But then there are special cases where it might be useful to search in all revisions, for example an editor who wants to search in old content.

Which makes me think that, instead of deciding once per package which content gets indexed, maybe all revisions should be indexed, and the code that actually performs the search should take an option that decides whether non-live revisions are included in the search, defaulting to false.

(This is just from the user's point of view; I don't know if that would be feasible to implement.)
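
To make the suggestion concrete, here is a minimal sketch of a search call with such an option. Everything in it is hypothetical: IndexedRevision, search, and the include_non_live parameter are illustration names, and the in-memory list stands in for a real full-text index.

```python
from dataclasses import dataclass


@dataclass
class IndexedRevision:
    revision_id: int
    item_id: int
    is_live: bool
    text: str


def search(index, query, include_non_live=False):
    """Naive substring search over all indexed revisions.

    Every revision is in the index; non-live ones are filtered out
    unless the caller explicitly asks for them.
    """
    needle = query.lower()
    return [
        rev.revision_id
        for rev in index
        if needle in rev.text.lower() and (include_non_live or rev.is_live)
    ]


index = [
    IndexedRevision(10, 1, False, "Old pricing page"),
    IndexedRevision(11, 1, True, "Current pricing page"),
]
print(search(index, "pricing"))                         # live revisions only
print(search(index, "pricing", include_non_live=True))  # all revisions
```

The cost of this approach is a larger index, since old revisions are stored even when nobody searches them.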

Tilmann,

This is an excellent point. There are different classes of search user, so the most powerful search would index all content and filter the results when displaying them.

I am not sure how this would affect the performance of the search summary pages. Possibly this feature could be added to the search content provider service contract, so that a user_id or party_id (such as all_users, registered_users, etc.) is passed to the content provider, which then decides whether an item should be displayed to that particular user.
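
As a rough sketch of what that could mean, assuming a per-hit callback supplied by the content provider: filter_results and etp_can_view are hypothetical names, the party ids are placeholder strings, and none of this is part of the existing service contract.

```python
# Placeholder party ids for illustration only.
ALL_USERS = "all_users"
EDITORS = "editors"


def etp_can_view(hit, party_id):
    """Hypothetical provider callback: live revisions are visible to
    everyone, older revisions only to editors."""
    return hit["is_live"] or party_id == EDITORS


def filter_results(hits, party_id, can_view):
    """Ask the content provider, hit by hit, whether this party may see it."""
    return [hit["revision_id"] for hit in hits if can_view(hit, party_id)]


hits = [
    {"revision_id": 10, "is_live": False},
    {"revision_id": 11, "is_live": True},
]
print(filter_results(hits, ALL_USERS, etp_can_view))  # live revision only
print(filter_results(hits, EDITORS, etp_can_view))    # both revisions
```

The one-callback-per-hit loop is exactly where the performance question would show up on large result sets.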

This is good. I am glad we are discussing this subject. I find the potential for powerful full-text indexing of all the content in the database one of the most useful features of OpenACS.