Forum OpenACS Development: Response to How to make an object type searchable?

Posted by Neophytos Demetriou on
I need a bit more clarification on what "datasource" is (implemented as a Tcl procedure above). I can't find the specification of what datasource is supposed to provide, but judging from the code, datasource seems to extract 1) information about the content, and 2) the content itself, presumably for actual indexing by the search engine.
The specification for datasource is in packages/search/sql/postgresql/search-sc-create.sql. You didn't need to know that, but your description is correct: the datasource operation provides information about the content and the content itself. The datasource operation is used both for indexing and for returning search results (constructing the summary of each result). In the future, the specification of contracts, operations, etc. will be available through the acs-service-contract package.
If the content is some binary file (like a PDF file stored in file storage, for example), will the content still be indexable/searchable?
The search package expects one of the following:
  • content holds the filename if storage_type='file'
  • content holds the text data if storage_type='text'
  • content holds the lob_id if storage_type='lob'
The search package uses these cases to retrieve the content of an object and the information about it. Next, the content is filtered according to the given MIME type. Currently only two MIME types are supported, namely text/plain and text/html [but your implementations of datasource should provide info about other MIME types as well -- we will support them in the future]. Content with an unsupported MIME type is not indexed by the indexer. So for a PDF file we might only index its title.
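
To make the three storage cases concrete, here is a minimal sketch in Python (not the package's actual Tcl code). The record shape, the function name resolve_content, and the lob_store dictionary standing in for the database are all assumptions made for illustration.

```python
import os
import tempfile


def resolve_content(datasource, lob_store):
    """Return the raw content for one datasource record.

    datasource: dict with 'storage_type' and 'content' keys (assumed shape).
    lob_store:  dict mapping lob_id -> data; a stand-in for the database.
    """
    storage_type = datasource["storage_type"]
    content = datasource["content"]
    if storage_type == "text":
        # content already holds the text data
        return content
    if storage_type == "file":
        # content holds a filename; read the data from disk
        with open(content, encoding="utf-8") as fh:
            return fh.read()
    if storage_type == "lob":
        # content holds a lob_id; look the data up in the (stand-in) store
        return lob_store[content]
    raise ValueError(f"unknown storage_type: {storage_type!r}")


# Usage: one record per storage type.
lob_store = {42: "text stored as a large object"}
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("text stored in a file")

from_text = resolve_content({"storage_type": "text", "content": "inline text"}, lob_store)
from_file = resolve_content({"storage_type": "file", "content": tmp.name}, lob_store)
from_lob = resolve_content({"storage_type": "lob", "content": 42}, lob_store)
os.remove(tmp.name)
```

Whatever the storage type, the caller ends up with the content itself, which is what the MIME-type filter then works on.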

As David suggests, each MIME type requires some type of handler/filter. Once a handler is available, e.g. pdf2txt, it is very easy to incorporate it into the search package (by adding one line to search_content_filter). I need to survey the available filters/converters before we add one as part of the package, but individuals can already use this functionality in their projects.
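
The idea of one handler per MIME type can be sketched as a lookup table. This is a Python illustration only, not the body of search_content_filter; the names FILTERS, content_filter and html_to_text are hypothetical, and the PDF handler is a placeholder rather than a real converter.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text nodes and normalise runs of whitespace.
    return " ".join("".join(parser.parts).split())


# One entry per supported MIME type (hypothetical table).
FILTERS = {
    "text/plain": lambda text: text,
    "text/html": html_to_text,
}


def content_filter(content, mime):
    """Return indexable text, or None when the MIME type has no handler."""
    handler = FILTERS.get(mime)
    return handler(content) if handler else None


plain = content_filter("just text", "text/plain")
html = content_filter("<p>Hello <b>world</b></p>", "text/html")
pdf_before = content_filter(b"%PDF-1.4 ...", "application/pdf")  # no handler yet

# Supporting a new type is one more entry. The lambda below is a placeholder;
# a real handler would call an external converter.
FILTERS["application/pdf"] = lambda data: "text extracted from the PDF"
pdf_after = content_filter(b"%PDF-1.4 ...", "application/pdf")
```

With a table like this, content whose MIME type has no entry is simply skipped by the indexer, and supporting a new type does not touch the existing handlers.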

Update: (Dan wrote) For object types that don't use the content repository (CR), developers can use acs_object_type__create_type, but those that do use the CR need to use content_type__create_type. content_type__create_type overloads acs_object_type__create_type and provides two views for inserting and viewing content data, and the CR depends on these views.