How to make an object type searchable?

by Neophytos Demetriou (k2pts@cytanet.com.cy)

Making an object type searchable involves three steps:
  • Choose the object type
  • Implement FtsContentProvider
  • Add triggers

Choose the object type

In most cases, choosing the object type is straightforward. However, if your object type uses the content repository (CR), make sure that it is a subclass of the "content_revision" class. You should also make sure that all content is created using that subclass, rather than simply creating content with the "content_revision" type.
  • Object types that don't use the CR can be created with acs_object_type__create_type, but those that use the CR need to use content_type__create_type. content_type__create_type builds on acs_object_type__create_type and additionally provides two views, one for inserting and one for viewing content data. The CR depends on these views.
  • Whenever you call content_item__new, call it with 'content_revision' as the item_subtype and 'your_content_type' as the content_type.

Implement FtsContentProvider

FtsContentProvider consists of two abstract operations, datasource and url. The specification for these operations can be found in packages/search/sql/postgresql/search-sc-create.sql. You have to implement these operations for your object type by writing concrete functions that follow the specification. For example, the implementation of datasource for the object type note looks like this:

ad_proc notes__datasource {
    object_id
} {
    Returns the searchable content of a note as a name/value list.

    @author Neophytos Demetriou
} {
    # Each column alias must match an output field of the
    # FtsContentProvider datasource operation.
    db_0or1row notes_datasource {
        select n.note_id as object_id, 
               n.title as title, 
               n.body as content,
               'text/plain' as mime,
               '' as keywords,
               'text' as storage_type
        from notes n
        where note_id = :object_id
    } -column_array datasource

    return [array get datasource]
}
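The url operation is implemented the same way: it receives the object_id and returns the URL of the page that displays the object. The sketch below uses plain proc instead of ad_proc so that it runs in a stock tclsh. The page name view-one, the note_id query parameter, and the default mount point /notes/ are assumptions for illustration; in an installed package the mount point would come from the site map rather than from an argument.

```tcl
# Hypothetical sketch of the FtsContentProvider url operation for notes.
# Assumptions: a page named view-one displays a note given its note_id,
# and the notes package is mounted at base_url.
proc notes__url {object_id {base_url /notes/}} {
    # Normalize the mount point so it ends with exactly one slash.
    set base_url "[string trimright $base_url /]/"
    return "${base_url}view-one?note_id=$object_id"
}

puts [notes__url 42] ;# prints /notes/view-one?note_id=42
```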
Once you have implemented the FtsContentProvider operations, you need to register your implementation with the system. This is done in an SQL file that associates the implementation with the contract name. For the object type note, the registration looks like this:

select acs_sc_impl__new(
           'FtsContentProvider',                -- impl_contract_name
           'note',                              -- impl_name
           'notes'                              -- impl_owner_name
);
You should adapt this association to reflect your implementation. That is, change impl_name to your object type and impl_owner_name to your package key. Next, create associations between the operations of FtsContentProvider and your concrete functions. An association between an operation and a concrete function looks like this:

select acs_sc_impl_alias__new(
           'FtsContentProvider',                -- impl_contract_name
           'note',                              -- impl_name
           'datasource',                        -- impl_operation_name
           'notes__datasource',                 -- impl_alias
           'TCL'                                -- impl_pl
);
Again, adapt the example: change impl_name from note to your object type, and impl_alias from notes__datasource to the name of the function that implements the datasource operation. Create one such alias for every operation of the contract, that is, for url as well as for datasource.
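Before loading the registration SQL, it can help to check what your datasource function actually returns, since a misspelled column alias is an easy mistake to make. The proc below, check_datasource_result, is a hypothetical helper in plain Tcl; its field list and the allowed storage_type values are taken from the notes example and the storage_type convention in this document, not read from the contract definition.

```tcl
# Hypothetical helper: check that a datasource result (a flat name/value
# list, as returned by "array get") contains every field used in the notes
# example and a storage_type covered by the documented convention.
# Returns a list of problems; an empty list means the result looks fine.
proc check_datasource_result {result} {
    if {[llength $result] % 2 != 0} {
        return [list "result is not a name/value list"]
    }
    set problems {}
    set fields [dict create {*}$result]
    foreach field {object_id title content mime keywords storage_type} {
        if {![dict exists $fields $field]} {
            lappend problems "missing field: $field"
        }
    }
    if {[dict exists $fields storage_type]} {
        set type [dict get $fields storage_type]
        if {$type ni {text file lob}} {
            lappend problems "unknown storage_type: $type"
        }
    }
    return $problems
}

set sample [list object_id 1 title Hello content "Hello world" \
                mime text/plain keywords "" storage_type text]
puts [llength [check_datasource_result $sample]] ;# prints 0
```

Inside OpenACS you could feed it the output of your own datasource proc, for example the result of notes__datasource for an existing note.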

Add triggers

If your object type uses the content repository to store its items, you are done. If not, an extra step is required to notify the search_observer_queue of new content items, updates, and deletions. This is done by adding triggers on the table that stores the content items of your object type. Here is how that part looks for note:

-- Queue newly inserted notes for indexing.
create function notes__itrg ()
returns trigger as $$
begin
    perform search_observer__enqueue(new.note_id,'INSERT');
    return new;
end;
$$ language plpgsql;

-- Queue deleted notes so the indexer can drop them from the index.
create function notes__dtrg ()
returns trigger as $$
begin
    perform search_observer__enqueue(old.note_id,'DELETE');
    return old;
end;
$$ language plpgsql;

-- Queue updated notes for reindexing.
create function notes__utrg ()
returns trigger as $$
begin
    perform search_observer__enqueue(new.note_id,'UPDATE');
    return new;
end;
$$ language plpgsql;


create trigger notes__itrg after insert on notes
for each row execute procedure notes__itrg (); 

create trigger notes__dtrg after delete on notes
for each row execute procedure notes__dtrg (); 

create trigger notes__utrg after update on notes
for each row execute procedure notes__utrg (); 

Questions & Answers

  1. Q: If the content is a binary file (for example, a PDF stored in File Storage), will it still be indexable and searchable?

    A: Each MIME type requires a handler that converts it to text. Once a handler is available (e.g. pdf2txt), it is easy to add support for that MIME type to the search package. The indexer ignores content items with unsupported MIME types.

  2. Q: Can the search package handle lobs and files?

    A: Yes, the search package will convert everything into text based on the content and storage_type attributes. Here is the convention to use while writing the implementation of datasource:

    • Content is a filename when storage_type='file'.
    • Content is a lob id when storage_type='lob'.
    • Content is text when storage_type='text'.
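The two answers above can be combined into a small sketch of what an indexer does with a datasource result: fetch the raw content according to storage_type, then convert it with a handler for its MIME type, skipping types that have no handler. This is a hypothetical illustration in plain Tcl, not the search package's implementation. The namespace fts_sketch, its procs, the crude HTML handler, and the lob_reader callback (reading a lob needs database access, so the caller supplies it) are all assumptions.

```tcl
# Hypothetical sketch: turn a datasource result into plain text using the
# storage_type convention and per-MIME-type handlers. Not the search
# package's actual code.
namespace eval fts_sketch {
    # MIME type -> command that converts raw content to plain text.
    variable handlers [dict create \
        text/plain ::fts_sketch::plain_to_text \
        text/html  ::fts_sketch::html_to_text]

    proc plain_to_text {raw} { return $raw }

    # Crude tag stripper, for illustration only.
    proc html_to_text {raw} {
        regsub -all {<[^>]*>} $raw " " text
        return [string trim [regsub -all {\s+} $text " "]]
    }

    # Fetch the raw content according to storage_type.
    proc raw_content {ds lob_reader} {
        set content [dict get $ds content]
        switch -- [dict get $ds storage_type] {
            text { return $content }
            file {
                # content is a filename
                set fh [open $content r]
                set data [read $fh]
                close $fh
                return $data
            }
            lob {
                # content is a lob id; the caller supplies the reader
                if {$lob_reader eq ""} {
                    error "no lob reader supplied for lob $content"
                }
                return [{*}$lob_reader $content]
            }
            default {
                error "unknown storage_type: [dict get $ds storage_type]"
            }
        }
    }

    # Returns {1 text} for indexable items and {0 {}} for items whose MIME
    # type has no handler. The MIME check comes first, so unsupported
    # content is never fetched.
    proc to_text {result {lob_reader ""}} {
        variable handlers
        set ds [dict create {*}$result]
        set mime [dict get $ds mime]
        if {![dict exists $handlers $mime]} {
            return [list 0 ""]
        }
        set raw [raw_content $ds $lob_reader]
        return [list 1 [[dict get $handlers $mime] $raw]]
    }
}

set ds [list object_id 1 title T content "<p>Hello <b>world</b></p>" \
            mime text/html keywords "" storage_type text]
puts [fts_sketch::to_text $ds] ;# prints 1 {Hello world}
```

In this sketch, supporting another type such as PDF would mean adding one more entry to handlers that points at a command wrapping a converter like pdf2txt.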