Forum OpenACS Development: search indexing robustness...

Posted by Jeff Davis on
Search indexing on openacs.org was broken (for several reasons, but Dave Bauer fixed most of them). It is fixed now. The last problem was that some forums messages had their forums_message row deleted but not their acs_object, so the indexer failed on them. (I deleted these and reinserted them when the forums indexes were corrupted, so it is my fault...)

That search indexing can die because of a bug in any single datasource is probably A Bad Thing.

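Anyway, I think we should make search_indexer a little more tolerant of bugs in the datasources. Right now, if any datasource proc errors out, indexing dies and all remaining items go unindexed. And since the dequeue call only runs after the switch completes, the entry that caused the error stays in the queue, so later runs can hit the same failure.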

As a big-hammer solution, I think we should put a catch around the switch, and then add something to notify about stale content in the observer queue (and maybe nuke anything older than some threshold).
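A rough sketch of the notify/nuke half, in plain Tcl rather than against the real tables. The `{object_id event event_time}` entry layout, the `partition_stale` proc, and the one-day threshold are all hypothetical. In OpenACS the entries would presumably come from a query on the observer queue, and the stale ones would be reported (or deleted) instead of printed.

```tcl
# Hypothetical sketch: split observer-queue entries into fresh and stale
# by age. Each entry is {object_id event event_time}, with event_time in
# epoch seconds. Entry layout and threshold are assumptions.
proc partition_stale {queue now max_age} {
    set fresh {}
    set stale {}
    foreach entry $queue {
        lassign $entry object_id event event_time
        if {$now - $event_time > $max_age} {
            lappend stale $entry
        } else {
            lappend fresh $entry
        }
    }
    return [list $fresh $stale]
}

set now [clock seconds]
set one_day [expr {24 * 60 * 60}]
set queue [list \
    [list 1 INSERT [expr {$now - 60}]] \
    [list 2 INSERT [expr {$now - 3 * $one_day}]]]

lassign [partition_stale $queue $now $one_day] fresh stale
foreach entry $stale {
    # Here one would notify someone and, optionally, delete the entry.
    puts "stale: $entry"
}
```

Leaving failed entries in the queue and reporting them by age keeps the decision to nuke them separate from the indexing run itself.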

For reference, here is the indexer code:

ad_proc search_indexer {} {
    @author Neophytos Demetriou and a million monkeys...
} {

    set driver [ad_parameter -package_id [apm_package_id_from_key search] FtsEngineDriver]

    db_foreach search_observer_queue_entry {} {

        switch $event {
            INSERT {
                set object_type [acs_object_type $object_id]
                if {[acs_sc_binding_exists_p FtsContentProvider $object_type]} {
                    array set datasource [acs_sc_call FtsContentProvider datasource [list $object_id] $object_type]
                    search_content_get txt $datasource(content) $datasource(mime) $datasource(storage_type)
                    acs_sc_call FtsEngineDriver index [list $datasource(object_id) $txt $datasource(title) $datasource(keywords)] $driver
                }
                # Remember seeing this object so we can avoid reindexing it later
                set seen($object_id) 1
            }
            DELETE {
                acs_sc_call FtsEngineDriver unindex [list $object_id] $driver
            }
            UPDATE {
                # Don't bother reindexing if we've already inserted/updated this object in this run
                if {![info exists seen($object_id)]} {
                    set object_type [acs_object_type $object_id]
                    if {[acs_sc_binding_exists_p FtsContentProvider $object_type]} {
                        array set datasource [acs_sc_call FtsContentProvider datasource [list $object_id] $object_type]
                        search_content_get txt $datasource(content) $datasource(mime) $datasource(storage_type)
                        acs_sc_call FtsEngineDriver update_index [list $datasource(object_id) $txt $datasource(title) $datasource(keywords)] $driver
                    }
                    # Remember seeing this object so we can avoid reindexing it later
                    set seen($object_id) 1
                }
            }
        }

        db_exec_plsql search_observer_dequeue_entry {}

    }

}
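A minimal, self-contained sketch of the catch-around-the-switch idea, in plain Tcl with no OpenACS dependencies. `index_queue` and `fake_datasource` are hypothetical stand-ins for the loop above and the FtsContentProvider datasource call; object 2 simulates a broken datasource like the orphaned forums messages. A failing entry is recorded and skipped without being dequeued, so it stays in the queue where a stale-entry check can find it. In the real proc the error would presumably be logged (for example with ns_log) rather than collected in a list.

```tcl
# Hypothetical stand-in for the datasource call: object 2 has no content
# row, like an acs_object whose forums_message row was deleted.
proc fake_datasource {object_id} {
    if {$object_id == 2} {
        error "no content row for object $object_id"
    }
    return "text for $object_id"
}

# Process each {object_id event} entry; an error in one entry no longer
# aborts the whole run.
proc index_queue {queue} {
    set indexed {}
    set dequeued {}
    set failed {}
    foreach entry $queue {
        lassign $entry object_id event
        if {[catch {
            switch -- $event {
                INSERT -
                UPDATE {
                    lappend indexed [list $object_id [fake_datasource $object_id]]
                }
                DELETE {
                    # the unindex call would go here
                }
            }
        } errmsg]} {
            # Record the failure and skip the dequeue, so the entry
            # stays in the queue.
            lappend failed [list $object_id $errmsg]
            continue
        }
        # Only dequeue entries that were processed successfully.
        lappend dequeued $object_id
    }
    return [list indexed $indexed dequeued $dequeued failed $failed]
}

set result [index_queue {{1 INSERT} {2 INSERT} {3 UPDATE} {4 DELETE}}]
puts $result
```

With this queue, objects 1 and 3 are indexed, entries 1, 3 and 4 are dequeued, and object 2 is reported as failed instead of killing the run.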