Forum OpenACS Development: search indexing robustness...
Search indexing on openacs.org was broken (for several reasons, most of
which Dave Bauer fixed). It works now, but the last problem was that
some forums messages had their forums_message row deleted while the
acs_object was left behind, so indexing failed on them. (I deleted and
reinserted these messages when the forums indexes were corrupted, so
that one is my fault...)
That a bug in any one datasource can kill search indexing is probably A Bad Thing.
I think we should make search_indexer a little more tolerant of datasource bugs: right now, if any datasource proc errors out, the indexer dies and every remaining item in the queue goes unindexed.
As a big-hammer fix, I'd put a catch around the switch, then add something that notifies us about stale entries in the observer queue (and maybe deletes anything older than some threshold).
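A minimal sketch of the stale-queue check, in plain Tcl so it runs under tclsh. It assumes, hypothetically, that entries whose indexing failed are left in the queue, and that each entry can be read as an {object_id event event_date} triple with event_date in epoch seconds; the real queue table's columns may differ. The proc name stale_queue_entries and the one-day threshold are made up for illustration.

```tcl
# Hypothetical helper: split queue entries into stale and fresh by age.
# Each entry is assumed to be {object_id event event_date}, with
# event_date in epoch seconds (an assumption, not the real schema).
proc stale_queue_entries {queue now max_age} {
    set stale {}
    set fresh {}
    foreach entry $queue {
        lassign $entry object_id event event_date
        if {$now - $event_date > $max_age} {
            lappend stale $entry
        } else {
            lappend fresh $entry
        }
    }
    return [list $stale $fresh]
}

# Example: two entries are more than a day old, one is recent.
set now 1000000
set queue [list \
    [list 101 INSERT [expr {$now - 90000}]] \
    [list 102 UPDATE [expr {$now - 100}]] \
    [list 103 INSERT [expr {$now - 200000}]]]

lassign [stale_queue_entries $queue $now 86400] stale fresh
foreach entry $stale {
    lassign $entry object_id event event_date
    # Notify (here: just log to stderr); a real version might send mail
    # or, past a larger threshold, delete the entry from the queue.
    puts stderr "stale search queue entry: $event of object $object_id"
}
```

Keeping the age test in its own proc means the same check could drive both a warning threshold and a larger deletion threshold.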
For reference here is the indexer code:
ad_proc search_indexer {} {
    @author Neophytos Demetriou and a million monkeys...
} {
    set driver [ad_parameter -package_id [apm_package_id_from_key search] FtsEngineDriver]

    db_foreach search_observer_queue_entry {} {
        switch $event {
            INSERT {
                set object_type [acs_object_type $object_id]
                if {[acs_sc_binding_exists_p FtsContentProvider $object_type]} {
                    array set datasource [acs_sc_call FtsContentProvider datasource [list $object_id] $object_type]
                    search_content_get txt $datasource(content) $datasource(mime) $datasource(storage_type)
                    acs_sc_call FtsEngineDriver index [list $datasource(object_id) $txt $datasource(title) $datasource(keywords)] $driver
                }
                # Remember seeing this object so we can avoid reindexing it later
                set seen($object_id) 1
            }
            DELETE {
                acs_sc_call FtsEngineDriver unindex [list $object_id] $driver
            }
            UPDATE {
                # Don't bother reindexing if we've already inserted/updated this object in this run
                if {![info exists seen($object_id)]} {
                    set object_type [acs_object_type $object_id]
                    if {[acs_sc_binding_exists_p FtsContentProvider $object_type]} {
                        array set datasource [acs_sc_call FtsContentProvider datasource [list $object_id] $object_type]
                        search_content_get txt $datasource(content) $datasource(mime) $datasource(storage_type)
                        acs_sc_call FtsEngineDriver update_index [list $datasource(object_id) $txt $datasource(title) $datasource(keywords)] $driver
                    }
                    # Remember seeing this object so we can avoid reindexing it later
                    set seen($object_id) 1
                }
            }
        }
        db_exec_plsql search_observer_dequeue_entry {}
    }
}
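A self-contained sketch of the big-hammer catch around the switch, written in plain Tcl so it runs under tclsh rather than inside AOLserver. The OpenACS calls are replaced by hypothetical stand-ins (index_object, unindex_object, and a queue of {object_id event} pairs), and the seen() de-duplication is left out for brevity. index_object fails for object 102 the way a message with a deleted forums_message row would. A failing entry is logged and not dequeued, so the rest of the run still gets indexed and the bad entry stays behind for a stale-content check.

```tcl
# Hypothetical stand-ins for the datasource + FtsEngineDriver calls.
# Object 102 mimics an acs_object whose forums_message row is gone.
proc index_object {object_id} {
    if {$object_id == 102} {
        error "no forums_message row for object $object_id"
    }
    return ok
}
proc unindex_object {object_id} {
    return ok
}

# Sketch of the indexer loop with a catch around the switch.
# Returns {dequeued_ids entries_left_in_queue}.
proc search_indexer_sketch {queue} {
    set dequeued {}
    set left {}
    foreach entry $queue {
        lassign $entry object_id event
        if {[catch {
            switch -- $event {
                INSERT - UPDATE { index_object $object_id }
                DELETE { unindex_object $object_id }
            }
        } errmsg]} {
            # Log and keep going instead of letting one bad datasource
            # abort the whole run; the entry is not dequeued.
            puts stderr "search_indexer: $event of object $object_id failed: $errmsg"
            lappend left $entry
            continue
        }
        # Only entries processed without error get dequeued.
        lappend dequeued $object_id
    }
    return [list $dequeued $left]
}

set queue {{101 INSERT} {102 INSERT} {103 UPDATE} {104 DELETE}}
lassign [search_indexer_sketch $queue] dequeued left
```

In search_indexer itself, the equivalent change would be to wrap the switch in catch, log failures with ns_log Error, and run db_exec_plsql search_observer_dequeue_entry {} only when the catch returns 0.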