Forum OpenACS Q&A: How do you tell OpenFTS to index existing ETP content?

I've just installed OpenFTS on a site that didn't have it installed previously.  From reading the ETP docs it looks like it will automagically register service contract bindings for any new content types that are created, but not for the ones I already have.  Any tips on how to tell it about existing content, before I go dig into the code?

Also, it looks like I don't want to have the acs-content-repository bindings installed, right?

TIA!

Posted by Dave Bauer on
NOTE  - READ THE WHOLE MESSAGE. This advice changes later (Tracy Adams)

Janine,

To index previously created items a query of this sort will work:

insert into search_observer_queue (select item_id, current_timestamp, 'INSERT' from cr_items where content_type='etp_page_revision')

Restrict the results as necessary.

You shouldn't need the acs-content-repository bindings.

ETP will register service contracts upon server startup when the etp types are registered if they don't already exist. I think you just need to index the previously created items.

Thanks, Dave.

I did that and had 65 items in search_observer_queue.  Now, some time later, the table is empty but searching for common words isn't finding any hits.  Obviously some sort of sweep has run, but didn't succeed in indexing.

I don't see any relevant errors in the error log.  Is there somewhere I can look to see what went wrong?  I suspect this is an FAQ, but there are so many hits for OpenFTS that I can't find an actually helpful thread.

TIA!

Posted by Dave Bauer on
One way is to look in the txt table. See if there are any rows for the items you indexed; txt.tid will be the object_id.
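
A rough way to see what has actually reached the index is to group the rows in txt by object type. This is only a sketch: it assumes that txt.tid holds the acs_objects.object_id of each indexed object, as described above, and it has not been checked against every OpenFTS driver version.

```sql
-- Count indexed rows per object type.
-- Assumption: txt.tid matches acs_objects.object_id.
select o.object_type, count(*) as indexed_rows
  from txt t
  join acs_objects o on o.object_id = t.tid
 group by o.object_type
 order by o.object_type;
```

If no ETP-related type shows up in the result, the queued rows were most likely dropped rather than indexed.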

Other than that, watch the log when the indexer runs. I think it runs every 30 seconds on a standard install. See if any errors occurred while it was indexing the items.

Is the ETP Service Contract installed? You should have a line in your /acs-service-contract page that states:

FtsContentProvider       etp_page_revision  etp_page_revision       edit-this-page       Uninstall

If this is the case, check if the procedure to insert the ETP content into OpenFTS is working. Check if this returns a procedure description.

/api-doc/proc-view?proc=AcsSc%2eftscontentprovider%2edatasource%2eetp%5fpage%5frevision

At least this got it to work for me.

Malte, you may be on to something here.

I have the FtsContentProvider bindings for edit-this-page, but mine are called journal_issue, journal_article, news_item and dotlrn_page.  I didn't configure this site so I'm not exactly sure where those came from;  I'm guessing that they are the currently defined content types.  I *don't* have one for etp_page_revision, nor any obvious way to create one.  I also don't have the procedure you referenced.

Hmm.... etp_page_revision is defined in edit-this-page sql/postgresql/edit-this-page-sc-create.sql, but not in the Oracle version.  This appears to be true in 5.0 as well.  Anyone know if this is intentional, or is this perhaps why search isn't working for me?

The journal_issue, journal_article and such are *not* working for ETP. No clue why, but well.

ETP runs on Oracle? Furthermore, since when does OpenFTS run on Oracle ;). So, just execute the edit-this-page-sc-create.sql and off you should go.

*sigh* I'm getting my systems mixed up - you're right, this is a Postgres box.  We don't usually use PG here at Sloan, but we are for this site.

Something must have gone wrong during the install - I'll see if I can get etp_page_revision created properly.  Thanks for the tip.

Ok, here's the (current) problem.

I'm using the code Dave posted above to add the existing content to the queue to be indexed:

insert into search_observer_queue (select item_id, current_timestamp, 'INSERT' from cr_items where content_type='etp_page_revision');

The proc that does the sweep through this table takes each item and calls [acs_object_type $object_id], where object_id is the item_id inserted above.  Unfortunately, the object type being returned is not etp_page_revision, it's content_item, and there's no binding for that.

So is there something else wrong here, or do I just need to create a proc to index things of type content_item?

Also, I don't have the proc Malte referred to above and I don't see where it would get created.  I do have etp::search::etp_page_revision, and I do have the FtsContentProvider binding for etp_page_revision, which I would have expected to create the datasource proc but it's not there, even after a restart.

Okay, I had this problem as well. I solved it by turning on debugging for Service Contract, unbinding all Fts* Service Contracts, and then installing them again. *restart*.

If you don't have the proc mentioned above, FTS won't be able to index your content, as it does not know how to get the content out of ETP (datasource...).

As for the search_observer_queue. Good question, but I assume you are fine with creating a proc to index the content_item instead. But don't ask me for details ...

OTOH, I would first check whether you can get a new item registered, and debug how that happens. Once newly published items show up in the search, try the insert again; you might have to select revision_id instead of item_id, but I might be utterly mistaken (shooting from the gut here).

Just a followup - I got pulled off onto something more urgent, so this has been left not working for the moment.  I will post an update if/when I make some progress with this.  For the moment, anyone else with the same problem should consider this to be a roadmap to enlightenment but it will not necessarily lead them to the solution they seek. :)

Janine wrote:

The proc that does the sweep through this table takes each item and calls [acs_object_type $object_id], where object_id is the item_id inserted above. Unfortunately, the object type being returned is not etp_page_revision, it's content_item, and there's no binding for that.

I think that the stuff that goes into the search_observer_queue should be content_revisions (or subtypes of content_revision), not content_items. So, I would try the query:

insert into search_observer_queue (select r.revision_id, current_timestamp, 'INSERT' from cr_revisions r, cr_items i where r.item_id = i.item_id and i.content_type = 'etp_page_revision' and r.revision_id = i.live_revision);
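
To check this before queuing anything, it may help to compare the object type of a few items with that of their live revisions. This is a sketch that assumes the standard acs_objects, cr_items and cr_revisions tables; the limit is arbitrary.

```sql
-- Compare the object type of each ETP item with that of its live revision.
-- Assumption: both the item_id and the revision_id have rows in acs_objects.
select i.item_id,
       io.object_type as item_object_type,
       r.revision_id,
       ro.object_type as revision_object_type
  from cr_items i
  join cr_revisions r on r.revision_id = i.live_revision
  join acs_objects io on io.object_id = i.item_id
  join acs_objects ro on ro.object_id = r.revision_id
 where i.content_type = 'etp_page_revision'
 limit 10;
```

If the diagnosis above is right, item_object_type comes back as content_item, while revision_object_type should be the type that has the FtsContentProvider binding. Items without a live revision do not appear in the result.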

Posted by Dave Bauer on
Vinod! That is right. I am terribly sorry for giving a query that looked useful, but was totally incorrect.

I'll add this query to the OpenFTS section of the ETP documentation.

Well, I'm pretty sure it was code that you wrote somewhere that pointed me in the right direction :-)

Hi,

I've installed OpenFTS and am now trying to index previously existing ETP content.

After following the instructions in this thread, including the corrected insert, I still haven't succeeded.

The scheduled search indexer doesn't report errors the first time, but complains about duplicate IDs when I retry.

New content is successfully indexed for Notes, News and File-share, but not for the ETP and survey modules.

Do I need to include any special function for ETP or survey?

Any idea what I can do about the existing content?

Thanks for any help.

It's quite messy. First you need to activate the FtsContentProvider service contracts for the generic etp content, etp_page_revision. Check /acs-service-contract/ if that contract is installed and enabled. If yes then you'll see a line like this:

FtsContentProvider etp_page_revision etp_page_revision edit-this-page Uninstall

If not, you have to manually run packages/edit-this-page/sql/postgresql/edit-this-page-sc-create.sql, which is not run automatically upon installation for some reason. Restart AOLserver. (Always restart when dealing with service contracts, to be on the safe side.)

Unfortunately the search_indexer doesn't complain when there is no service contract installed for an object type; it simply discards the entries. So after you have installed the bindings you'll have to reinsert the content into the queue and wait for the search_indexer (or run it manually, e.g. from the very useful acs-developer-support shell at /ds/shell/).
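
When re-queuing, one option is to skip revisions that already have a row in txt, so that content indexed earlier is not submitted a second time. This is a sketch only: it assumes txt.tid holds the indexed object_id, and whether already-indexed rows are what triggers the duplicate-id complaints is a guess, not something confirmed in this thread.

```sql
-- Queue only live ETP revisions that are not yet present in txt.
-- Assumptions: txt.tid is the indexed object_id, and the column order of
-- search_observer_queue matches the positional inserts used earlier in
-- this thread (object id, timestamp, event).
insert into search_observer_queue
select r.revision_id, current_timestamp, 'INSERT'
  from cr_items i
  join cr_revisions r on r.revision_id = i.live_revision
 where i.content_type = 'etp_page_revision'
   and not exists (select 1 from txt t where t.tid = r.revision_id);
```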

OK,

As you said, the contract wasn't installed, so, I have installed it manually and, after restarting the server, added the content in the queue.

Now, when adding new ETP content or trying to index the existing content, the search_indexer crashes (it seems I have missed some configuration step):

[14/Mar/2004:20:47:54][26216.2051][-sched-] Notice: Running scheduled proc search_indexer...
[14/Mar/2004:20:47:54][26216.2051][-sched-] Error: can't read "content_Type": no such variable
can't read "content_Type": no such variable
    while executing
"ns_log notice "ETP:content_type=$content_Type""
    (procedure "etp::revision_datasource" line 5)
    invoked from within
"etp::revision_datasource $object_id"
    (procedure "AcsSc.FtsContentProvider.datasource.etp_page_revision" line 1)
    invoked from within
"AcsSc.FtsContentProvider.datasource.etp_page_revision 937"
    ("uplevel" body line 1)
    invoked from within
"uplevel $func_and_args"
    (procedure "apply" line 3)
    invoked from within
"apply $proc_name $arguments"
    (procedure "acs_sc_call" line 5)
    invoked from within
"acs_sc_call FtsContentProvider datasource [list $object_id] $object_type"
    ("INSERT" arm line 4)
    invoked from within
"switch $event {
            INSERT {
                set object_type [acs_object_type $object_id]
                if {[acs_sc_binding_exists_p FtsCont..."
    ("uplevel" body line 3)
    invoked from within
"uplevel 1 $code_block "
    ("1" arm line 1)
    invoked from within

Thanks for your fast response, it's great !

Posted by coklat coklat on
What is this for?