Forum OpenACS Q&A: How do I build search capabilities into a new application?

I have been searching the bboards and have not been able to figure out the right way to build modular search for both postgres and oracle for a new application. I am trying to add search capabilities to the new forums package. Note that it does not use the content repository as a back end; it stores the content itself.

Any ideas?

For Postgres, I would vote for integration with the openfts search facility.  I haven't personally enabled a package to be search-capable, but it looks pretty easy.

I would look at the notes package for an example of how to make forums available to search.  It is a simple package meant to demonstrate openfts search.

Basically, all that is needed is to create the appropriate FtsContentProvider service contracts and make two tcl procs: forums__datasource and forums__url.
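
To make the shape of those two procs concrete, here is a minimal sketch written in Python rather than Tcl. It is not the notes package code. The in-memory table, the field names other than "content", and the URL path are all assumptions made up for illustration; check the notes package for the real field list.

```python
# Hypothetical in-memory stand-in for the table that stores forum messages.
MESSAGES = {
    42: {"subject": "Search question", "body": "How do I index forums?"},
}

def forums_datasource(object_id):
    """Return the fields an indexer would read for one object.

    Only "content" is known to be required (the indexer reads it);
    the other keys are assumed for this sketch.
    """
    row = MESSAGES[object_id]
    return {
        "object_id": object_id,
        "title": row["subject"],
        "content": row["body"],
        "mime": "text/plain",
    }

def forums_url(object_id):
    """Return a link that a search result can point at (path is made up)."""
    return f"/forums/message-view?message_id={object_id}"
```

The point is only the division of labour: one callback hands the text of an object to the search engine, the other tells the search results page where that object lives.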

I have looked at the search package but it does not provide oracle support. The site-wide-search package does not provide postgres support. I figured these were old packages and I probably should be using something else.

Yon, there's nothing old about the search package -- which is also the package that initiated the service contracts that you use extensively in the dotlrn framework. The search package supports oracle as long as someone takes the time to write an implementation of the FtsEngineDriver contract for oracle intermedia.

AFAICT the search package has not been ported to oracle, and i am talking about the package itself, not about implementing the FtsEngineDriver. porting the package itself should be really easy and i could do that quickly. i would need some help writing an FtsEngineDriver implementation for intermedia. is there a document on how to do this somewhere, or an example?

Missed your point about oracle support. Yes, you're right, porting the package to oracle is very easy (check with davb though, details follow). At some point, I think Dave started working on an intermedia implementation but I don't know how far he got. I don't know if anyone else has tried this.

Neophytos,

Does the Search package provide a way to search only in particular package, such as just searching forums?

I started working on that but I suspended it when we were supposed to have our first release (Feb. 2002). I hope the code lies somewhere on my computer but I'll have to check. Among other things I was going to implement object-type-specific, mime-specific, and package-specific searches. I'll check and see how far I got at that time.

This also reminds me that I'll have to implement versioning and xml-support for contracts (so that we can say that a package provides an implementation for version X of that contract). If we are a bit patient on this one, I may have it ready before the new openacs release.

One more question, if you don't mind.

Is there work being done to get the tsearch version of openfts ported to tcl?

If I wanted to work on an FtsEngineDriver for interMedia, is there any overview information on the driver? Where would I get that info?

ok, so i have started porting the search package to oracle. i think i will be done in about an hour or so. hopefully by then someone will have posted on how to write the FtsDriver implementation for interMedia, or better yet emailed me the code :) (wishful thinking)

Benjamin, there's already a tcl port for the openfts version that makes use of tsearch (only available from cvs at http://openfts.sourceforge.net/). I've just started working on the documentation of the new version and as soon as it's finished we're gonna release. Now, if you're asking whether there's an openacs package that makes use of the new version, not yet, but it won't take long to get that ready as well. I'm just asking for patience since I'm between a lot of things lately and I'm also catching up from several months of inactivity.

Daryl, I don't know if this is what you're looking for (you need to sign in):

* (Oracle Intermedia) http://otn.oracle.com/docs/products/intermedia/content.html

* (Oracle Text) http://download-east.oracle.com/otndoc/oracle9i/901_doc/text.901/a90122/toc.htm

Note that Oracle intermedia uses an inverted index, which is very fast for searching but not as fast for online indexing. In loose terms, the inverted index requires that every searchable item is reindexed every time you need to update one of the indexed documents. OpenFTS is much superior in that respect, which makes it very good for web apps.
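
For readers who have not met the term, an inverted index maps each word to the set of documents that contain it, so a query becomes a few set lookups. The following is a toy sketch in Python with made-up sample documents. It says nothing about how interMedia or OpenFTS store or maintain their indexes internally.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the postings of the first word, then intersect the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Made-up sample documents, keyed by id.
docs = {
    1: "how to search forums",
    2: "forums package notes",
    3: "search package driver",
}
index = build_index(docs)
```

In this toy version, changing one document means finding and fixing every word entry that mentions it, which gives a feel for why keeping such an index current is more work than querying it.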

Neophytos, I'm up-to-speed on the interMedia stuff (a significant part of my paying job involves supporting interMedia for my customers). I guess what I'm asking is where do I find the details of what FtsEngineDriver needs to implement as it uses interMedia?

Great.  Thanks for the response. I had seen many alpha and pre-alpha releases of openfts in perl and had just wondered if the tcl version was still on the radar map.

Daryl, that's so good to hear (now everybody is gonna bug you with intermedia questions). Check out /openacs-4/packages/search/sql/postgresql/search-sc-create.sql

You might want to check out the openfts-driver, which provides an implementation (for openfts) of the contract found in the referenced file.

Basically, you need to provide seven tcl functions, which include:

* functions that search, index, unindex, summarize (show the context of the search terms -- not so important), or update the index for an item

* a function that provides information about the driver

I'm not very familiar with oracle but AFAICS you might need these tcl functions to maintain an auxiliary table, so that you can batch process the queued items, say every hour or so, by using intermedia's api.
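
The auxiliary-table idea can be sketched roughly as follows, in Python rather than Tcl. The driver class, its method names, and the UPDATE and DELETE event names are assumptions for illustration (only 'INSERT' appears in this thread); the real operation names are in the contract file mentioned above, and a real driver would call the engine's API instead of recording calls.

```python
class RecordingDriver:
    """Stand-in for an engine driver; it only records what it was asked to do."""

    def __init__(self):
        self.calls = []

    def index(self, object_id):
        self.calls.append(("index", object_id))

    def update_index(self, object_id):
        self.calls.append(("update_index", object_id))

    def unindex(self, object_id):
        self.calls.append(("unindex", object_id))


def process_queue(queue, driver):
    """Drain queued (object_id, event) pairs in order; return how many ran."""
    handlers = {
        "INSERT": driver.index,
        "UPDATE": driver.update_index,
        "DELETE": driver.unindex,
    }
    processed = 0
    while queue:
        object_id, event = queue.pop(0)  # oldest event first
        handlers[event](object_id)
        processed += 1
    return processed
```

A scheduled job would call something like process_queue once per interval, so that content changes only cost a cheap queue insert and the expensive index work happens in one batch.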

Hope that helps.

"I'm not very familiar with oracle..."

With oracle intermedia that is.

Neophytos, thanks for the pointer. I'll take a look and see how long it would take me to do this. Is anyone else working on this driver piece? I'd hate to duplicate the effort.

daryl,

i am not working on the interMedia driver yet, but i would need it in the next few days at the latest. if you think you can get it working by then that would be great, otherwise i'll have to write it myself but i don't have any interMedia experience. i would sure appreciate your help in either case though.

Posted by Ryan Gallimore on
Hi,

OpenACS newbie here.

I've managed to install the FtsContentProvider service contract for the faq package, following the directions at the root of this post.

I can see the FtsContentProvider Binding to faq in /acs-service-contract/, but a search on my faq content, with old or recently added data, does not return any results.

Am I missing something? Do I need to repopulate the index somehow?

Here are links to my .tcl and .sql files used to setup the search.

http://ccri.dynamixsolutions.com:8000/review/faq-procs.tcl.txt
http://ccri.dynamixsolutions.com:8000/review/faq-sc-create.sql.txt
http://ccri.dynamixsolutions.com:8000/review/faq-sc-drop.sql.txt

Thanks,
Ryan

Ryan,

if you want to make old content searchable, you have to populate the search_observer_queue with an ad hoc query like this (not tested):

insert into search_observer_queue (
select entry_id, current_timestamp, 'INSERT'
from faq_q_and_as);

Thanks for your prompt response, claudio.

I've inserted those records into search_observer_queue but my searches on faq still come up with nothing. I also cannot search on new content added after I installed the FtsContentProvider service contract.

Do I need to restart the server? The database? How does search_observer_queue know which table I am referring to, since I inserted only the entry_id?

Any help would be greatly appreciated.

Many Thanks

I know this is a tad off-topic, but one way of making a public HTML site searchable is to become an AdSense publisher and then use the Google AdSense search feature. Of course, ads are shown with the search results...which sucks...but I enabled this search on my personal site (mainly to help myself find things) once atomz.com started serving ads too.

http://wolfram.org/search/

Eric

Might be useful in another case, but my client for this project wouldn't appreciate the ads.

Ryan,

when dealing with service contracts it's always a good idea to restart aolserver.

It is up to the search package, which uses your service contract and particularly the faq__datasource proc, to scan the search_observer_queue.

If the queue doesn't get processed something went wrong and you should inspect your error log to find out what.

Thank you for your help.

However, I'm still having trouble with search on faq....

My error log recorded the following:

can't read "datasource(content)": no such element in array

But I can't see why the content element would not be populated by the answer column of the faq_q_and_as table.

I checked search_observer_queue, loaded the old content, and the FtsContentProvider for faq service is visible in /acs-service-contracts/

And search_observer_queue appears to be processing everything I insert into it (it is empty afterwards).

Any tips here would be appreciated.

Thanks, Ryan

This looks like the datasource function did not execute and hence did not return the array. This can happen if the function is not found, in which case restarting the server can help. I just enabled search for bug tracker and ran into the same issue.

Thanks, Harish.

I tried restarting aolserver, but still faq content cannot be searched. It's been a week and this is starting to get to me. Anyone ever implemented search for FAQ?

Harish, could you post some of your code?

Thank you for all your help.