Forum OpenACS Development: Searching dynamic content by date range. Non-sequitur?
AOLserver does the cheap and mostly correct thing: for adp pages (tcl pages too?) it reports the current date and time as the last modified date and time, while for html pages it returns the actual last modified date and time as recorded in the file system.
Our bboard metaphor makes it hard, because, well, what is the date of a thread? Is it the date of the first post, or the date of the last post? It's really a date range.
So what's the answer?
Is the search by date metaphor just meaningless in a world of dynamic content?
Should we formally support an ACS interface so that each page can set its last modified date and time if it needs or wants to?
In an application like bboard, what should the last modified date be set to?
Should content assemblies like bboard have a special search indexing mode (presumably useful to more than just htDig, though I don't know that for sure) that exposes individual elements when each of them has a meaningful last modified date and time that would be of interest to folks searching a site?
What kind of an interface would you like to see?
Perhaps we need each module to expose its content to a search in a certain format, and each module can decide what content to offer and how. So we need a two-sided search API. The search mechanism needs an API so a module can search the database, and there needs to be another API for the module to offer up content to the indexer. Unfortunately I have no idea how to actually build this.
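To make the two-sided idea a bit more concrete, here is a rough sketch in Python rather than anything ACS-specific. Every name in it (SearchItem, SearchRegistry, bboard_provider) is made up for illustration, and the in-memory list only stands in for a real index. One side lets a module offer content to the indexer; the other side lets callers search by term and date range. Note that bboard offers each post as its own item with its own date, which sidesteps the "what is the date of a thread" question.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class SearchItem:
    """One unit of content a module offers to the indexer (hypothetical shape)."""
    module: str
    item_id: int
    title: str
    body: str
    last_modified: datetime


class SearchRegistry:
    """Hypothetical two-sided API: modules register providers, callers search."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[], Iterable[SearchItem]]] = {}
        self._index: List[SearchItem] = []

    # Side 1: a module offers its content to the indexer.
    def register(self, module: str,
                 provider: Callable[[], Iterable[SearchItem]]) -> None:
        self._providers[module] = provider

    def reindex(self) -> None:
        """Rebuild the in-memory index by asking every module for its items."""
        self._index = [item
                       for provider in self._providers.values()
                       for item in provider()]

    # Side 2: anything can search the indexed content, optionally by date range.
    def search(self, term: str,
               since: Optional[datetime] = None,
               until: Optional[datetime] = None) -> List[SearchItem]:
        needle = term.lower()
        hits = []
        for item in self._index:
            if needle not in (item.title + " " + item.body).lower():
                continue
            if since is not None and item.last_modified < since:
                continue
            if until is not None and item.last_modified > until:
                continue
            hits.append(item)
        return hits


def bboard_provider() -> List[SearchItem]:
    """Example provider: each post is its own item with its own date."""
    return [
        SearchItem("bboard", 1, "Searching by date",
                   "What is the date of a thread?", datetime(2001, 3, 1)),
        SearchItem("bboard", 2, "Re: Searching by date",
                   "Index each post on its own date.", datetime(2001, 3, 9)),
    ]
```

With this shape, each module decides what it exposes and the search side never needs to know how bboard stores its threads.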
is a useful way. They define triggers on each table that they want
to have searched. Any time data is inserted, updated or deleted
from one of those tables, these triggers perform the same action
on the search table. So when it comes time to search, you only
have to do it in one (indexed) table. There are support tables
which supply information such as how to construct links for hits
on data from a particular table.
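Here is a minimal sketch of that trigger approach, using Python's built-in sqlite3 module only so that it can be run as-is. The table names, columns, and URL template are all assumptions made up for the example, not the schema of any real system, and the trigger syntax would differ on Oracle or PostgreSQL.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a source table, one shared search table, and mirroring triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE bboard_posts (
        post_id   INTEGER PRIMARY KEY,
        subject   TEXT NOT NULL,
        body      TEXT NOT NULL,
        posted_at TEXT NOT NULL              -- ISO date, e.g. 2001-03-09
    );
    -- The single table that searches run against.
    CREATE TABLE search_index (
        source_table  TEXT NOT NULL,
        source_id     INTEGER NOT NULL,
        content       TEXT NOT NULL,
        last_modified TEXT NOT NULL,
        PRIMARY KEY (source_table, source_id)
    );
    -- Support table: how to build a link for a hit from a given table.
    CREATE TABLE search_link_templates (
        source_table TEXT PRIMARY KEY,
        url_template TEXT NOT NULL
    );
    INSERT INTO search_link_templates
        VALUES ('bboard_posts', '/bboard/post?post_id={id}');

    -- Mirror every insert, update and delete into the search table.
    CREATE TRIGGER bboard_posts_ai AFTER INSERT ON bboard_posts
    BEGIN
        INSERT INTO search_index
        VALUES ('bboard_posts', NEW.post_id,
                NEW.subject || ' ' || NEW.body, NEW.posted_at);
    END;
    CREATE TRIGGER bboard_posts_au AFTER UPDATE ON bboard_posts
    BEGIN
        UPDATE search_index
           SET source_id = NEW.post_id,
               content = NEW.subject || ' ' || NEW.body,
               last_modified = NEW.posted_at
         WHERE source_table = 'bboard_posts' AND source_id = OLD.post_id;
    END;
    CREATE TRIGGER bboard_posts_ad AFTER DELETE ON bboard_posts
    BEGIN
        DELETE FROM search_index
         WHERE source_table = 'bboard_posts' AND source_id = OLD.post_id;
    END;
    """)
    return conn


def search(conn: sqlite3.Connection, term: str, since: str, until: str):
    """Search the one indexed table by term and date range; return (url, date)."""
    rows = conn.execute(
        """SELECT t.url_template, s.source_id, s.last_modified
             FROM search_index s
             JOIN search_link_templates t ON t.source_table = s.source_table
            WHERE s.content LIKE ? AND s.last_modified BETWEEN ? AND ?
            ORDER BY s.last_modified""",
        (f"%{term}%", since, until),
    ).fetchall()
    return [(template.format(id=source_id), modified)
            for template, source_id, modified in rows]
```

Because every row in the search table carries its own date, a date range search works per post instead of per page.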
This makes it fairly trivial to control what gets indexed and what
doesn't. The drawback is duplication of data, of course, but
personally I'm ok with that since the alternative is trying to "teach"
an indexer about the structure of your particular database - Not
Conceptually, I think what's needed is to take a sophisticated
search engine like htDig and translate its search algorithm into
SQL. So instead of generating a regexp or whatever it does now
to turn your query into something to be executed, it would have to
write SQL instead. Then it shouldn't be *too* hard to run that
SQL against the database and return the results. Of course I'm
probably overlooking something horrendously complicated or
this would have been done already! :)
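For what it's worth, a toy version of "write SQL instead" could look like the sketch below. This is not htDig's algorithm, just an assumed mini query syntax (space-separated words are ANDed, a leading "-" excludes a word) turned into a parameterized query against a hypothetical search_index table with a content column.

```python
from typing import List, Tuple


def query_to_sql(query: str) -> Tuple[str, List[str]]:
    """Translate a simple search string into parameterized SQL.

    Assumed syntax: words are ANDed together; "-word" excludes a word.
    LIKE wildcards (% and _) inside a word are not escaped in this sketch.
    """
    clauses: List[str] = []
    params: List[str] = []
    for word in query.split():
        negate = word.startswith("-")
        term = word[1:] if negate else word
        if not term:
            continue  # a bare "-" carries no search term
        clauses.append("content NOT LIKE ?" if negate else "content LIKE ?")
        params.append(f"%{term}%")
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT source_table, source_id FROM search_index WHERE {where}"
    return sql, params
```

The user's words only ever travel as bind parameters, never as part of the SQL text, so the generated statement can be handed straight to the database driver. The hard parts a real engine handles, such as ranking, stemming and phrase matching, are exactly what this leaves out.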
BTW I agree with Dave, searching static pages just isn't good
enough for the kind of sites we are building here. Not to mention
that writing all the dynamic pages to disk represents even more
duplication than the search table method does!