You're right, Michael, it's very easy to get carried away with the
promise of KM and AI and be very disappointed by the results.
I'm not sure mimicking Northern Light is the right place to start,
in part because of its "black box" nature. One of the reasons I
don't like NL is that, with so many outliers in its results, I
can't tell how it works. The auto-generation of a taxonomy is more
interesting, but suffers from some of the same problems.
A way to generalize your first two points is to say that they are
"point of entry" problems, which is to say, "How does a user
know what's in the collection to search for?" Taxonomies (like
Yahoo!) are too specific for this purpose while at the same time
being too ambiguous because of the arbitrary nature of the
terms chosen for the top level of the hierarchy. However, a
collection of phrases and keywords grouped into the smallest
possible number of headings would be interesting, in that it
could describe, in very general terms, what's in the collection.
These could act as starting points for exploration.
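
One way to make "the smallest possible number of headings" concrete is to treat it as a set-cover problem: choose a few keywords that, between them, touch every document in the collection. Finding the true minimum is NP-hard, so this sketch swaps in the standard greedy approximation. It assumes keywords have already been extracted per document (the `pick_headings` function, its `doc_terms` input, and the sample documents are all hypothetical), and it breaks ties in favor of broader terms, since the goal is to describe the collection in very general terms.

```python
from collections import defaultdict

def pick_headings(doc_terms):
    """Greedy set cover: choose a small set of terms that together cover every document.

    doc_terms maps a document id to the set of keywords extracted from it.
    Returns {heading: set of documents containing that heading}. The greedy
    choice approximates, but does not guarantee, the smallest possible set.
    """
    # Invert the index: term -> set of documents that contain it.
    term_docs = defaultdict(set)
    for doc, terms in doc_terms.items():
        for term in terms:
            term_docs[term].add(doc)

    uncovered = set(doc_terms)
    headings = {}
    while uncovered and term_docs:
        # Most newly covered documents wins; ties go to the broader term,
        # then to the alphabetically first one (max keeps the first maximum).
        best = max(sorted(term_docs),
                   key=lambda t: (len(term_docs[t] & uncovered), len(term_docs[t])))
        gained = term_docs[best] & uncovered
        if not gained:
            break  # the remaining documents have no keywords at all
        headings[best] = set(term_docs[best])
        uncovered -= gained
    return headings

# Hypothetical sample collection: document id -> extracted keywords.
docs = {
    "d1": {"linux", "kernel"},
    "d2": {"linux", "apache"},
    "d3": {"linux", "tcl"},
    "d4": {"tcl", "aolserver"},
}
headings = pick_headings(docs)
print(sorted(headings))  # → ['linux', 'tcl']
```

Note that d3 is listed under both headings: the headings are meant as overlapping starting points for exploration, not as a strict partition of the collection.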
Another useful project is meta-organization, where individual
documents or data-objects can appear in all the places where
people are likely to want to find things.
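
Meta-organization amounts to a many-to-many mapping: a document is stored once but filed under as many places as people might look for it. A minimal sketch, assuming place names are plain path-like strings (the `MetaIndex` class and the example entries are illustrative, not part of OpenACS):

```python
from collections import defaultdict

class MetaIndex:
    """Hypothetical many-to-many index: each document is stored once but can be
    filed under any number of places (categories, facets, paths)."""

    def __init__(self):
        self._docs_by_place = defaultdict(set)
        self._places_by_doc = defaultdict(set)

    def file(self, doc_id, *places):
        # Record the link in both directions so either side can be browsed.
        for place in places:
            self._docs_by_place[place].add(doc_id)
            self._places_by_doc[doc_id].add(place)

    def browse(self, place):
        # .get() on a defaultdict does not create empty entries.
        return sorted(self._docs_by_place.get(place, ()))

    def places_for(self, doc_id):
        return sorted(self._places_by_doc.get(doc_id, ()))

idx = MetaIndex()
idx.file("install-guide", "Documentation/Installation", "Platforms/Linux",
         "Databases/PostgreSQL")
idx.file("pg-tuning", "Databases/PostgreSQL", "Performance")
print(idx.browse("Databases/PostgreSQL"))  # → ['install-guide', 'pg-tuning']
print(idx.places_for("install-guide"))
```

Because both directions are indexed, the same structure can answer "what's filed here?" for a browsing user and "where else does this document live?" for a document page.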
None of this is truly interesting unless the system is "aware" of
the way it is being used, so that it can transfer knowledge about
the contents of the collection into the metadata cloud about the
collection and get smarter the more it is used.
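
One simple reading of usage-awareness is click feedback: when a user opens a document from a result list, the terms of the query that found it are added to that document's metadata, and later searches can use those counts to rerank. This sketch assumes whitespace tokenization and raw click counts (`UsageFeedback`, `record_click`, and `rerank` are hypothetical names); a production system would probably also want to damp already-popular documents and filter out noise.

```python
from collections import Counter, defaultdict

class UsageFeedback:
    """Hypothetical sketch: fold query terms into a per-document metadata cloud."""

    def __init__(self):
        # doc_id -> Counter mapping each query term to how often it led to a click.
        self.cloud = defaultdict(Counter)

    def record_click(self, query, doc_id):
        # Every term of the query that led the user to this document gets +1.
        for term in query.lower().split():
            self.cloud[doc_id][term] += 1

    def boost(self, query, doc_id):
        # Sum of learned counts for the query's terms; unknown documents score 0.
        counts = self.cloud.get(doc_id, Counter())
        return sum(counts[t] for t in query.lower().split())

    def rerank(self, query, doc_ids):
        # sorted() is stable, so documents with equal boost keep the engine's order.
        return sorted(doc_ids, key=lambda d: -self.boost(query, d))

fb = UsageFeedback()
fb.record_click("postgres tuning", "pg-tuning")
fb.record_click("postgres vacuum", "pg-tuning")
fb.record_click("postgres install", "install-guide")
print(fb.rerank("postgres", ["install-guide", "pg-tuning", "faq"]))
# → ['pg-tuning', 'install-guide', 'faq']
```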
I've generalized what I'd like to accomplish with an integration
project into four main things:
1) Query optimization through improved contextualization
(determining the knowledge domain of the query) before
submitting it.
2) Improved presentation of query result sets through chunking
results by search context(s) (organizing the presentation of
search results according to the possible context(s) of the query).
3) General knowledge discovery (metadata extraction) in
"document" repositories (generate metadata about what's in a
database and/or collection) in support of the above two needs.
4) Notification using implicit and explicit profiles supported by
multiple means of communication (let people know when
something new of interest is added to the db, by e-mail, pager,
SMS, etc.)
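
A rough sketch of how tasks 1, 2 and 4 could share the output of task 3: each knowledge domain is represented as a bag of keywords, a query or document is assigned to the domains whose keywords it overlaps most, and the same assignment drives result chunking and notification. Simple keyword overlap is swapped in here for whatever OpenCyc would actually provide; the domain names, vocabularies, profile format, and function names are all hypothetical, and `notify` only returns (user, channel) pairs rather than actually sending e-mail or SMS.

```python
from collections import defaultdict

# Hypothetical domain vocabularies; in the scheme above these would come out
# of task 3 (metadata extraction) rather than being written by hand.
DOMAINS = {
    "databases": {"postgres", "oracle", "sql", "index", "vacuum"},
    "web-server": {"aolserver", "apache", "tcl", "request", "proxy"},
    "community": {"forum", "bboard", "user", "registration"},
}

def contexts_for(text, domains=DOMAINS):
    """Task 1: rank the domains a query (or document) most likely belongs to."""
    words = set(text.lower().split())
    scores = {name: len(words & vocab) for name, vocab in domains.items()}
    # Best score first, ties broken by name; drop domains with no overlap.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, score in ranked if score > 0]

def chunk_results(results, domains=DOMAINS):
    """Task 2: group (doc_id, text) results under their best-matching domain."""
    chunks = defaultdict(list)
    for doc_id, text in results:
        ctx = contexts_for(text, domains)
        chunks[ctx[0] if ctx else "other"].append(doc_id)
    return dict(chunks)

def notify(doc_text, profiles, domains=DOMAINS):
    """Task 4: list (user, channel) pairs for profiles that match a new document.

    profiles maps user -> (set of interesting domains, list of channel names).
    """
    doc_domains = set(contexts_for(doc_text, domains))
    matches = []
    for user in sorted(profiles):
        interests, channels = profiles[user]
        if interests & doc_domains:
            matches.extend((user, channel) for channel in channels)
    return matches

print(contexts_for("slow postgres index on aolserver"))
# → ['databases', 'web-server']
```

A query whose top domain is clear could be narrowed before it is submitted, while an ambiguous one (several domains with similar scores) is exactly the case where chunking the results by context helps most.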
These four general tasks can be combined, I think, to do pretty
much anything I can conceive of needing to do with the system.
The more I talk about this, the more interested I am in working on
a general KM framework layer on top of OpenACS facilitated by
OpenCyc. When can we start?