I have been meaning to open a discussion about this for a while, but it has taken me some time to get to it.
We are working on adding automatic document management tools to OpenACS applications. These tools are based on machine learning algorithms that classify documents automatically.
We plan to add this as a service in OpenACS so that any package can use it.
What is it for?
If you have a lot of "documents" you often need to classify them. You can (and probably should) ask users to do it, but this is sometimes not possible or is inefficient, especially if users do not understand the ontologies used. For example, how many of us effectively use the classification in the bboard postings? How about giving the best approximation when the user doesn't enter one? Automatic classification can often achieve over 80% precision.
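To make the idea concrete, here is a minimal sketch of "use the user's category if there is one, otherwise suggest the classifier's best guess". It is only an illustration: it uses a small multinomial Naive Bayes classifier in Python, which is an assumption on my part and not necessarily the algorithm or language used in our system, and the names `NaiveBayesClassifier` and `category_for` as well as the sample postings are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial Naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            # Log prior: share of training documents in this category.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                count = self.word_counts[category][word] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


def category_for(posting, user_category, classifier):
    """Prefer the user's own choice; fall back to the classifier's best guess."""
    return user_category or classifier.classify(posting)


# Hypothetical training data: postings that users already classified.
clf = NaiveBayesClassifier()
clf.train("how do I install postgresql on linux", "installation")
clf.train("install fails with a database error", "installation")
clf.train("proposal for a new permissions api design", "development")
clf.train("design of the new package api", "development")

# The user left the category empty, so the classifier suggests one.
suggested = category_for("error when I install the database", None, clf)
```

With this toy data, `suggested` comes out as "installation". A real service would train on many more documents and would probably return a ranked list of categories rather than a single guess.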
A first step was taken by David Bell, who added our classification system to PostgreSQL. It will work in a similar way to OpenFTS (which does information retrieval rather than classification).
If you are interested, you can have a look at David's project on last year's projects page or in a first draft paper, or contact me.