Malte, I have never used any lexical/text categorizing tools myself.
The ones I've heard of are the code discussed in Paul Graham's
"A Plan for Spam", CRM114, and bogofilter. Graham also lists many
other open source Bayesian filters.
Most of those seem to have been used so far primarily or only as spam
filters, but there have definitely been other applications (extracting
the interesting posts from Usenet, for example).
Someone here also wrote a college thesis on automatic classification
of text in an OpenACS system, but I never read it and no longer
remember who wrote it or where it is.
There was also some old discussion of OpenCyc, which is of course
quite different.