Forum OpenACS Q&A: Response to OpenACS polling webcrawler and confusing my local system.

It is trying to read a public list of known robots/bulk-downloading
programs and load it into the database.  It uses that list to block
attempts by such robots to download your entire site.  On a mature ACS
Classic or OpenACS installation, the accumulated questions and answers
in bboard forums, news, ecommerce pages, etc. add up to a lot of
content.  When a bulk downloader takes a swing through it, the load on
your system can grow to the point where you're effectively suffering a
denial of service (some of these downloaders chase links in parallel
rather than serially).
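Roughly, the idea is the sketch below (in Python rather than the Tcl
the toolkit actually uses, and with a made-up list URL): fetch a
published list of robot User-Agent strings, cache them, and answer
matching requests with something cheap instead of the full
dynamically generated pages.

    import urllib.request

    # Made-up URL; the real list the toolkit polls is configured elsewhere.
    ROBOT_LIST_URL = "http://example.com/known-robots.txt"

    def load_robot_patterns(url=ROBOT_LIST_URL):
        """Fetch robot User-Agent substrings, one per line, skipping comments."""
        with urllib.request.urlopen(url) as response:
            text = response.read().decode("utf-8", errors="replace")
        return [line.strip() for line in text.splitlines()
                if line.strip() and not line.startswith("#")]

    def is_known_robot(user_agent, patterns):
        """True if the request's User-Agent matches any known robot pattern."""
        ua = user_agent.lower()
        return any(p.lower() in ua for p in patterns)

    # A request filter would check incoming User-Agents against the cached
    # list and send a 403 or a redirect instead of serving the whole site.
    patterns = load_robot_patterns()
    print(is_known_robot("ExampleBot/1.0 (+http://example.com/bot)", patterns))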

And if you pay for bandwidth consumption, which is common at some
ISPs, this can also become expensive for the site owner.

So, the short story is that the kind people at webcrawler.com publish
a list of robots that dive into sites and ignore the robots.txt file,
the file that tells robots which parts of a site they should and
shouldn't traverse.  ACS and OpenACS use that list to protect your
server.
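(For the record, robots.txt is just a plain-text file at the root of
the site; something like the following asks every well-behaved robot
to stay out of the listed areas.  The paths are only examples.)

    User-agent: *
    Disallow: /bboard/
    Disallow: /ecommerce/
    Disallow: /register/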

You can turn this off, but you'll have to dig through the code a bit
to find out where; my active servers and at-home machines have
full-time connections, so I've never had to do so myself.

There's also a spam (e-mailing) daemon that wants to run
periodically, but if you don't queue alerts or user messages it
shouldn't actually do anything other than peek at the database tables
looking for messages.
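
If you're curious what that sweep amounts to, it's roughly the shape
below (a Python sketch; the table, column, and helper names are made
up, not the actual ACS/OpenACS schema):

    import sqlite3  # stand-in for the real Oracle/PostgreSQL connection

    def send_email(to_address, body):
        # Stand-in for whatever actually delivers mail (e.g. ns_sendmail under AOLserver).
        print("would send to", to_address)

    def sweep_outgoing_queue(conn):
        """One periodic sweep: find queued messages, send them, mark them sent."""
        rows = conn.execute(
            "SELECT message_id, to_address, body "
            "FROM outgoing_email_queue "        # made-up table name
            "WHERE sent_p = 'f'"
        ).fetchall()
        for message_id, to_address, body in rows:
            send_email(to_address, body)
            conn.execute(
                "UPDATE outgoing_email_queue SET sent_p = 't' "
                "WHERE message_id = ?", (message_id,))
        conn.commit()
        return len(rows)   # usually 0: nothing queued, nothing to do

On most sweeps the query comes back empty and the daemon goes right
back to sleep, which is why it's harmless if you never queue anything.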