Forum OpenACS Q&A: Response to Search Engine cloaking / IP delivery

Posted by Malcolm Silberman on
Of course, ACS has robot detection (http://serverspace.com/doc/robot-detection). However, I fear that relying on the easily faked USER_AGENT variable could be a problem.
Allan Regenbaum kindly sent me a fix for robot-detection:


Repair of the robots facility on ACS 3.x

A couple of changes are required to make the robots facility work.

First, some user agents are too long for the robot_useragent column, so widen it:

SQL> alter table robots modify (robot_useragent varchar(200));

Second, in /tcl/ad-robot-defs.tcl, the call that fetches the file in
ad_replicate_web_robots_db needs to change from

    set result [ns_geturl $web_robots_db_url headers]

to

    set result [ns_httpget $web_robots_db_url]
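ad_replicate_web_robots_db then has to turn the fetched text into rows of the robots table. The Tcl for that isn't shown here, so the following is only a rough, hypothetical Python illustration, not the ACS code. It assumes the Web Robots DB file consists of records of "field: value" lines (robot-id, robot-useragent, ...) separated by blank lines, with indented lines continuing the previous field. It also flags user agents that would still overflow the widened varchar(200) column.

```python
# Hypothetical sketch (not the ACS implementation): parse the Web Robots DB
# text file into one dict per robot.
# ASSUMPTION: records are blocks of "field: value" lines separated by blank
# lines, and indented lines continue the previous field's value.

MAX_USERAGENT_LEN = 200  # matches the widened robots.robot_useragent column


def parse_robots_db(text):
    """Return a list of dicts, one per robot record."""
    records = []
    current = {}
    last_key = None
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        if raw[0].isspace() and last_key is not None:
            # Continuation line: append it to the previous field's value.
            current[last_key] += " " + raw.strip()
            continue
        # Split at the first colon only, so URLs in values stay intact.
        key, sep, value = raw.partition(":")
        if sep:
            last_key = key.strip().lower()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


def too_long_useragents(records, limit=MAX_USERAGENT_LEN):
    """Return robot-ids whose user agent would not fit in varchar(limit)."""
    return [r.get("robot-id", "?") for r in records
            if len(r.get("robot-useragent", "")) > limit]
```

Running too_long_useragents over a freshly fetched file is a quick way to check whether 200 characters is still enough before loading the table.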


The URL for the list of robots has also changed, per the response to Malcolm's post.

In your service.ini:


[ns/server/yourservername/acs/robot-detection]
; the URL of the Web Robots DB text file (this is the new URL)
WebRobotsDB=http://www.robotstxt.org/wc/active/all.txt
; which URLs should ad_robot_filter check (uncomment to turn system on)
; this example causes a robot check on any visit to /ecommerce
FilterPattern=/ecommerce/*
; FilterPattern=/members-only-stuff/*
; the URL where robots should be sent
; (create this directory with pages suited to robots)
RedirectURL=/robot-heaven/
; How frequently (in days) the robots table
; should be refreshed from the Web Robots DB
RefreshIntervalDays=30
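To make the interplay of these parameters concrete, here is a minimal, hypothetical Python sketch of the decision they configure. It is not ad_robot_filter; the constants simply mirror FilterPattern, RedirectURL and RefreshIntervalDays, and the case-insensitive substring match against known user agents is an assumption about how the lookup works.

```python
# Hypothetical sketch mirroring the robot-detection parameters above.
# NOT the ACS ad_robot_filter; the substring match is an assumption.
from datetime import datetime, timedelta
from fnmatch import fnmatchcase

FILTER_PATTERNS = ["/ecommerce/*"]   # FilterPattern entries
REDIRECT_URL = "/robot-heaven/"      # RedirectURL
REFRESH_INTERVAL_DAYS = 30           # RefreshIntervalDays


def is_robot(user_agent, known_useragents):
    """Naive check: does the User-Agent contain a known robot string?

    The header is chosen by the client, so a robot can evade this by
    sending a browser-like string, and anyone can trigger it on purpose.
    """
    ua = (user_agent or "").lower()
    # Skip empty entries, which would otherwise match every request.
    return any(k.lower() in ua for k in known_useragents if k)


def redirect_target(path, user_agent, known_useragents):
    """Return REDIRECT_URL for a robot requesting a filtered URL, else None."""
    if not any(fnmatchcase(path, pat) for pat in FILTER_PATTERNS):
        return None
    if is_robot(user_agent, known_useragents):
        return REDIRECT_URL
    return None


def needs_refresh(last_refreshed, now=None):
    """True once the robots table is REFRESH_INTERVAL_DAYS old or older."""
    now = now or datetime.now()
    return now - last_refreshed >= timedelta(days=REFRESH_INTERVAL_DAYS)
```

Because the check looks only at the User-Agent header, a browser that sends "Googlebot" would be sent to /robot-heaven/, while a robot sending an ordinary browser string would not. That is exactly the weakness raised at the top of this post.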