Forum OpenACS Q&A: Search Engines and bboard postings

Posted by Bob OConnor on

When I search Google for, say, "openacs", I get results, but none from bboard posts. I assume this is because Google and other search engines do not index dynamic content, or specifically URLs with query variables.

Yes, I know one can use the search here on the site, but having the posts in the search engines would give them more exposure to the wider world. I am amazed how often, when I use Google to search for the answer to some problem, a forum posting with the answer is returned.

OpenACS posts have URLs like

https://openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=0000LN&topic_id=11&topic=OpenACS

So, what about a module (or proc) that builds a directory tree of threads? It would look like:

openacs.org/bboard/msg/0000LN/

And in the "0000LN" directory would be an index.tcl file that would do a redirect to the long url above.

The module would create this new directory and index.tcl file every time a new thread is started.
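
A minimal sketch of the mapping this proposal implies, written in Python for illustration rather than in the site's own Tcl. Only the two URL shapes come from the post above; the function names, and the assumption that topic_id and topic can be looked up from the msg_id, are hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the example in the post.
BASE = "https://openacs.org/bboard"

def short_path(msg_id):
    """Return the crawler-friendly path for a thread, e.g. /bboard/msg/0000LN/."""
    return f"/bboard/msg/{msg_id}/"

def long_url(msg_id, topic_id, topic):
    """Return the existing query-string URL that the short path would redirect to.

    Assumption: the caller already knows topic_id and topic for this msg_id.
    """
    query = urlencode({"msg_id": msg_id, "topic_id": topic_id, "topic": topic})
    return f"{BASE}/q-and-a-fetch-msg.tcl?{query}"

def msg_id_from_path(path):
    """Extract the msg_id from a short path, or return None if it does not match."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 3 and parts[0] == "bboard" and parts[1] == "msg":
        return parts[2]
    return None
```

If the server can hand every request under /bboard/msg/ to one handler, parsing the path this way might replace creating a directory and index.tcl file per thread; whether that is practical depends on the server setup.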

Feedback please!

-Bob

Posted by Dave Bauer on
Bob, it's already in there. See doc/robot-detection.html in your OpenACS docs:
Web Robot Detection
part of the ArsDigita Community System by Michael Yoon 
-------------------------------------------------------

User-accessible directory: none 
Site administrator directory: /admin/robot-detection/ 
Data model: /doc/sql/robot-detection.sql 
Tcl procedures: /tcl/ad-robot-defs.tcl 
The Big Picture

Many of the pages on an ACS-based website are hidden from robots 
(a.k.a. search engines) by virtue of the fact that login is required 
to access them. A generic way to expose login-required content to 
robots is to redirect all requests from robots to a special URL that 
is designed to give the robot what at least appear to be linked .html 
files. 
You might want to use this software for situations where public (not 
password-protected) pages aren't getting indexed by a specific robot. 
Many robots won't visit pages that look like CGI scripts, e.g., with 
question marks and form vars (this is discussed in Chapter 7 of 
Philip and Alex's Guide to Web Publishing). 
Also, I believe that Google does index URLs with variables in them anyway.
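
The redirect described in the quoted documentation could be sketched roughly as follows, in Python for illustration only. This is not the code in /tcl/ad-robot-defs.tcl; the function names, the substring match on the User-Agent header, and the /robot-index/ URL are all assumptions.

```python
def is_robot(user_agent, known_robots):
    """Return True if the User-Agent contains any known robot name (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(name and name.lower() in ua for name in known_robots)

def choose_url(requested_url, user_agent, known_robots, robot_url="/robot-index/"):
    """Send robots to the special robot URL; everyone else gets the page they asked for.

    robot_url is a hypothetical placeholder for the special URL that serves
    what appear to be linked .html files.
    """
    if is_robot(user_agent, known_robots):
        return robot_url
    return requested_url
```

For example, with a hypothetical list such as ["Googlebot", "Slurp"], a request whose User-Agent contains "Googlebot" would be routed to the robot URL, while an ordinary browser request would pass through unchanged.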
Posted by Ola Hansson on
Philip Greenspun wrote about this in a chapter of his book: http://www.arsdigita.com/books/panda/publicizing.

In that chapter he publishes some procedures to overcome the problem you describe, and I have changed them to suit PostgreSQL. I suppose the db will be hit quite hard when a robot finds your site, though. I took the liberty of placing the file here...

Posted by Bob OConnor on

Thank you! I see that Google does index some URL vars but in my brief scan using "openacs" as the search, I found NO bboard entries.

Ola, good code! I'll probably rewrite it to use
ns_return 200 text/html $whole_page
instead of
ns_write.
I'll post my results...

-Bob