Forum OpenACS Q&A: Re: Google & Co on dynamic content

Posted by Tom Jackson on

Jerry: great idea about mangling the email addresses (where they show up). Your proposed metadata API for AOLserver would probably be useful for that purpose. Personally I would like the JPEG solution, although that leaves non-visual/non-graphic UAs at a disadvantage.

I have run a large dynamic site for a number of years (saleonall.com). It is based on the ecommerce package, but I have moved the navigation and display of product information to a 'static looking' setup. Also, the parts that should not be visited by robots are in separate directories, so they are easy to exclude. Look at the site's robots.txt.
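To illustrate why separate directories make exclusion easy, here is a minimal Python sketch using the standard library's robots.txt parser. The directory names (/cart/, /checkout/) are hypothetical stand-ins, not the actual rules in the site's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: these directory names are illustrative only,
# not the ones actually used on saleonall.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/cat/01/012345/product.html", "/cart/view"):
        print(path, is_crawlable(path))
```

Because everything a robot should skip lives under a couple of prefixes, two Disallow lines cover it and the product pages stay open.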

Still, Google fails to index the entire content. They do index some of the site, the part closer to the top of the directory structure, probably around 10-20k pages. When they do, they send in an army of 30 or so bots, each one more brain dead than the last. Because of this I actually create a static copy of the 150k or more pages and give those to GoogleBot.

I think I am going to create a shallow (by product_id) hierarchy so Google will be happy. Products will probably be displayed at a URL like http://saleonall.com/cat/01/012345/product.html. This way the entire site is within two hops of an index page. The 00-99 pages could be rebuilt each time the database is turned over, so they are static.
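A rough Python sketch of that layout, for anyone who wants to play with it. It assumes the two-digit bucket is simply the first two digits of the zero-padded, six-digit product_id (which matches the 01/012345 example, but that is a guess at the scheme); the function names are made up for illustration.

```python
from typing import Dict, Iterable, List


def product_path(product_id: int, width: int = 6) -> str:
    """Map a product_id to a shallow path like /cat/01/012345/product.html.

    Assumption: the bucket is the first two digits of the zero-padded id.
    """
    if product_id < 0:
        raise ValueError("product_id must be non-negative")
    padded = f"{product_id:0{width}d}"
    bucket = padded[:2]
    return f"/cat/{bucket}/{padded}/product.html"


def bucket_index(product_ids: Iterable[int]) -> Dict[str, List[str]]:
    """Group product paths under the 100 bucket pages, 00 through 99.

    Each list is what one static bucket page would link to, so every
    product is reachable in two hops: index -> bucket page -> product.
    """
    index: Dict[str, List[str]] = {f"{n:02d}": [] for n in range(100)}
    for pid in sorted(product_ids):
        path = product_path(pid)
        index[path.split("/")[2]].append(path)
    return index


if __name__ == "__main__":
    print(product_path(12345))
    print(bucket_index([12345, 12999, 990001])["01"])
```

Regenerating the 100 bucket pages from bucket_index whenever the database is turned over would keep them static between loads. With 150k products that is about 1,500 links per bucket page on average, so a second level of buckets might be worth considering.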