Forum OpenACS Q&A: Re: Approach to Google-optimizing

Posted by Tom Mizukami on
I remember Eric having had some good experience getting OACS sites a high Google ranking: https://openacs.org/forums/message-view?message_id=107569
Posted by Joel Aufrecht on
Of course we don't want to do any cloaking. The quality of the content and whether people link to it are a bit outside my core mission, which is to identify and fix any technical glitches that cause poor indexing. I've got Eric Wolfram's top item, fixing page titles, at the top of my list.

After reading all the notes and some of the linked items, I'm wondering:

  • Is it worth trying to monitor results? greenpeace.org gets many Google hits every day, but not every page is hit every day, and some authors claim that pages go months without being re-indexed. Maybe we should just make the obvious fixes and leave it alone, or check back in six months.
  • Should I put any effort into better pretty URLs - not just /article/145 but putting a keyword into the pretty URL (see the first sketch after this list)? We do the foundation work for this in some parts of OpenACS, where short_name is a locally unique string suitable for a URL. This is certainly nicer for users - how standard can we make this in OpenACS? Is it worth trying to retrofit it to old apps that only have ids, by creating a short_name field and populating it?
  • Where else should we be setting noindex,nofollow (see the meta-tag sketch after this list)? So far:
    • in edit and add mode of form-builder
    • in packages with duplicates. Are the duplicates a bigger problem than the risk of not getting indexed at all, if we block some pages and the pages intended to be indexed never get hit? Maybe we're better off trusting the search engines' ability to hide duplicates.
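
For the pretty-URL question, here is a minimal sketch, in Tcl, of the kind of helper that could derive a keyword slug from an item's title for storage in a short_name column. The proc name and details are illustrative, not an existing OpenACS API:

    # Hypothetical helper -- not an existing OpenACS proc.
    # Turns a title like "Approach to Google-optimizing" into
    # "approach-to-google-optimizing", which could be stored as short_name
    # and served at a URL like /article/approach-to-google-optimizing.
    proc title_to_short_name { title } {
        set slug [string tolower $title]
        # collapse anything that is not an ASCII letter or digit into one dash
        regsub -all {[^a-z0-9]+} $slug "-" slug
        # drop any leading or trailing dashes
        return [string trim $slug "-"]
    }

Retrofitting old packages would then presumably be a matter of adding the column, populating it from existing titles, and checking for collisions within each package instance.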
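And for the noindex,nofollow question, this is roughly what the rendered head of a form-builder edit or add page would need to end up serving; how the page tells the master template to emit it depends on the site's templates, so treat this as a sketch of the output, not of the mechanism (@page_title@ is just ADP variable syntax; the markup around it is illustrative):

    <!-- Sketch of the <head> an edit/add page should end up serving. -->
    <head>
      <title>Edit: @page_title@</title>
      <meta name="robots" content="noindex,nofollow">
    </head>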
Posted by Jeff Davis on
I think the reason we should be more careful about noindex,nofollow is this statement from Google:
1. Reasons your site may not be included.
Your pages are dynamically generated. We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index.
So since they limit the amount of dynamic content they spider, you want to make what they do spider unique, to increase coverage (and to lower the burden on your own server).

Changing everything to have pretty URLs would remedy the spidering-scope issue (although it would still leave Google pulling down an order of magnitude too many pages for things like bug tracker).
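
One blunt way to keep Google from pulling down every filtered and sorted view of something like the bug tracker would be a robots.txt rule; the path below is an assumption about where the package is mounted, not the actual site layout, and whether this is better than per-page noindex tags is the same trade-off Joel raises above:

    # Illustrative robots.txt entry -- the mount point is hypothetical.
    User-agent: *
    # keep crawlers out of the query-heavy bug-tracker views entirely
    Disallow: /bugtracker/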