Forum OpenACS Q&A: Re: Approach to Google-optimizing

Posted by Chris Davies on
There are a few other things that I think originally hurt Greenpeace:

1) Redirects -- if someone links to http://www.greenpeace.org/ and it redirects to something else, Google's engine used to treat the 302 as a 404 and then spider the resulting content.  Not a huge problem until you realize that you get no PR transferred to the domain from the mass of links pointing at the site.
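To make the risk concrete, here's a minimal sketch (my own, not from the post) of how a crawler might classify a server's status line -- the 302 case is the one that bit Greenpeace:

```python
def check_redirect(status_line):
    # A raw status line looks like: b"HTTP/1.1 302 Found"
    code = int(status_line.split()[1])
    if code == 301:
        return code, "permanent redirect: link equity should follow"
    if code in (302, 303, 307):
        return code, "temporary redirect: crawlers may not transfer PR"
    return code, "no redirect"
```

The practical takeaway: if the canonical home page must redirect, make it a 301.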

2) Keyworded URLs.  Recently it seems that Google is penalizing .php, .phtml, .shtml, and .shtm as 'dynamic'.  I've tested this numerous times with two clients, and every time we check, the conclusion is the same.  ? and & in the URL are also dynamic triggers, and one of my biggest pet peeves.  So yes, if you can, put keywords in the directory path so that the pages have some chance at higher relevance.
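As an illustration of the keywords-in-the-path advice, here's a hypothetical rewrite helper (the slug rules here are my assumption, not anything Google documents):

```python
import re

def keyword_path(base, *keywords):
    # Turn e.g. "Running Shoes" into "running-shoes" and join the
    # slugs into a static-looking path with no ? or & triggers.
    slugs = [re.sub(r"[^a-z0-9]+", "-", k.lower()).strip("-") for k in keywords]
    return base.rstrip("/") + "/" + "/".join(slugs) + "/"
```

So instead of serving /shop?cat=5&item=99, you could serve keyword_path("http://example.com", "Running Shoes", "Nike"), i.e. http://example.com/running-shoes/nike/.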

3) HTTP/1.0 Googlebot requests without the Host header.  I don't know whether Google still does this, but they used to have a bot that would do checks without sending a Host header.  If I recall, Greenpeace's web server pointed surfers to a non-existent host when that happened.
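For the curious, this is what the two kinds of request look like on the wire (a sketch; the host name is just illustrative). A name-based virtual host has no way to tell which site is wanted in the first case:

```python
def build_request(path, host=None):
    # HTTP/1.0 clients commonly sent no Host header at all;
    # HTTP/1.1 requires it.
    if host is None:
        return ("GET %s HTTP/1.0\r\n\r\n" % path).encode()
    return ("GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"
            % (path, host)).encode()
```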

Other notes:

Cloaking.  There are things you can do that will help Google without being cloaking in the strict sense.  Yes, they do have some bot that checks whether the page looks similar and contains similar elements; however, you can unfold menus, present navbars that allow Google to spider more efficiently, etc.

Content location.  I've had a theory for many years that Google puts more weight on the first 5120 bytes of a page.  Thus, when you design a page that leads with CSS, menus, headers, comments, etc., you are pushing the important page content 'lower' in what Google sees.  This in turn affects the page's relevance relative to other sites.
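If you buy the 5120-byte theory, a quick self-check is to measure how deep into the page your real content starts. A sketch (the 5120 figure is the author's guess, not anything documented, and using the first h1 as "the important content" is my simplification):

```python
def content_offset(html, marker="<h1"):
    # Byte offset of the first occurrence of the content marker.
    return html.encode("utf-8").find(marker.encode())

page = ("<html><head><style>/* lots of css */</style></head><body>"
        + "<!-- menus, comments -->" * 20
        + "<h1>The story</h1></body></html>")
```

content_offset(page) < 5120 then tells you whether the headline still falls inside the hypothesized window.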

Keyword relevance.  Google seems to take notice of the particular phrases inside the <a> element.

For instance, if you link to Nike as:

<a href="http://nike.com/">Nike</a>

you bump the keyword relevance for Nike.  However, better keyword relevance might come from:

<a href="http://nike.com/">Running Shoes</a>

A few other things I've learned along the way:
If at all possible, use no inline JavaScript or CSS -- Google will try to index it as content.  Use alt attributes that describe what is in the picture (rather than alt="picture1").
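A quick way to audit a page for placeholder alt text (the notion of "non-descriptive" here is my own heuristic, nothing official):

```python
from html.parser import HTMLParser

class AltChecker(HTMLParser):
    # Flags <img> tags whose alt text is missing or looks like a
    # placeholder such as "picture1" or "image3".
    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        alt = attrs.get("alt", "")
        if not alt or alt.lower().rstrip("0123456789") in ("picture", "image", "img"):
            self.flagged.append(attrs.get("src", "?"))

checker = AltChecker()
checker.feed('<img src="a.jpg" alt="picture1">'
             '<img src="b.jpg" alt="Rainbow Warrior at sea">')
```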

404s are the devil's bane.  If you put content online, leave it there.  Disk is cheap.  :)

Just some random thoughts.

Posted by Dirk Gomez on
A good search engine should try to behave like an experienced web surfer. How do YOU rate a page?

You read the first few paragraphs and then decide on whether it makes sense to continue through the rest, so you rate the first bytes higher.

You look at the URL and decide upon whether it is dodgy or trustworthy.

You look at the bold and big letters. Hence a search engine should rate h2 and h3 higher.

You don't care about meta tags, hence a good search engine will silently ignore them as well.

I wouldn't even be astonished if average response time per transferred byte were a metric. The slower the site, the worse it usually appears.

How much of the site appears to be original content, and to what extent is it just a metasite? Original content is a ton more interesting. E.g. the features section on Greenpeace links to a whole lot of different sites and gives the uninitiated bot the impression that the *major* navigation bar links to other sites. It assumes that this is the major navigation bar because most sites that have links on the left use that column for navigation.
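That original-content-vs-metasite heuristic can be caricatured in a few lines (entirely my construction, just to make the idea concrete):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCounter(HTMLParser):
    # Counts how many of a page's links stay on-site vs. point off-site.
    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.internal = 0
        self.external = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        host = urlparse(dict(attrs).get("href", "")).netloc
        if host and host != self.site_host:
            self.external += 1
        else:
            self.internal += 1

c = LinkCounter("www.greenpeace.org")
c.feed('<a href="/features">features</a><a href="http://other.org/">story</a>')
```

A page whose most prominent link block is mostly external would look like a metasite to such a counter.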

Then: what do you want to be indexed? What are people looking for when they search for Greenpeace -- greenpeace.org or some particular content? What would be ten search terms for which Greenpeace should be ranked prominently? Which story or page seems to deserve a high ranking for any of these terms?

If we then look at a particular application page, we might ponder why it doesn't get the rating it may deserve.

(All this is assumption. Remember that Google said two years ago that they apply more than 100 heuristics per page. :))