Forum OpenACS Q&A: Response to Search Engine cloaking / IP delivery

Posted by Cathy Sarisky on
Google offers its cached copy (the "Cached" link on the last line of each listing) regardless of whether the site is up or has new content. The direct link to the site is certainly more prominent, and a naive user will probably follow it rather than the cached link. (The cached page includes a note that it may be stale, plus a link to the current content on your site.) If you're getting requests for images without their corresponding pages, you might be seeing someone looking at Google's cache of a page, since the cached page normally loads its images from your site. Or of course someone might have linked to one of your images for use on their own website. Isn't bandwidth theft fun?
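If you want to tell those cases apart in your logs, the Referer header on each image request is the obvious thing to look at. Here is a rough Python sketch, not anything OpenACS ships. The host names in OWN_HOSTS are placeholders, the function name is made up, and the assumption that a cache view shows up as a referer on a google.com host with a "cache:" query is only my guess at what Google's cache URLs look like. Browsers and proxies can also strip or fake the Referer, so treat the result as a hint.

```python
from urllib.parse import urlparse, parse_qs

# Placeholder host names for illustration; substitute your own site's hosts.
OWN_HOSTS = {"example.com", "www.example.com"}


def classify_image_request(referer):
    """Guess where an image request came from, based on its Referer header."""
    if not referer:
        # No Referer at all: direct hit, privacy proxy, or a robot.
        return "unknown"

    parsed = urlparse(referer)
    host = (parsed.hostname or "").lower()

    if host in OWN_HOSTS:
        # The image was requested by one of our own pages.
        return "own-page"

    # Assumption: a cache view is served from a google.com host with a
    # query parameter q that starts with "cache:".
    is_google = host == "google.com" or host.endswith(".google.com")
    query = parse_qs(parsed.query).get("q", [""])[0]
    if is_google and query.startswith("cache:"):
        return "possible-google-cache"

    # Anything else is some other site embedding or linking to the image.
    return "external-link"
```

Running each image line of the access log through something like this and counting the results should show whether the orphan image requests are mostly cache views or mostly other sites borrowing your images.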

I see no sign of any attempt by Google to determine if your site is up before offering the cache.  (Really, would that be feasible?  It wouldn't be fast!)

As soon as you let caching robots skip registration, you are allowing any savvy user to do the same (at least for READING your content) by using Google's caching feature. Old content that is no longer available does seem to disappear from Google, and content that I change on my site does seem to turn up in its changed form on Google within a month or less, as expected.