Forum OpenACS Q&A: Google does NOT index BBoard?
Pages under http://openacs.org/bboard/ are not indexed by Google at all.
For example, Google for
You'll find all the
for everyone who posted to a thread with "TUX" in the title, and at
the bottom, Google will even give you a link to a BBoard search of
But Google will NOT give you a link to the actual "request for review: TUX and nsd document" thread on BBoard!
Then try some variations on the above Google search, designed to match the BBoard thread itself exactly:
The page with the Q&A-style thread never shows up at all. Google's better at indexing than that, so Google must simply be ignoring the BBoard content entirely: all it's getting is the thread titles.
Does anyone know exactly why this is, or has anyone given any thought to how to fix it for dev.openacs.org?
Maybe look into what .vuh files will do. Maybe you can turn
or something similar.
server fell over when it got spidered.
I've had Google request over two thousand pages in under three minutes before. Sometimes 30 Googlebots grab stuff at one time. Googlebot is a very unfriendly spider for sites with lots of pages but limited hardware. Also, despite its ability to grab URLs with query vars, it seems to lose interest in a site before grabbing everything. I wonder if it limits depth, maybe using the number of query vars as a substitute for depth.
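To make the depth guess above concrete, here is a minimal sketch of the kind of heuristic being speculated about: count path segments plus query variables as "depth". The function `effective_depth` and the example URLs are hypothetical; nobody here knows Google's actual rules. The same block does the back-of-the-envelope load arithmetic: two thousand pages in three minutes is already about 11 requests per second on average.

```python
from urllib.parse import parse_qsl, urlsplit

def effective_depth(url: str) -> int:
    """Hypothetical crawl-depth score: path segments plus query variables.

    This only illustrates the guess above; it is not Google's real algorithm.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query_vars = parse_qsl(parts.query, keep_blank_values=True)
    return len(segments) + len(query_vars)

def requests_per_second(pages: int, seconds: float) -> float:
    """Average request rate a crawler imposes on the server."""
    return pages / seconds

if __name__ == "__main__":
    # Hypothetical URLs: under this heuristic one query variable
    # "costs" the same as one extra directory level.
    print(effective_depth("http://openacs.org/bboard/thread-view?thread_id=42"))  # 3
    print(effective_depth("http://openacs.org/bboard/thread/42/"))  # 3
    # Two thousand pages in three minutes, the figure reported above.
    print(round(requests_per_second(2000, 180), 1))  # 11.1
```

If Googlebot really did something like this, a thread page reached through several query-variable links would look "deep" even when it is only a couple of clicks from the front page.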
Search engines assume that "plain HTML" URLs are usually authored by hand and are therefore better content to index.
Also, URLs without parameters tend to be shorter and more human-readable. I can understand that
http://dev.openacs.org/forums/forum-view?forum_id=14013 makes sense to those who program, but to the rest of users
http://dev.openacs.org/forums/OpenACS/ or at least
http://dev.openacs.org/forums/1/ would look far more logical.
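One way to offer the readable forms without rewriting the forum pages is to resolve the friendly path segment to the internal `forum_id` and hand off to the existing query-variable URL. This is a minimal sketch assuming hypothetical lookup tables (`FORUM_IDS_BY_NAME`, `FORUM_IDS_BY_NUMBER`); on a real site the names and numbers would come from the database, and 14013 is just the ID from the example URL.

```python
from typing import Optional

# Hypothetical lookup tables: a readable forum name and a short forum
# number, each mapped to the internal forum_id used by forum-view.
FORUM_IDS_BY_NAME = {"OpenACS": 14013}
FORUM_IDS_BY_NUMBER = {1: 14013}

def resolve_forum_path(path: str) -> Optional[int]:
    """Map /forums/<name>/ or /forums/<number>/ to an internal forum_id."""
    segments = [s for s in path.split("/") if s]
    if len(segments) != 2 or segments[0] != "forums":
        return None
    key = segments[1]
    if key.isdigit():
        return FORUM_IDS_BY_NUMBER.get(int(key))
    return FORUM_IDS_BY_NAME.get(key)

def internal_url(path: str) -> Optional[str]:
    """Rewrite a readable forum URL into the existing query-variable form."""
    forum_id = resolve_forum_path(path)
    if forum_id is None:
        return None
    return f"/forums/forum-view?forum_id={forum_id}"

if __name__ == "__main__":
    print(internal_url("/forums/OpenACS/"))  # /forums/forum-view?forum_id=14013
    print(internal_url("/forums/1/"))        # /forums/forum-view?forum_id=14013
```

Keeping the lookup separate from the page means the existing `forum-view` logic stays untouched; only the URL presented to people and robots changes.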
The other advantage is that you get a URL you can actually type to go directly where you want to be, instead of to a page where you have to click through to get there.
Do you have some more info on your idea?
I would be interested in doing something like that.
This particular problem could be solved by allowing admins to move a message to another forum that sits at the same level in the site map. That would be easy to implement, and we'd keep the status quo.
The tricky part is when you want to move a message to a forum instance somewhere else in the site map, because then you have to show the context. Say they're all called "Forum", but one sits below "Project A" and another below "Project B". You'd have to include that context for people to figure out which one they want.
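Showing the context mostly means walking up the site map from each candidate forum and printing the path, so that two instances both named "Forum" become distinguishable. A minimal sketch, assuming a hypothetical `SiteNode` structure with a parent pointer and a hypothetical root called "Main Site"; the real site-map tables would look different.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SiteNode:
    """Hypothetical site-map node: a display name plus an optional parent."""
    name: str
    parent: Optional["SiteNode"] = None

def context_label(node: SiteNode, sep: str = " / ") -> str:
    """Breadcrumb-style label from the site-map root down to this node."""
    names: List[str] = []
    current: Optional[SiteNode] = node
    while current is not None:
        names.append(current.name)
        current = current.parent
    return sep.join(reversed(names))

def move_targets(forums: List[SiteNode]) -> List[str]:
    """Labels an admin would pick from when moving a message elsewhere."""
    return sorted(context_label(forum) for forum in forums)

if __name__ == "__main__":
    root = SiteNode("Main Site")  # hypothetical root name
    forum_a = SiteNode("Forum", SiteNode("Project A", root))
    forum_b = SiteNode("Forum", SiteNode("Project B", root))
    for label in move_targets([forum_b, forum_a]):
        print(label)
```

For sibling forums at the same level the labels only differ in the last segment, which is why that case is the easy one.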
This is not a super-complex problem by any means. It's just not *quite* as trivial as when we're staying within one context.
Maybe that change alone would get our BBoard threads indexed by Google. But maybe not. And even if it does, there is no guarantee that Google - or any other search engine - will continue to handle URLs the same way they do right now.
Ideally, anything that we want to look like static content to a search
engine should be presented with a static-looking URL. E.g., no
& in the URL; query variables are
instead embedded implicitly between
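As a minimal sketch of embedding query variables in the path, assume a hypothetical layout where each variable becomes a `name/value` pair of path segments after a single page name, so `forum-view?forum_id=14013` becomes `/forum-view/forum_id/14013/`. The layout and function names are illustrations only, not a scheme anyone in this thread has settled on.

```python
from typing import List, Tuple
from urllib.parse import quote, unquote, urlencode

Params = List[Tuple[str, str]]

def to_static_path(page: str, params: Params) -> str:
    """Encode page?a=1&b=2 as /page/a/1/b/2/ (hypothetical layout)."""
    segments = [page]
    for name, value in params:
        # safe="" percent-encodes "/" too, so a value can't split a segment
        segments += [quote(name, safe=""), quote(value, safe="")]
    return "/" + "/".join(segments) + "/"

def from_static_path(path: str) -> Tuple[str, Params]:
    """Decode /page/a/1/b/2/ back into ("page", [("a", "1"), ("b", "2")])."""
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError(f"expected a path like /page/name/value/: {path!r}")
    segments = path[1:-1].split("/")
    # One page name followed by name/value pairs gives an odd segment count.
    if not segments[0] or len(segments) % 2 != 1:
        raise ValueError(f"expected a path like /page/name/value/: {path!r}")
    page, rest = segments[0], segments[1:]
    params = [(unquote(rest[i]), unquote(rest[i + 1])) for i in range(0, len(rest), 2)]
    return page, params

def to_query_url(path: str) -> str:
    """Turn the static-looking path back into the original query-string URL."""
    page, params = from_static_path(path)
    return f"/{page}?{urlencode(params)}" if params else f"/{page}"

if __name__ == "__main__":
    path = to_static_path("forum-view", [("forum_id", "14013")])
    print(path)                # /forum-view/forum_id/14013/
    print(to_query_url(path))  # /forum-view?forum_id=14013
```

Because the scheme is generic rather than per-page, the same pair of functions could in principle sit in front of any page, which is the "whole toolkit" question raised further down.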
Now, if you have that new "static content" URL scheme working for, say, BBoard, it makes sense to use only the new "static content" URLs for BBoard. There's no good reason to have human users see one sort of URL and search engine robots see another: if they do, then when a person manually links to something from their homepage, they'll be using the URL format that the search engine doesn't like. Not good.
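One way to make sure only the new URLs circulate is to answer any request for an old query-string URL with a permanent (HTTP 301) redirect to its static-looking equivalent, so bookmarks, hand-made links, and robots all converge on one form. A minimal sketch, assuming the same kind of hypothetical `/path/name/value/` layout; `handle_request` is an illustrative stand-in for the server's real request handling, and a real server would send the target in a `Location` header.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, quote, urlsplit

def static_equivalent(url: str) -> Optional[str]:
    """Static-looking URL for an old-style query URL, or None if there is none."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not params:
        return None  # no query variables: already in the new form
    segments = [s for s in parts.path.split("/") if s]
    for name, value in params:
        segments += [quote(name, safe=""), quote(value, safe="")]
    return "/" + "/".join(segments) + "/"

def handle_request(url: str) -> Tuple[int, str]:
    """Hypothetical handler: 301 for old URLs, serve new-style URLs directly."""
    target = static_equivalent(url)
    if target is not None:
        return 301, target
    return 200, url

if __name__ == "__main__":
    print(handle_request("/forums/forum-view?forum_id=14013"))
    print(handle_request("/forums/forum-view/forum_id/14013/"))
```

A permanent redirect, rather than silently serving both forms, is what tells people and crawlers which URL to keep.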
Whatever this tool for eliminating query variables from the URL turns out to be, shouldn't it be powerful enough to use across the entire toolkit if we so desire, even if we choose to use it only in certain targeted applications? Has this been discussed or designed before?
Some posts above imply that, at least for Google, this may not actually be true, or may only be partially true.
Whatever else besides the URL format affects whether or not Google indexes stuff (like the server possibly falling over under massive Googlebot load) is at least as important as the URL format issue itself, and probably more so.
But if it's likely that the URL will always be at least a piece of the puzzle, it would be very nice to have a tool to present whatever URL format we want, and that's worth discussing.
For the record, I wasn't concerned about the Google thing at all here, just talking about the single-forum vs. multiple-forums issue.
I think ridding the URLs of some of the query vars, like you're talking about, is a good thing. It's been discussed before. It's just one of those things that a patient and thorough hacker with enough time on his or her hands needs to go ahead and get done :)
Maybe it does this as a precaution against "infinite" URLs. For instance, imagine crawling through our very own calendar application. If you followed all the links, it would never end, since you would be crawling through all the years (2003, 2004, ...).
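The calendar case is easy to model: every year page links to the previous and next year, so there is always an unvisited URL and remembering visited pages alone never terminates the crawl. The sketch below assumes a hypothetical `/calendar/<year>/` layout and uses a link-depth cap as the stopping rule; whether Google caps depth, query-variable count, or something else entirely is exactly the open question here.

```python
from collections import deque
from typing import List

def calendar_links(url: str) -> List[str]:
    """Links on a hypothetical /calendar/<year>/ page: previous and next year."""
    year = int(url.strip("/").split("/")[-1])
    return [f"/calendar/{year - 1}/", f"/calendar/{year + 1}/"]

def crawl(start: str, max_depth: int) -> List[str]:
    """Breadth-first crawl that stops following links beyond max_depth.

    Without the depth cap this would never finish: every year page links
    to a year that has not been seen yet.
    """
    seen = {start}
    order: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # fetched, but its links are not followed
        for link in calendar_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

if __name__ == "__main__":
    print(crawl("/calendar/2003/", 2))
```

With a cap of `n` the crawl fetches `2n + 1` year pages and stops, however many years the calendar could in principle generate.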
That's my theory anyways. =)