Forum OpenACS Q&A: Re: url abstraction

15: Re: url abstraction (response to 1)

Posted by Don Baccus on 09/25/03 12:19 AM

I implemented a URL abstraction facility in the request processor for Greenpeace Planet at their request. We don't use it because, in the two to three years since the project was planned, Google has come to dominate search engines, and it happily chases links with query variables in them (search engines not following such links was the primary motivation for removing query variables from URLs).

So, my first question - do people feel abstract URLs are important enough to support given that Google has raised the bar for search engines? (I can't imagine any search engine technology out to beat Google not following URLs with query vars. How would they then compete?)

Secondly ... the way we did it in Planet was to encode everything to the right of "http://greenpeace.org". The request processor decodes the URL transparently, so packages are unaware that it was encoded. Packages call a central routine to compute the URL - there's no way to make this part transparent to existing code.
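A minimal sketch of the encode/decode round trip described above, written in Python rather than OpenACS Tcl. It is not the Planet implementation: the function names `encode_url` and `decode_url` and the choice of URL-safe base64 as the encoding are assumptions made purely for illustration.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Hypothetical central routine: fold path and query into one opaque segment."""
    parts = urlsplit(url)
    tail = parts.path + ("?" + parts.query if parts.query else "")
    # URL-safe base64 is an assumed encoding; padding is stripped to keep the URL tidy.
    token = base64.urlsafe_b64encode(tail.encode("utf-8")).decode("ascii").rstrip("=")
    return urlunsplit((parts.scheme, parts.netloc, "/" + token, "", ""))


def decode_url(url: str) -> str:
    """Hypothetical request-processor step: recover the original path and query."""
    parts = urlsplit(url)
    token = parts.path.lstrip("/")
    padded = token + "=" * (-len(token) % 4)  # restore the stripped padding
    tail = base64.urlsafe_b64decode(padded).decode("utf-8")
    path, _, query = tail.partition("?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


original = "http://greenpeace.org/forums/message-view?message_id=75763"
encoded = encode_url(original)
restored = decode_url(encoded)
```

Code that builds links has to call `encode_url` explicitly, which is the non-transparent half; only the decoding side can be hidden from packages.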

If you did want abstract URLs, is there any reason https://openacs.org/forums/forum-view/14013 is superior to https://openacs.org/some-magic-number?

16: Re: url abstraction (response to 15)

Posted by Dirk Gomez on 09/25/03 02:07 AM

https://openacs.org/some-magic-number is vastly superior to https://openacs.org/forums/forum-view/14013, because it will survive site-map renaming adventures.

We already had that discussion, didn't we?

My two cents: if we go for URL abstraction, let's shoot for "some-magic-number" and give the user the option to include the locale: https://openacs.org/de_AT/some-magic-number might just be different from https://openacs.org/de_de/some-magic-number.

17: Re: url abstraction (response to 16)

Posted by Tom Jackson on 09/25/03 10:53 PM

Assuming the transformation between "forums/forum-view/14013/" and "some-magic-number" is reversible, you will still run into a problem whenever you decide to change any part of the url, such as the mount point. Any change would still require some kind of mapping to track the change.

As long as the magic number is only calculated in one place, the problem is minor. However, what if you want links from one page to another? Then "../", which could serve as a link location, has to be normalized and transformed. Any page providing a list of links will need to do multiple transformations, and these may need to be done in an expensive way, running db_foreach or db_multirow with a script block.
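To make the mapping and normalization points concrete, here is a small Python sketch. The class `MagicNumberMap` and the helper `abstract_link` are hypothetical names, and an in-memory dict stands in for whatever table a real site would use.

```python
from urllib.parse import urljoin, urlsplit


class MagicNumberMap:
    """Hypothetical two-way map between site paths and magic numbers."""

    def __init__(self):
        self._by_path = {}
        self._by_number = {}
        self._next = 1

    def number_for(self, path):
        # Assign a new number the first time a path is seen.
        if path not in self._by_path:
            self._by_path[path] = self._next
            self._by_number[self._next] = path
            self._next += 1
        return self._by_path[path]

    def path_for(self, number):
        return self._by_number[number]

    def rename(self, old_path, new_path):
        # A mount point change only touches the map; the number stays stable.
        number = self._by_path.pop(old_path)
        self._by_path[new_path] = number
        self._by_number[number] = new_path


def abstract_link(url_map, current_url, href):
    """Normalize a relative link such as "../" before transforming it."""
    absolute = urljoin(current_url, href)
    return "/" + str(url_map.number_for(urlsplit(absolute).path))


url_map = MagicNumberMap()
link = abstract_link(url_map, "https://openacs.org/forums/forum-view/14013/", "../")
```

Every link on a page goes through `abstract_link`, so a page listing many links pays the lookup once per link, which is the cost raised above.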

But for flat navigation structures, removing a bunch of ugly query vars, or making a tiny url, I like the concept very much. Both methods are possible with OpenACS, and the choice is probably going to remain application dependent. The transform could also only apply to _a part_ of a url, allowing you to condense several query vars with values into one neat string.
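One way to condense several query vars into a single string can be sketched as follows. This is an illustration only: `pack_vars` and `unpack_vars` are hypothetical names, and the comma-separated layout is an assumption, not an existing OpenACS format.

```python
from urllib.parse import quote, unquote


def pack_vars(names, values):
    """Join the values of the named vars, in a fixed order, into one segment."""
    # quote(..., safe="") percent-encodes commas and slashes inside values,
    # so the comma separator stays unambiguous.
    return ",".join(quote(str(values[name]), safe="") for name in names)


def unpack_vars(names, segment):
    """Reverse pack_vars, given the same ordered list of names."""
    parts = segment.split(",")
    if len(parts) != len(names):
        raise ValueError("segment does not match the expected variables")
    return {name: unquote(part) for name, part in zip(names, parts)}


names = ["forum_id", "page"]
segment = pack_vars(names, {"forum_id": 14013, "page": 2})
recovered = unpack_vars(names, segment)
```

Because only the packed segment changes, the rest of the url (mount point, page name) can stay exactly as it is.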

19: Re: url abstraction (response to 15)

Posted by bill kellerman on 09/26/03 12:14 AM

argh... i wrote this once already... here we go again.

search engines are becoming more comfortable with querystrings, so that alone would not be enough of a motivation for abstract urls. however, i see it as expanding on the logic of removing file extensions from filenames.

the issues seem to be:

- defining a standard structure: what is the safest, most portable method? vignette's storyserver has done this for years with the 0,283,1687_1919_304,1.html name format, allowing a cheap and easy way to cache files on the filesystem. maybe that idea could be moved into a directory name. you'd just have to take into account future changes in the definition.

- where to perform the actual translation from url to variables: i'm getting more familiar with oacs, but i'm still way behind on the code. i'd hope a preprocessing filter could be defined which translates everything to the right of the location using the structure defined in the first point, which is, i think, what you said, don. existing code couldn't take advantage of it because of direct calls to ns_conn headers; they'd have to be modified (which is a big deal). once it's modified to call some get_querystring_vars proc, it shouldn't care whether the proc had to retrieve them from a regular or an abstracted url.

- is it worth the complexity, and does anybody care? abstract urls could make transitions between backends easier, make static site mirroring easier (running "dynamic" content without actually needing the database, especially in failover emergencies), make prettier and more easily remembered urls for users.
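The get_querystring_vars idea from the second point can be sketched in Python as below. The `ROUTES` table, its contents, and the path layout are assumptions for illustration; nothing here is existing OpenACS code.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical definition: which variables follow each page path, in order.
ROUTES = {
    "/forums/forum-view": ["forum_id"],
    "/forums/message-view": ["message_id"],
}


def get_querystring_vars(url):
    """Return the page's variables from either a regular or an abstracted url."""
    parts = urlsplit(url)
    if parts.query:
        # Regular url: the variables are in the query string.
        return dict(parse_qsl(parts.query))
    path = parts.path.rstrip("/")
    for prefix, names in ROUTES.items():
        if path.startswith(prefix + "/"):
            # Abstracted url: the values are trailing path segments.
            values = path[len(prefix) + 1:].split("/")
            if len(values) == len(names):
                return dict(zip(names, values))
    return {}


regular = get_querystring_vars("http://foo.org/forums/message-view?message_id=75763")
abstracted = get_querystring_vars("http://foo.org/forums/message-view/75763")
```

A page that only ever calls this function gets the same result for both url styles, so it never needs to know which one the visitor used.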

i read the forum at http://foo.org/forums/message-view?message_id=75763 in regards to magic numbers. unless someone can come up with a good solution, there seems to be too much complexity.

i do like tom's "tinyurl" suggestion. the key point is that a system handling abstract urls should still be able to handle regular urls. resources can be mapped to abstractions, and within pages you could call a proc like make_appropriate_hyperlink. if abstraction is turned on, the proc generates the defined url format otherwise regular.

really, i need to get more comfortable with how the system works so i can actually have more useful input. thanks for indulging me on this.

22: Re: url abstraction (response to 19)

Posted by bill kellerman on 09/26/03 12:25 AM

"...if abstraction is turned on, the proc generates the defined url format otherwise regular."

didn't explain what i'm talking about very clearly.

in order to try and manage internal links between resources and what type of url to use:

the site i work on has internal links that change often. i'd like a way to store internal and external links with a map to the content itself. so, if i call a proc make_appropriate_hyperlink $pageid (or whatever identifying scheme and structure), it would return http://foo.org/mydirectory/mypage?myid=666. if the url changes, i change it in the map, don't need to edit any html pages and don't have to set up redirects (except for user bookmarks).

so, make_appropriate_hyperlink can check if abstraction for that subsite is on. if so, it would instead return http://foo.org/mydirectory/mypage/666.

the content mapping would be handled anyway, the abstraction is just an addition, and the pages don't know how or where they got their querystring values from -- just that they got them.
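A rough Python sketch of the make_appropriate_hyperlink behaviour described in this post. The `LINK_MAP` structure and the `abstraction_on` flag (standing in for a per-subsite setting) are assumptions; a real version would presumably read both from the database.

```python
from urllib.parse import quote, urlencode

# Hypothetical content map: page id -> base url and the variables it needs.
LINK_MAP = {
    "page-666": {
        "base": "http://foo.org/mydirectory/mypage",
        "vars": {"myid": "666"},
    },
}


def make_appropriate_hyperlink(page_id, abstraction_on=False):
    """Build the link for page_id in regular or abstracted form."""
    entry = LINK_MAP[page_id]
    if abstraction_on:
        # Abstracted form: values become trailing path segments.
        tail = "/".join(quote(str(v), safe="") for v in entry["vars"].values())
        return entry["base"] + "/" + tail if tail else entry["base"]
    # Regular form: values go in the query string.
    query = urlencode(entry["vars"])
    return entry["base"] + "?" + query if query else entry["base"]


regular_link = make_appropriate_hyperlink("page-666")
abstract_link_url = make_appropriate_hyperlink("page-666", abstraction_on=True)
```

If the page moves, only the "base" entry in the map changes, and every page that calls the function picks up the new url without being edited.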