Forum OpenACS Q&A: url abstraction

1: url abstraction

Posted by bill kellerman on 09/21/03 12:51 PM

I know this has been asked a lot, but so far I haven't been able to find a firm answer.

To what extent does OpenACS support url abstraction, in that a url such as:

https://openacs.org/forums/forum-view?forum_id=14013

would be in some form of:

https://openacs.org/forums/forum-view/forum_id=14013

If it doesn't exist, why doesn't it? Would it in future versions? Why not?

Could a switch be incorporated into the kernel (along with the actual functionality) to turn on/off url abstraction for a site? Or is url abstraction mainly a philosophy and opinion issue?

Yet again, thanks for providing me some direction as I hammer you with questions -- I'm trying to match my requirements to what is available so I know what I'll have to build.

2: Re: url abstraction (response to 1)

Posted by Tom Jackson on 09/21/03 09:25 PM

Probably better would be /forum-view/12345/. Otherwise you need to write a parser to get to the 12345 part. Then you use an index.vuh file, grab the id like so:

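# The id is the last element of the url path, e.g. 12345 in /forum-view/12345/
set forum_id [lindex [ad_conn urlv] end]

# Remove every forum_id already in the form (ifind ignores case)
while {[set index [ns_set ifind [ns_conn form] forum_id]] > -1} {
    ns_set delete [ns_conn form] $index
}

# Add the single value taken from the path, then hand off to the real page
ns_set put [ns_conn form] forum_id $forum_id

rp_internal_redirect forum-view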

I've set this script up for testing. Try sticking ?forum_id=3456 on the end. It will remove this from the form and replace with the value 12345.
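
For anyone who wants to try the path-handling step outside OpenACS, here is a minimal sketch in plain Tcl. It is not the index.vuh code above: path_to_urlv and trailing_id are hypothetical helpers, and path_to_urlv only approximates the list of path elements that the script reads from [ad_conn urlv].

```tcl
# Hypothetical helper: split a url path into its non-empty elements,
# approximating the list the script above gets from [ad_conn urlv].
proc path_to_urlv {path} {
    set urlv {}
    foreach element [split $path "/"] {
        if {$element ne ""} {
            lappend urlv $element
        }
    }
    return $urlv
}

# Hypothetical helper: return the last path element if it consists only
# of digits, otherwise the empty string.
proc trailing_id {path} {
    set candidate [lindex [path_to_urlv $path] end]
    if {[regexp {^[0-9]+$} $candidate]} {
        return $candidate
    }
    return ""
}

puts [trailing_id "/forums/forum-view/12345/"]
```

Checking that the element is numeric is an addition of this sketch; it keeps a request such as /forums/forum-view/ from passing a page name on as the id.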

4: Re: url abstraction (response to 1)

Posted by Malte Sussdorff on 09/22/03 12:00 AM

Together with the category system there is the /o/ handling for URL abstraction (in a way). Basic idea: http://www.yoursite.com/o/12345 will show you the object which has an ID of 12345. If this is a forum posting, the forum package would take care of the display.

Ask Timo or Dirk for details, search in the forums, as I know this has been discussed (somewhere). Or just follow Tom's advice ;).

3: Re: url abstraction (response to 1)

Posted by Tilmann Singer on 09/22/03 12:36 AM

1. The term url abstraction is used in another sense in the openacs docs, as they use it to name the system of hiding the file extension in the url (/bla/message-post instead of /bla/message-post.tcl) This is used in all of openacs and it is quite cool. I'm just mentioning this to prevent (mostly my own) confusion.

2. I think it can't and shouldn't be done in a general way that works for all packages, but rather needs to be implemented in an application specific way for each package, like Tom's example for forums. There's also a running example in bugtracker.

3. Some minor remarks to Tom's example: you can also use [ad_conn path_info] instead of ad_conn urlv. And instead of ns_set put ... there is the handy rp_form_put proc (with which the deletion loop would also be obsolete I think).

5: Re: url abstraction (response to 3)

Posted by Tilmann Singer on 09/22/03 12:53 AM

An important aspect of this is also that when a fancy url like this is enabled, it should be used in all places where the url appears within the package. E.g. once you have made /forum/12345 possible, you should use /forum-view?forum_id=12345 as seldom as possible (within the package and in other places of the toolkit) for consistency. This matters, among other reasons, so that the browser feature of showing already visited links in a different colour keeps working.

Malte, the /o/12345 stuff is useful for building lists of objects of mixed type, where the computation of the url should be delayed, but it should in my opinion not be used for package internal links, because it results in an unnecessary http (external) redirect.

6: Re: url abstraction (response to 3)

Posted by Tom Jackson on 09/22/03 01:11 AM

You need to delete all the forum_id query variables, then add the one you need. rp_form_put just appends a new query variable to the form. It puts it on the end, so the first value will be the original, if any.

For instance, try the following link, which is set up like this:

ad_page_contract {
        @author tom jackson
}

set forum_id [lindex [ad_conn urlv] end]

rp_form_put forum_id $forum_id

rp_internal_redirect forum-view

rp_form_put test

The query variable is just appended, leading to the error message. You can't use replace, because this is the equivalent of delkey followed by put. If there is more than one forum_id variable, the wrong one might be used. Also, replace is case sensitive, whereas the loop I'm using is case insensitive.
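
The difference between appending and setting a unique value can be tried in plain Tcl. This is only a sketch under a loud assumption: the form is modelled as a flat key/value list rather than a real ns_set, and form_append and form_set_unique are hypothetical names, not OpenACS procs.

```tcl
# Assumption: a form is modelled as a flat list: key value key value ...
# (the real form is an ns_set; this is only an illustration).

# Append without checking for existing keys, which is the behaviour
# described above for a plain put.
proc form_append {form key value} {
    lappend form $key $value
    return $form
}

# Drop every entry whose key matches, ignoring case, then add one value.
proc form_set_unique {form key value} {
    set result {}
    foreach {k v} $form {
        if {![string equal -nocase $k $key]} {
            lappend result $k $v
        }
    }
    lappend result $key $value
    return $result
}

set form [list forum_id 3456 Forum_ID 999 page 2]
puts [form_append $form forum_id 12345]
puts [form_set_unique $form forum_id 12345]
```

With the appended form, a reader that takes the first forum_id still sees 3456; after form_set_unique only the value from the path is left.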

7: Re: url abstraction (response to 3)

Posted by Tom Jackson on 09/22/03 01:18 AM

For the url '/test/12345/', [ad_conn path_info] returns '12345/', so that will not work.

8: Re: url abstraction (response to 1)

Posted by Tilmann Singer on 09/22/03 01:29 AM

Am I right in the assumption that this example is highly hypothetical?

Why would the code need to be able to deal with a case like this:

/bla/forum/12345?forum_id=67890

when all requests of the new form will be like this:

/bla/forum/12345

and all references in the (to avoid) old style like this:

/bla/forum-view?forum_id=12345

Thanks for the note on [ad_conn path_info] - that's right. I'd probably still use it rather than [ad_conn urlv] and treat it with [file split] first, because it'll be easier to read when including the message_id, like /bla/forum/12345/54321, but maybe that's just meaningless personal preference.
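
A rough sketch of the [file split] idea in plain Tcl, taking the path_info string as an argument instead of calling ad_conn. The proc name parse_forum_path, the target page names and the dict layout are assumptions made for illustration.

```tcl
# Hypothetical helper: map a path_info string to a target page and ids.
# One element is read as a forum, two elements as forum plus message.
proc parse_forum_path {path_info} {
    # file split drops the trailing slash, so "12345/" gives one element.
    set parts [file split $path_info]
    switch -exact -- [llength $parts] {
        1 {
            return [dict create page forum-view forum_id [lindex $parts 0]]
        }
        2 {
            return [dict create page message-view \
                        forum_id [lindex $parts 0] \
                        message_id [lindex $parts 1]]
        }
        default {
            return [dict create page not-found]
        }
    }
}

puts [parse_forum_path "12345/"]
puts [parse_forum_path "12345/54321"]
```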

9: Re: url abstraction (response to 1)

Posted by Dave Bauer on 09/22/03 03:11 AM

Index.vuh is the easiest solution here.

Also, I would suggest avoiding object_ids where you don't have to use them.

For example, instead of

https://openacs.org/forums/forum_id=14013

use

https://openacs.org/forums/openacs-general/

You'd probably still have to use the message_id as the tail of the url like this:

https://openacs.org/forums/openacs-general/12345

Forums doesn't support this yet, but when I get a chance, I would like to make this change.

Using the content-repository, this is easier, because cr_items.name which is often used to build a URL is unique when combined with parent_id, which is often a cr_folder, so you end up with URLs like

http://example.com/folder-name/item-name

If an object doesn't have a usably pretty name, just use the object_id as the name, guaranteed to be unique, and you end up with a result similar to the forums message example.

10: Re: url abstraction (response to 8)

Posted by Tom Jackson on 09/22/03 09:39 AM

The reason I use [ad_conn urlv] is it contains exactly the information I need, already parsed. Why make a mistake in parsing when the work is already done?

The reason I included the highly hypothetical example is because of the mistake you can make by just appending, blindly to the form. In the case covered, the user is left wondering what the error means. Once a page hits the internet the highly hypothetical becomes the all too common.

Also, there may be other variables you need to add to a form before the internal redirect. This is an example of how to clean up your form so it contains what you want it to contain.

11: Re: url abstraction (response to 10)

Posted by Tilmann Singer on 09/22/03 11:12 PM

Totally agree with Dave regarding using pretty names instead of object_ids if possible. If you need to generate a string that can be used as url part see the proc util_text_to_url in case you don't know it yet.
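
To give an idea of what generating a url part from a name involves, here is a minimal plain Tcl sketch. It is not util_text_to_url and makes no claim about how that proc behaves; text_to_slug is a hypothetical name, and this version simply turns everything outside a-z and 0-9 into hyphens (so non-ASCII letters are lost).

```tcl
# Hypothetical helper: turn a pretty name into a string usable as a url part.
proc text_to_slug {text} {
    set slug [string tolower [string trim $text]]
    # Replace each run of characters other than a-z and 0-9 with one hyphen.
    regsub -all {[^a-z0-9]+} $slug "-" slug
    # Remove hyphens left over at either end.
    return [string trim $slug "-"]
}

puts [text_to_slug "OpenACS General!"]
```

A real package would also have to make sure the result is unique within its parent, which this sketch does not attempt.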

About the parameter delete loop: looks like a proc: rp_form_clear would be handy, no? Or should rp_form_put optionally replace existing params instead of appending? Didn't we have that discussion already somewhere?

Using [ad_conn urlv] starts to become tricky when you have possibly more than one element as part of the url information you need, e.g. when you want to both serve https://openacs.org/forums/openacs-general/12345 (the message) and https://openacs.org/forums/openacs-general/ (the forum-view).

12: Re: url abstraction (response to 11)

Posted by Tom Jackson on 09/22/03 11:54 PM

rp_form_put does what it should. I don't see any reason it should change, but it would be nice to provide a proc to handle the case of clearing, or setting a unique value for a form var.

Without knowing more, I'm not sure exactly what method I would use if I had multiple target pages. I think my example speaks for itself. I'm not trying to write an application page for the forum package, just provide an example for the question asked. I will say, however, that the information in a url path is contained in the sequence and values of what is between the '/' characters. [ad_conn urlv] provides exactly this information.

Here is a more complex example taken from /www/skills/standard/index.vuh on iUnicycle.com:



# Path elements of the request, already split on "/"
set urlv [ad_conn urlv]

set path_value [lindex $urlv end]

switch -exact -- $path_value {

    riding - mount - transition - stationary {
        set template_name "type"
    }
    standard {
        set template_name "standard"
    }

    default {
        # Everything after "standard" in the path identifies the skill
        set standard_index [lsearch $urlv "standard"]
        set skill_list [lrange $urlv [expr {$standard_index + 1}] end]
        if {[llength $skill_list] == 1} {
            set template_name number
        } elseif {[llength $skill_list] == 2} {
            set template_name letter
        } else {
            set template_name $path_value
        }
    }
}

rp_internal_redirect $template_name

13: Re: url abstraction (response to 1)

Posted by bill kellerman on 09/24/03 08:36 PM

so there is no formal treatment of url abstraction in the openacs?

i can definitely take what you've told me and build a custom solution, but i'm wondering what everyone thinks of the idea at all? why wouldn't it be in the oacs?

the library i work at manages a huge amount of electronic resources in a database. i'd like to easily be able to use a file cache/static mirror of the site, and http://library/resources/id/30/format/50245/ (or some format along those lines) would seem to be a good way to do that? ...in addition to getting rid of ugly querystrings and not breaking search engines.

it would seem to me, off the top of my head, you could build a package with procs to do the url -> querystring translation (according to your defined url format), then make an on/off switch set on subsite instantiation which inherits its value from the parent subsite. you could use those same procs to generate proper urls for hyperlinks (regular query string or abstracted url according to the subsite's abstraction setting) within a page. in case a page is for some reason bookmarked or hyperlinked from another page with a regular querystring, the abstraction procs would still receive the proper variables:

http://library/resources/id/30/format/50245/
and
http://library/resources/?id=30&format=50245

would generate the same page.
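
a rough plain Tcl sketch of that translation step, to show that both url forms can end up as the same set of variables. path_info_to_vars and query_to_vars are hypothetical procs, the name/value/name/value layout is the format assumed in the example above, and url decoding is left out.

```tcl
# Hypothetical: read the path after the mount point as alternating
# names and values, e.g. "id/30/format/50245/".
proc path_info_to_vars {path_info} {
    set elements {}
    foreach element [split $path_info "/"] {
        if {$element ne ""} {
            lappend elements $element
        }
    }
    if {[llength $elements] % 2 != 0} {
        error "expected name/value pairs, got: $path_info"
    }
    set vars [dict create]
    foreach {name value} $elements {
        dict set vars $name $value
    }
    return $vars
}

# Hypothetical: read a regular query string, e.g. "id=30&format=50245".
# No url decoding is done here.
proc query_to_vars {query} {
    set vars [dict create]
    foreach pair [split $query "&"] {
        if {$pair eq ""} {
            continue
        }
        lassign [split $pair "="] name value
        dict set vars $name $value
    }
    return $vars
}

puts [path_info_to_vars "id/30/format/50245/"]
puts [query_to_vars "id=30&format=50245"]
```

a page that only ever looks at the resulting dict would not need to know which url form the request used.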

...just wondering out loud.

14: Re: url abstraction (response to 13)

Posted by Tom Jackson on 09/24/03 09:42 PM

Yes, you could setup a page which takes query variables, and still use an index.vuh file to grab the information from the url, then use rp_internal_redirect to the regular page. You would not necessarily need to put the variable names in the url you could use http://library/resources/30/50245/ for instance. The point is you can do it any way you want, there are no limitations.

15: Re: url abstraction (response to 1)

Posted by Don Baccus on 09/25/03 12:19 AM

I implemented a URL abstraction facility in the request processor for Greenpeace Planet at their request. We don't use it because in the two to three years since the project was planned Google has come to dominate search engines and it happily chases links with query variables in them (the primary motivation for removing them from URLs).

So, my first question - do people feel abstract URLs are important enough to support given that Google has raised the bar for search engines? (I can't imagine any search engine technology out to beat Google not following URLs with query vars, how would they then compete?)

Secondly ... the way we did it in Planet was to encode everything to the right of "http://greenpeace.org". The request processor decodes the URL transparently so packages are unaware that they were encoded. Packages call a central routine to compute the URL - there's no way to make this transparent to existing code.

If you did want abstract URLs, is there any reason https://openacs.org/forums/forum-view/14013 is superior to https://openacs.org/some-magic-number?

16: Re: url abstraction (response to 15)

Posted by Dirk Gomez on 09/25/03 02:07 AM

https://openacs.org/some-magic-number is vastly superior to https://openacs.org/forums/forum-view/14013, because it will survive site-map renaming adventures.

We already had that discussion, hadn't we?

My two cents: if we go for URL abstraction let's shoot for "some-magic-number" and give the user the option to include the locale: https://openacs.org/de_AT/some-magic-number might just be different from https://openacs.org/de_de/some-magic-number.

17: Re: url abstraction (response to 16)

Posted by Tom Jackson on 09/25/03 10:53 PM

Assuming the transformation between "forums/forum-view/14013/" and "some-magic-number" is reversible, you will still run into a problem whenever you decide to change any of the url, such as the mount point. Any change would still require some kind of mapping to track the change.

As long as the magic number is only calculated in one place, the problem is minor. However, if you want links from one page to another, then "../", which could serve as a link location, has to be normalized and transformed. Any page providing a list of links will need to do multiple transformations, and these may need to be done in an expensive way, running db_foreach or db_multirow with a script block.

But for flat navigation structures, removing a bunch of ugly query vars, or making a tiny url, I like the concept very much. Both methods are possible with OpenACS, and the choice is probably going to remain application dependent. The transform could also only apply to _a part_ of a url, allowing you to condense several query vars with values into one neat string.

18: Re: url abstraction (response to 1)

Posted by bill kellerman on 09/25/03 11:42 PM

okay...

i know i replied to don last night and the reply isn't showing up. this has happened a few other times in various replies/posts i've written.

i know how to submit a forum reply, i'm not an idiot. has this happened to anyone else, or have i gone nuts??

20: Re: url abstraction (response to 18)

Posted by Jade Rubick on 09/25/03 11:52 PM

I post more than anybody here (see https://openacs.org/forums/recent-posters -- I'm neck and neck with Dave), and I haven't noticed postings getting lost. Occasionally there are weird error messages, but even then, things go through usually.

But I'm not saying you're an idiot :-) You might just have gone nuts...

21: Re: url abstraction (response to 1)

Posted by Tom Jackson on 09/25/03 11:57 PM

Shoot, well I don't think it happens too often. Can you repeat the gist of your comments?

Oh, I just wanted to add that if you are trying to represent database rows as a hierarchy, then it is useful to have this reproduced in the url. But for a lot of applications, the magic-number idea is very useful.

Don, how does this work exactly? For instance tinyurl.com uses a database to reverse the map. Kind of expensive, but very useful, since you need some tiny string.

19: Re: url abstraction (response to 15)

Posted by bill kellerman on 09/26/03 12:14 AM

argh... i wrote this once already... here we go again.

search engines are becoming more comfortable with querystrings, so that alone would not be enough of a motivation for abstract urls. however, i see it as expanding on the logic of removing file extensions from filenames.

the issues seem to be:

- defining a standard structure: what is the safest, most portable method? vignette's storyserver has done this for years with the 0,283,1687_1919_304,1.html name format, allowing a cheap and easy way to cache files on the filesystem. maybe that idea could be moved into a directory name. you'd just have to take into account future changes in the definition.

- where to perform the actual translation from url to variables: i'm getting more familiar with the oacs, but i'm still way retarded on the code. i'd hope a preprocessing filter could be defined which translates everything to the right of the location using the definition from #1 -- which is i think what you said don. existing code couldn't take advantage of it because of direct calls to ns_conn headers -- they'd have to be modified (which is a big deal). once it's modified to call some get_querystring_vars proc, it shouldn't care if the proc had to retrieve them from a regular or abstracted url.

- is it worth the complexity, and does anybody care? abstract urls could make transitions between backends easier, make static site mirroring easier (running "dynamic" content without actually needing the database, especially in failover emergencies), make prettier and more easily remembered urls for users.

i read the forum at http://foo.org/forums/message-view?message_id=75763 in regards to magic numbers. unless someone can come up with a good solution, there seems to be too much complexity.

i do like tom's "tinyurl" suggestion. the key point is that a system handling abstract urls should still be able to handle regular urls. resources can be mapped to abstractions, and within pages you could call a proc like make_appropriate_hyperlink. if abstraction is turned on, the proc generates the defined url format otherwise regular.

really, i need to get more comfortable with how the system works so i can actually have more useful input. thanks for indulging me on this.

22: Re: url abstraction (response to 19)

Posted by bill kellerman on 09/26/03 12:25 AM

"...if abstraction is turned on, the proc generates the defined url format otherwise regular."

didn't explain what i'm talking about very clearly.

in order to try and manage internal links between resources and what type of url to use:

the site i work on has internal links that change often. i'd like a way to store internal and external links with a map to the content itself. so, if i call a proc make_appropriate_hyperlink $pageid (or whatever identifying scheme and structure), it would return http://foo.org/mydirectory/mypage?myid=666. if the url changes, i change it in the map, don't need to edit any html pages and don't have to set up redirects (except for user bookmarks).

so, the make_appr_hyperlink can check if abstraction for that subsite is on. if so, it would instead return http://foo.org/mydirectory/mypage/666.

the content mapping would be handled anyway, the abstraction is just an addition, and the pages don't know how or where they got their querystring values from -- just that they got them.
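
as a rough sketch of what i mean, in plain Tcl. everything here is hypothetical: the map is just a dict held in a variable (a real one would presumably live in the database), the page id "mypage" and the key names are made up, and the abstraction switch is passed in as an argument instead of being read from the subsite.

```tcl
# Hypothetical map from a page id to its current location and variable.
set url_map [dict create \
    mypage [dict create \
        location "http://foo.org/mydirectory/mypage" \
        var myid \
        value 666]]

# Hypothetical proc: build the link for a page id, in abstracted or
# regular form depending on the switch.
proc make_appropriate_hyperlink {url_map page_id abstraction_on} {
    set entry [dict get $url_map $page_id]
    set location [dict get $entry location]
    set var [dict get $entry var]
    set value [dict get $entry value]
    if {$abstraction_on} {
        return "$location/$value"
    }
    return "$location?$var=$value"
}

puts [make_appropriate_hyperlink $url_map mypage 0]
puts [make_appropriate_hyperlink $url_map mypage 1]
```

if the page moves, only the location stored in the map changes; the pages that call the proc stay as they are.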

23: Re: url abstraction (response to 1)

Posted by Dirk Gomez on 09/26/03 01:32 AM

Have you guys looked into this thread? https://openacs.org/forums/message-view?message_id=94374

Tom, we have a central table in OpenACS that makes the reverse lookup possible. It's called acs_objects :)

25: Re: url abstraction (response to 1)

Posted by Don Baccus on 09/26/03 01:42 AM

Derek ... in order to minimize the amount of code rewriting necessary at Greenpeace, the decoding is done before the request processor sets up ad_conn. The encoding is just of the "ugly URL" ...

We took the simplistic approach because the intention was just to trick mindless spiders which don't chase dynamic URLs into chasing dynamic pages on the site.

24: Re: url abstraction (response to 23)

Posted by Tom Jackson on 09/26/03 01:47 AM

Dirk, yes I've read it. I wrote about half the text in that thread. The acs_objects table is not a url mapping table, at least last time I checked. I thought this discussion was about actual pages on a website, or virtual pages, whatever. The other thread you mention is about displaying acs_objects out of context of a regular page.

26: Re: url abstraction (response to 1)

Posted by Dirk Gomez on 09/26/03 10:00 AM

Nonono, no out-of-context display.

And why is acs_objects not a url mapping table? Have a look at Carsten's (excellent) page, he's partly using this trick: https://web.archive.org/web/20120902082225/http://www.clasohm.com:80/whats-new

27: Re: url abstraction (response to 1)

Posted by Dave Bauer on 09/26/03 03:01 PM

Dirk, That's cool! How does it work? Did someone actually write the URL service contracts or what?

acs_objects does not anywhere define where an object is in URL space. OpenACS is not Zope, and it does not define how a package maps objects to URLs. If everything was in the CR and we have one folder hierarchy, we could do it, but OpenACS doesn't define any object/URL mapping itself.

Yeah, I know you already know this.

28: Re: url abstraction (response to 1)

Posted by Dirk Gomez on 10/01/03 10:59 AM

I don't really know how it works on that site. Maybe Carsten can tell us? I'm pretty sure that he calculates the url straightforwardly from the object_id.

Re object/URL mapping: I think we don't have it yet in stock OpenACS, but the "/o" trick would provide one - that is good enough for me.

Is someone familiar with the acs-service-contract package? I'd volunteer to code the whole thing if someone would give me some directions on what to do regarding acs-service-contract.

Should I post a tip for this that nicely summarizes the whole "stable objects" discussion?

29: Re: url abstraction (response to 18)

Posted by bill kellerman on 10/02/03 05:58 AM

ok. i *am* an idiot after all. have to admit it because it's so dumb. i should have known the site wasn't losing my responses.

don't get in such a hurry that you forget to confirm the last screen and close your browser... website 1 derek 0.