Forum OpenACS Development: XoWiki URL Cleanup

Posted by Carl Robert Blesius on
I want links in xowiki to look like this:
http://our.internal.site/wiki/Acute_Coronary_Syndromes
https://openacs.org/xowiki/.LRN_Installation

and not like this:
http://our.internal.site/wiki/pages/en/Acute%20Coronary%20Syndromes
https://openacs.org/xowiki/pages/en/%2eLRN%20Installation

What I propose:

1. For spaces to be represented by "_" (like in Wikipedia's MediaWiki) and not "+" (like in xowiki)
2. For all characters that are designated "special" by RFC 1738 to go through unencoded
3. For language to be determined using content negotiation rather than distinct branches per language (see: http://www.w3.org/QA/2006/02/content_negotiation.html)

Any objections/suggestions before we start on this?

Here is info from RFC 1738 for reference:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

Reserved:

Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.

2: Re: XoWiki URL Cleanup (response to 1)
Posted by Gustaf Neumann on
Carl,

a significant part of the problem is ad_urlencode, which does the encoding. The same applies to all urls generated by openacs, so the correct place to remove the spurious encoding is ad_urlencode, or even better ns_urlencode. btw, since ns_urlencode already encodes "_" as %5f (unneeded according to rfc 3986, see unreserved characters), ad_urlencode contains pre/postprocessing hacks. oacs already uses a mix of ns_urlencode and ad_urlencode (ns_urlencode is used about 3 times more often than ad_urlencode). I mailed about the unneeded encodings in ns_urlencode to the aolserver mailing list in dec 2005, but got no reply. in contrast to aolserver, naviserver appears to have an rfc 3986 compliant url encoder/decoder, using a different interface.

concerning 1: urlencoding uses + for spaces; using "_" instead deviates from the standard. by doing so naively, one loses the ability to distinguish between "a b" and "a_b". Is there a specification of what mediawiki does in detail?

concerning 2: notice that RFC 1738 was replaced by RFC 3986, which says: for "unreserved characters", defined as

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

no percent-encoded octets should be created. URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.
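
For illustration, here is a minimal sketch of an encoder that leaves exactly the RFC 3986 unreserved characters alone. It is plain Tcl and does not call ns_urlencode or ad_urlencode; the proc name rfc3986_encode is hypothetical, and converting the input to UTF-8 before encoding is an assumption, not something specified in this thread.

```tcl
# Hypothetical helper, not part of OpenACS, AOLserver or NaviServer.
# Percent-encodes every byte except the RFC 3986 "unreserved" set:
#   ALPHA / DIGIT / "-" / "." / "_" / "~"
proc rfc3986_encode {string} {
    set out ""
    # Assumption: encode as UTF-8 first, so non-ASCII characters
    # become multi-byte sequences that are encoded byte by byte.
    foreach byte [split [encoding convertto utf-8 $string] ""] {
        if {[regexp {^[A-Za-z0-9._~-]$} $byte]} {
            # Unreserved characters pass through unchanged.
            append out $byte
        } else {
            # Everything else becomes %XX with uppercase hex digits.
            scan $byte %c code
            append out [format %%%02X $code]
        }
    }
    return $out
}
```

With this sketch, ".LRN Installation" comes out as ".LRN%20Installation": the dot is left alone and only the blank is encoded. Reserved characters such as "/" are encoded as well, so it is meant for a single path segment, not for a whole URL.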

concerning 3: what is the connection between content negotiation and language selection in xowiki? what is your exact proposal?

i am open to discussing every change proposal, but strictly against hacks.

3: Re: Re: XoWiki URL Cleanup (response to 2)
Posted by Malte Sussdorff on
How about doing a search and replace where we replace ns_urlencode with ad_urlencode and make ad_urlencode RFC 3986 compliant?

4: Re: XoWiki URL Cleanup (response to 1)
Posted by Dave Bauer on
The oacs-dav WebDAV support package does this:

ad_proc oacs_dav::urlencode { string } {
    urlencode allowing characters according to rfc 1738
    http://www.w3.org/Addressing/rfc1738.txt

    "Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
    reserved characters used for their reserved purposes may be used
    unencoded within a URL."

    ignore + used to encode spaces in query strings

    This is mainly to support MS Web Folders which do not follow the
    spec which states that any character may be urlencoded. Web Folders
    rejects the entire collection as invalid if a filename contains
    one of these characters encoded.

} {
    # Let ns_urlencode do its aggressive encoding first.
    set encoded_string [ns_urlencode $string]
    # Then map "+" to %20 and restore the listed characters that
    # RFC 1738 allows unencoded.
    set encoded_string [string map -nocase \
        {+ %20 %2d - %5f _ %24 $ %2e . %21 ! %28 ( %29 ) %27 ' %2c ,} $encoded_string]

    return $encoded_string
}

It's not pretty, but it works. Perhaps we should add this to acs-tcl as ad_urlencode until AOLserver is fixed.

I am pretty sure I discussed this on the AOLserver list or in email with AOLserver folks back when I did the tDAV work and never got any resolution. ns_urlencode has been overly aggressive forever.

5: Re: XoWiki URL Cleanup (response to 1)
Posted by Dave Bauer on
Regarding "1. For spaces to be represented by "_" (like in Wikipedia's MediaWiki) and not "+" (like in xowiki)":

perhaps we can add a parameter or something to automatically change spaces in the name to underscores. I think that is the best solution if we want to disallow spaces in the url name of the pages.

8: Re: Re: XoWiki URL Cleanup (response to 5)
Posted by Gustaf Neumann on
there should certainly be a warning when a user already has some entry "a_b" and adds "a b". i added this behavior to xowiki in cvs head. This translation feature is turned on per xowiki folder when the following line is added to the folder object:

set subst_blank_in_name 1

after some testing, we could make it the default behavior, if there is interest and not too many negative gotchas for this.
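
To make the "a_b" versus "a b" concern concrete, here is a minimal sketch of the name translation and the collision check that would trigger such a warning. It is plain Tcl written for this illustration; canonical_name and find_collision are hypothetical names and do not show how xowiki implements subst_blank_in_name.

```tcl
# Hypothetical helpers illustrating the idea; not the actual xowiki code.

# Map blanks to underscores, as described for a folder that has
# subst_blank_in_name set to 1.
proc canonical_name {name} {
    return [string map {" " _} $name]
}

# Return the existing page name that would collide with $new_name
# after translation, or an empty string when there is no collision.
proc find_collision {existing_names new_name} {
    set canon [canonical_name $new_name]
    foreach existing $existing_names {
        if {$existing ne $new_name && [canonical_name $existing] eq $canon} {
            return $existing
        }
    }
    return ""
}
```

Calling find_collision with the existing names {a_b other} and the new name "a b" returns "a_b", which is the situation where the user should be warned before the page is saved.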

Posted by Carl Robert Blesius on
Thanks Gustaf. We updated xowiki today and my first test was not successful, but I will try again during daylight hours.

Here is what I added to the folder object:

#######
# this is the payload of the folder object

set subst_blank_in_name 1
set index_page "en:DOM"
#######

I will double check if we got the change and report if it is not a local issue.

Here are some relevant links from the Wikipedia for reference:

http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(technical_restrictions)#Plus

http://en.wikipedia.org/wiki/Help:Page_name#Ignored_spaces.2Funderscores

http://en.wikipedia.org/wiki/Wikipedia:Canonicalization

Posted by Carl Robert Blesius on
After the switch the bracketed links are rendered to point to + separated names, even after renaming the files.

Here is what I did:

1. Made the change
2. Created a new page named "new page"
3. Linked to that page using [[new page]]

Here is what happened:

1. new page was created as "new_page"
2. link points to "new+page"

Posted by Gustaf Neumann on
1 and 2 are as intended: the intention was to refer to the page with [[new_page]], since 2 creates the page with its canonical name "new_page". the canonical name is used for the url as well, so your hated "+" signs are gone.

however, i agree that with the switch turned on, it makes sense to allow both notations [[new page]] and [[new_page]]. Get it from CVS head.

6: Re: XoWiki URL Cleanup (response to 1)
Posted by Andrew Piskorski on
Fixing ad_urlencode and/or ns_urlencode to do the right thing should be straightforward, and not that hard.

Carl's item 3 above, "language to be determined using content negotiation", is the only part whose design and implementation in OpenACS seems unclear so far, and probably needs discussion.

(I haven't actually worked with XoWiki yet, so the below may be ignorant of any related features it already has...)

This language negotiation thing sounds like a good approach, if it works. The major caveat I'd make is that regardless of whether language negotiation succeeds or fails, it should always be possible for the user to see the other languages/versions of that document available, explicitly select an alternate version, send others a link to either the specific document/language URL or the generic URL, etc.

In other words, the language negotiation should give the most reasonable default, but the user should always have the option of easily taking over explicit control, by simply clicking on the appropriate links. If a user uses the generic URL, and gets the German version via language negotiation, he should always be able, via some website UI, to see that there are also English and Spanish versions available, and explicitly access those specific versions if he so chooses.

Note, this is entirely separate from telling the website, "No, I don't want any German docs from you, please change my site-wide preference to English." The user should also be able to do that, but that should be mostly orthogonal to the per-item access I'm talking about above.

After all, a user may prefer German as his default, but want to read the original English version of a specific document, email the URL of the Spanish version to a colleague saying, "Heh, I think the translator mis-translated paragraph 8 of our paper, but my Spanish is rusty, what do you think?", etc.
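
The "reasonable default, explicit override" behavior described above can be sketched as follows. This is plain Tcl under stated assumptions: choose_language is a hypothetical name, not an OpenACS or xowiki API, the explicit language is assumed to come from a language-specific URL or a UI link, and regional subtags such as "de-AT" are simply reduced to "de".

```tcl
# Hypothetical helper; not an OpenACS or xowiki API.
# Pick a language for a generic URL. An explicit choice always wins
# over the Accept-Language header, if that version is available.
proc choose_language {available accept_language {explicit ""} {default en}} {
    if {$explicit ne "" && [lsearch -exact $available $explicit] >= 0} {
        return $explicit
    }
    # Collect {q language} pairs from a header value such as
    # "de-AT,de;q=0.9,en;q=0.5".
    set candidates {}
    foreach item [split $accept_language ,] {
        set parts [split $item ";"]
        # Assumption: keep only the primary subtag, so "de-AT" counts as "de".
        set lang [string tolower [lindex [split [string trim [lindex $parts 0]] -] 0]]
        set q 1.0
        foreach param [lrange $parts 1 end] {
            if {[regexp {^\s*q\s*=\s*([01](?:\.[0-9]+)?)\s*$} $param -> value]} {
                set q $value
            }
        }
        if {$lang ne "" && $q > 0} {
            lappend candidates [list $q $lang]
        }
    }
    # Try the languages in order of decreasing q value and return the
    # first one for which a version of the document exists.
    foreach pair [lsort -real -decreasing -index 0 $candidates] {
        set lang [lindex $pair 1]
        if {[lsearch -exact $available $lang] >= 0} {
            return $lang
        }
    }
    return $default
}
```

For a document available in en and es, a browser sending "de,es;q=0.8,en;q=0.5" would get the Spanish version from the generic URL, while passing en as the explicit choice returns the English one regardless of the header. Listing the other available versions in the UI is a separate step that this sketch does not cover.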

Posted by Carl Robert Blesius on
Andrew,

Yes, clear display of available languages is essential (it is mentioned in the comments of the document I linked to).

When I was working with Lars on i18n we planned to tackle content negotiation after doing all the UI work, but never got to it b/c we ran out of time (and budget). It was not tragic really b/c having content in multiple languages was the exception rather than the rule.

In any case, I would like to see a solution that is not limited to xowiki, and the prospect of having thousands of urls with unnecessary /lang/pages/ in them on our local site is motivating.

I will try to find out what others have done (e.g. someone recently mentioned that the new version of Plone does a good job in this area) and will report back.

I do not want this to be a hack either, Gustaf.

Carl

7: Re: XoWiki URL Cleanup (response to 1)
Posted by Dave Bauer on
Andrew, that reminds me.

If my preferred version (by whatever method) is German, or English, or Spanish, should it show me the correct version with the "official" url for that language, so that if I share it, it shows the url of the language I am looking at, or a "generic" url that automatically shows me some version?

This is tricky, b/c then if I explicitly choose another language, I'll get the language specific URL, but it makes it hard to copy/paste the language specific URL for my preferred language.

So there are two options: 1) a generic URL that auto selects, and 2) a redirect to the correct language specific URL. I have no idea which is better or why, but I think there can be a case made for both.

Perhaps someone who has done something with multi-language content can comment. I know greenpeace.org has multiple language content, but I don't know how they do it.

11: Re: XoWiki URL Cleanup (response to 1)
Posted by Gustaf Neumann on
concerning subst_blank_in_name: i have activated this for testing purposes on openacs.org, and it seems to work fine (i created a testing page). note that it does not alter all page names on the fly; one has to edit a page for its name to be altered. note also that this change alters not only the url, but also the internal links (intra-wiki-links) and the page names.

concerning explicit language specifications: as others have pointed out, there is a substantial difference between the navigational elements of the framework (labels on forms and buttons, error messages) and the content. while there is no doubt that content negotiation is very useful for the navigational elements, i have doubts about the content language. when i have configured German as interaction language, i would be completely unhappy to get only the German version for a link .../xowiki, especially when i am on an English page. the chosen language depends
- on the intentions of the page supplier,
- on the context where the page is used and
- on preferences of the user, and maybe
- on the availability on the system.
I see no way around having canonical URLs pointing to a page in a specified language, existing or not. i also got the impression that you are unaware of the fact that xowiki already does language guessing in many places, including urls.

Carl, what do you mean by the following sentence:

"Any objections/suggestions before we start on this?"

"I do not want this to be a hack either, Gustaf."
i did not want to imply this; i am sorry if it was possible to interpret my posting this way. i don't have the intention to make xowiki a clone of wikipedia (why not use wikipedia in the first place?), but to develop xowiki into a more general tool in the context of a community system like openacs. There are several decisions made in wikipedia which i do not like (although they may make sense in wikipedia).

i see many places (maybe too many) where xowiki can and should be improved; the url space (and the rp) belongs to these. i would certainly prefer a more object oriented way to tackle these problems. my point was that i see this discussion not in a state of "here is what we want and we need somebody to implement it", but in a state where a deeper understanding of what's needed in which situation, and how it fits with the current design, is desirable.