Forum OpenACS Development: Tclwebtest fails with query vars and HTML anchors

If you have a url like
assessment/asm-admin/questions?assessment_id=123&#345

with a # to jump to an anchor in the HTML.

Tclwebtest parses the &#345 as a numeric character reference (a Unicode entity). This of course breaks the redirect to the correct URL.
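To make the failure concrete, here is a minimal sketch of what a numeric entity translator does to that URL. The proc decode_numeric_entities is a hypothetical stand-in written for illustration, not tclwebtest's actual translate_entities code; it assumes Tcl 8.5 or later.

```tcl
# Hypothetical stand-in for an entity translator (not tclwebtest's code).
# Replaces every "&#<decimal digits>" (optional trailing ";") with the
# character that has that code point.
proc decode_numeric_entities {text} {
    set out ""
    set pos 0
    foreach range [regexp -all -inline -indices {&#[0-9]+;?} $text] {
        lassign $range start end
        # Copy the text before the match unchanged.
        append out [string range $text $pos [expr {$start - 1}]]
        # Pull out the digits and convert them to a character.
        regexp {[0-9]+} [string range $text $start $end] digits
        append out [format %c [scan $digits %d]]
        set pos [expr {$end + 1}]
    }
    append out [string range $text $pos end]
    return $out
}

set url "assessment/asm-admin/questions?assessment_id=123&#345"
set mangled [decode_numeric_entities $url]
puts $mangled
```

The "&#345" at the end is swallowed and replaced by the single character U+0159, so neither the query separator nor the anchor survives.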

The & was added because AOLserver sometimes reports an invalid integer when it tries to parse the entire 123#345 as the query variable's value.

I tried replacing the & with a ; but that did not help.

I also tried with &#345 but that has the same result as &#345.

Does anyone have any ideas? This behavior makes it hard to test assessment. I suppose I can poke around in the tclwebtest code where it handles the redirect from ad_returnredirect. If ad_returnredirect is not sending HTML (does it?) then the URL should not really be parsed as XML but as plain text.

Posted by Dave Bauer on
OK, I looked in the code, and it's pretty clear what happens:


    # is it a redirect ?
    if { $http_status == "302" || $http_status == "301" } {
        set avoid_tidy_p 1
        for { set i 0 } { $i < [llength $meta] } { incr i 2 } {
            if { [string match -nocase [lindex $meta $i] "location"] } {
                set location [translate_entities [string trim [lindex $meta [expr {$i+1}]]]]
                break
            }
        }
        # (rest of the redirect handling is not shown in this excerpt)
    }

There is a call to translate_entities, which is what is catching the &#{integer}.

I wonder if it works fine when you have very high object ids. On a new install the object ids are under 1000, so they are valid Unicode code points.

Posted by Dave Bauer on
Oops, see what I mean:

I also tried with &#345 but that has the same result as ř.

was supposed to say

I also tried with &amp;#345 but that has the same result as &#345.

Posted by Gustaf Neumann on
I wonder how essential entity replacement (and in particular numeric entity replacement) in URLs is. URLs should be URL-encoded, so there should be no ambiguity in the interpretation of the string. See e.g. the statement from W3C on the coding of UTF-8 in URLs: http://www.w3.org/International/O-URL-code.html

Actually, I wonder why tclwebtest is doing entity replacements in URLs at all.

Posted by Dave Bauer on
Gustaf,

In HTML the ampersand is supposed to be encoded as the &amp; HTML entity. Since tclwebtest actually parses the generated HTML, it needs to translate that before making a request for the URL.
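As a rough illustration of that step, here is a sketch of turning an href value taken from parsed HTML back into a requestable URL. The proc name href_to_url and the section_id parameter are made up for the example, and it only handles the ampersand entity, nothing else.

```tcl
# Hypothetical helper: undo the ampersand escaping of an href value
# taken from parsed HTML, before the URL is requested.
proc href_to_url {href} {
    return [string map {&amp; &} $href]
}

# section_id is an invented second query variable, for illustration only.
set href "questions?assessment_id=123&amp;section_id=7"
set requested [href_to_url $href]
puts $requested
```

This decoding belongs to attribute values found in an HTML body, which is a different source than an HTTP header.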

This is my understanding anyway. Of course, OpenACS breaks this rule quite often, but I think we are getting better by validating the generated HTML of the packages for the Zen project.

Posted by Gustaf Neumann on
dave,

I know what entity encoding in HTML and XML is for. However, the piece of code you posted is handling a redirect: it gets the location and does the entity decode on the provided value:

set location [translate_entities [string trim [lindex $meta [expr {$i+1}]]]]

The point is: the location is a URL, and a URL is not supposed to be entity-encoded but URL-encoded. So I am still wondering why it is doing that...
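A minimal sketch of what the redirect branch could look like without the entity translation. The proc redirect_location is hypothetical, and it assumes that meta is a flat list of alternating header names and values, as the loop in the posted excerpt suggests; it is not a patch against the real tclwebtest source.

```tcl
# Hypothetical helper, not tclwebtest code. Assumes meta is a flat list
# of alternating header names and values.
proc redirect_location {http_status meta} {
    if {$http_status ne "301" && $http_status ne "302"} {
        return ""
    }
    foreach {name value} $meta {
        if {[string equal -nocase $name "location"]} {
            # Use the header value as sent: trimmed, but not entity-decoded.
            return [string trim $value]
        }
    }
    return ""
}

set meta [list Content-Type text/html Location " questions?assessment_id=123&#345 "]
set location [redirect_location 302 $meta]
puts $location
```

With this, the "&#345" in the Location value reaches the next request untouched.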

Posted by Dave Bauer on
I guess it depends on how an HTTP 302 redirect works.

Does some HTML get sent to the browser with the URL? Or is the URL an HTTP header?

Anyway, I agree that the entity translation is probably not necessary; it seems to be there more to avoid potential problems. Unfortunately it can cause a different problem.

Posted by Gustaf Neumann on
RFC 2616 defines HTTP and therefore how redirects work, and it defines what the content of a Location header should be (see section 14.30). URLs must be URL-encoded, no matter whether or not they contain HTML. So, when a location is decoded, it must be URL-decoded.
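For comparison, a small sketch of URL-decoding (percent-decoding) in Tcl. The proc url_decode is written for illustration only, assumes the input is plain ASCII as an encoded URL should be, and interprets the decoded bytes as UTF-8; the query variable name is invented.

```tcl
# Illustrative percent-decoder; assumes ASCII input with UTF-8 escapes.
proc url_decode {text} {
    set bytes ""
    set len [string length $text]
    for {set i 0} {$i < $len} {incr i} {
        set ch [string index $text $i]
        set hex [string range $text [expr {$i + 1}] [expr {$i + 2}]]
        if {$ch eq "%" && [regexp {^[0-9A-Fa-f]{2}$} $hex]} {
            # "%XX" becomes the single byte with hex value XX.
            append bytes [binary format H2 $hex]
            incr i 2
        } else {
            append bytes $ch
        }
    }
    # Interpret the collected bytes as UTF-8 text.
    return [encoding convertfrom utf-8 $bytes]
}

# "name" is a made-up query variable for the example.
set decoded [url_decode "questions?name=%C5%99&#345"]
puts $decoded
```

The escape %C5%99 is the UTF-8 form of U+0159, the same character that entity translation produces from &#345, while the literal "&#345" passes through URL-decoding unchanged.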

Yes, of course the location is a header field; your code snippet is used for decoding the header fields, in particular the "location: ..." of a redirect.

HTML entity decoding still looks like a bug to me.