Forum OpenACS Development: ad_quotehtml speed improvement

Collapse
Posted by Daniël Mantione on
Hi,

While investigating how the ACS templating did work internally I was watching to a piece of compiled code and noticed how often ad_quotehtml was called. I was asking myself how much it would impact the speed of the system and had a look at the source code.

A short search turned up the string map command, and I tried this implementation:

ad_proc -public ad_quotehtml { arg } {

    Quotes ampersands, double-quotes, and angle brackets in $arg.
    Analogous to ns_quotehtml except that it quotes double-quotes (which
    ns_quotehtml does not).

} {
    return [string map {& &amp; \" &quot; < &lt; > &gt;} $arg]
}

I did a short benchmark, and this one turns out to be 7 times faster than the standard ad_quotehtml. It's also much smaller, so the compiler can perhaps inline it to achieve even more speed gains.

Collapse
Posted by Jade Rubick on
Great, Daniel!

Would you mind posting this as a patch in bug-tracker?

Collapse
Posted by Don Baccus on
Hey, that's cool ... string map is one of those things that was added to Tcl long after a bunch of our utilities were written, there might be other places where its use might speed things up.

I've made the change to CVS HEAD so you needn't submit this as a patch in the bugtracker ...

Thanks a lot!

Collapse
Posted by Andrew Piskorski on
Hm, using string map does sound like a good idea, but you know, I just looked at the source code in AOLserver 3.3+ad13. ns_quotehtml is ultimiately implemented by Ns_QuoteHtml, and it definitely does quote double quotes. I suspect the ad_quotehtml docs above are very old - they probably date back to AOLserver 2.3.x, at which time presumably they were accurate.

In fact, Ns_QuoteHtml translates these 5 different characters to their equivalent HTML entitites:

<  >  &  \  "
each become, respectively:
&lt;  &gt;  &amp;  &#39;  &#34;

Note that ns_quotehtml is translating literal backslashes, while ad_quotehtml is not. Why or why not?

So while you're at it, it would also useful to compare the string map vs. ns_quotehtml performance. If they're similar I think string map should be preferred, as it's much more general and powerful, and avoids and unnecessary dependency on the Ns_QuoteHtml C code for anyone trying to use OpenACS Tcl procs outside of AOLserver.

Collapse
Posted by Dirk Gomez on
Shouldn't we rather try to reduce the overall amount of code and deprecate ad_quotehtml and recommend ns_quotehtml instead - if both do the same anyway?
Collapse
Posted by Daniël Mantione on
The discussion made me wonder why quotes need to be quoted...

The problem is, as far as I can see, the only reason why you would want to quote quotes, in within html-tags and especially links. But, at least within links, that is not correct at all... Why?

First example:

<A href='page?a=@a@&b=@b@'>

As you see, I'm using single quotes. This is perfectly legal, and often usefull because single quotes do not have any special meaning in Tcl. For example:

<%="
This is a text where some values like this one $var1 will be inserted automatically [funca]
<A href='page?a=${var2}&b=[funcb]'>blablbalbablablabl</A>
"%>

Because of this property, I've learned myself to write html using single quotes and so I can code adp like above in all my vanilla AOLserver code.

So, do we want to quote ' too? It would solve the problem.

Second example:

<%
# For simplicity of example, would normally be done in .tcl file
set name "Daniël Mantione"
%>
<A href='page?name=@name@'>blablablablablabla</A>

Whoops! Afaik, an URL should in all situation consist of standard US-Ascii. It is no solution to quote, &euml; won't work in a URL.

What should be done instead? This is, of course, ns_urlencode. This is true in any situation where a variable is being expanded within a tag attribute that specifies a URL. (I.e. A href, IMG src, LINK href etc...)

Unless there is another reason to quote ", the entire reason why it should be quoted is bogus...

Collapse
Posted by Andrew Piskorski on
Oops! My description above of Ns_QuoteHtml is incorrect! It does not quote backslash, it quotes single-quote. I mis-read the case '\'': in the C code last time around.

The Ns_QuoteHtml function appears to be identical in AOLserver 3.3+ad13 and the current AOLserver 4.0 CVS head.

Collapse
Posted by Dave Bauer on
Query variables need to be urlencoded. The OpenACS way of doing this is something like this:

set url "page?[export vars name]"

in the tcl file.

Then in your adp you would have:

a href="@url;noquote@"

Collapse
Posted by Nis Jørgensen on
Wouldn't that be

<a href="@url@">

rather than

<a href="@url;noquote@">

?

As far as I can see we still need to encode the &'s.

Collapse
Posted by Daniël Mantione on
Ehm.... It is not very hard to find source code in ACS that expands all query-variables in the adp.
Collapse
Posted by Dave Bauer on
Yes, that is correct. Almost every URL in an adp file does this incorrectly.

Much of the code style has been passed down from ancient ACS versions. It is something that should be fixed.

I will also try to get a chance to look at the developer documentation and update it to reflect this.

Collapse
Posted by Daniël Mantione on
Ok! What is the recommended way to code if the URL is dependend on <multiple> data?
Collapse
Posted by Tom Jackson on

I think what you would do, is on the .tcl page:


set url my_url?[ad_export_url_vars var1 var2]

Then use the ;noquote because the result might contain actual '&' characters, for example. If you quote those, the url might not work.

Collapse
Posted by Nis Jørgensen on
I assume you mean
  export_url_vars
rather than
  ad_export_url_vars

(unless ad_export_url_vars was added in 5.0)

If you do, you are still missing the point: The '&' characters in the url's MUST be quoted for the html to be valid. Since most browsers don't care, the problem is not big though.

Here is an example that might cause problems:

<a href="calculate-resistance?volt=2&amp=3">test</a>

Collapse
Posted by Tom Jackson on

That an interestingly confusing example.

Collapse
Posted by Tom Jackson on

Also, recently, the AOLserver parser had a bug resolved. Tag attributes which were quoted with single quotes were not correctly interpreted. Definitely, this is also a bug in ns_quotehtml, it should quote the single quote. Until this bug is fixed, the string map sounds like a good choice.

Collapse
Posted by Daniël Mantione on
ns_quotehtml does quote the single quote. ad_quotehtml does not.
Collapse
Posted by Don Baccus on
Is that ns_quotehtml in AOLserver 4.0 or 3.3 or both?  We're still supporting AOLserver 3.3 so can't switch to ns_quotehtml unless we know it is correct in all the versions we support.

It sounds like we may want to deprecate ad_quotehtml in OACS 5.1 while rewriting it to call ns_quotehtml (and rewriting calls to ad_quotehtml as folks think of it).

Sound right?

Also instead of ad_export_url_vars we should use the export_vars proc as someone else recommended above.  It works for both URL and FORM vars (URL by default), allows assigning of values in the call, etc.

Collapse
Posted by Daniël Mantione on
I don't know about AOLserver 3.3, but the 3.5.6 sources do expand quotes.
Collapse
Posted by Don Baccus on
No, NOT export_url_vars, we've been trying to push folks to use export_vars now for a couple of years.

Now if someone wants to submit a patch to force export_vars to quote "&" we can take a look at it, but in practice I can't imagine any browser on the planet enforcing the standard in this regard ...

Collapse
Posted by Nis Jørgensen on
Don wrote:
Now if someone wants to submit a patch to force export_vars to quote "&" we can take a look at it, but in practice I can't imagine any browser on the planet enforcing the standard in this regard ...

On the contrary - ALL my installed browsers (Opera, Mozilla, Internet Explorer) have problems with the unquoted URL I gave:

<a href="calculate-resistance?volt=2&amp=3">

But since export_vars has a -quotehtml AND the tempalting system now quotes by default, I don't really see the need for the patch you describe ...

Collapse
Posted by Tom Jackson on

But the & in your url isn't html, it is just and old style of joining vars. Shouldn't the urls be fixed by using the new style, instead of quoting?

Collapse
Posted by Nis Jørgensen on
I have no idea what you are talking about with "old style" and "new style". The url I gave is the result of doing what Dave Bauer suggested:

In tcl file

  set volt 2
  set amp 3
  set url "calculate-resistance?[export_vars [list volt amp]]"

and in the adp:

  <a href="@url;noquote@">

My suggestion was to replace the last line with just

  <a href="@url@">

Please tell me how this should be "fixed using the new style".

Collapse
Posted by Tom Jackson on

Well, in general it looks like rfc 2396 covers the issue. From section 2.4.2. "When to Escape and Unescape":

   A URI is always in an "escaped" form, since escaping or unescaping a
   completed URI might change its semantics.  Normally, the only time
   escape encodings can safely be made is when the URI is being created
   from its component parts; each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or

   not escaping a character will change its semantics. Likewise, a URI
   must be separated into its components before the escaped characters
   within those components can be safely decoded.

"HTML Quoting" and "url escaping" are two different things. You can't quote components of a url, and definitely not the entire url. The & is a reserved separator, and I think the new one is the ';' semi-colon. Probably the old one, when used in an HTML page needs to be quoted?

Collapse
Posted by Daniël Mantione on
What the heck?

If would use <A href='@url@'>blablabla</A> you would get this url:

<A href='calculate-resistance?volt=2&amp;amp=2<blablabla>/A>
This is of course plain nonsense...

However, if you use the correct <A href='@url;noquote@'>blablabla</A> you get:

<A href='calculate-resistance?volt=2&amp=2'>blablabla</A>
... which is the correct way to pass the variables volt and amp to the url.

Quoting is allways wrong within links. URL-encoding is ok. I.e. if you want to pass a variable containing &, you would URL-encode it.

Collapse
Posted by Nis Jørgensen on
Daniël:  Did you test your own examples? In all my browsers, the first one works and the second one doesn't.

Tom: The relevant document to quote is the SGML standard (for HTML) or the XML standard (for XHTML). Luckily the XHTML standard has a nice little writeup of the issue:

http://www.w3.org/TR/xhtml1/#C_12

(notice the example given)

Collapse
Posted by Daniël Mantione on
Nils: Seems you are right. Except Lynx & Links, my browsers did not accept the first example.

In that case, for performance reasons, it would be best if export_vars fixed the url, instead of quoting it afterwards.

Collapse
Posted by Lars Pind on
No!

The idea is this:

A URL is a URL is a URL. You generate the URL using export_vars, and get it in a Tcl string.

HTML is HTML is HTML. So if you want to send a string over HTML, you need to HTML-escape/quote certain characters, including the ampersand (&) character.

If you were to transmit your URL over plaintext email, or on a text/plain page, such as in a cvs, then you don't want to HTML-escape it.

That's why export_vars shouldn't do HTML quoting, and the templating system (design for HTML output) should.

/Lars

Collapse
Posted by Tom Jackson on

So export_vars is the new thing to use, but what happens when you want the form vars variety and one of the vars is return_url? Obviously you need to use ;noquote, but, ummm, you need to quote the value of return_url since it might contain &. This is why I was suggesting to use ';' instead of the & to separate url vars. Doesn't anyone think it is strange that the uri and html were designed by essentially the same guy, and this is actually an issue? Yet somehow the web developed even with this major flaw. The main result of the flaw appears to be that some guy somewhere had to rearrange his url to read

<A href='calculate-resistance?amp=2&volt=2'>blablabla</A>