Forum OpenACS Development: Charset difference with NaviServer and OpenACS 5.9:

Hi,

During the installation of ]po[ 5.0.alpha1 at our third customer, we discovered an interesting issue with character set representation on a ]po[ "report" page. These "report" pages are written incrementally using ns_write, because they can take several hours to complete, and they come with their own header. I'm really not sure about this header, except that it worked fine in OpenACS 5.7 with AOLserver.

The resulting HTML code is identical to that of a "normal" ]po[ page, but there are differences in the HTTP headers. I ran wget --server-response on the page, with the results below. Maybe somebody has an idea what the difference could be between OpenACS 5.9/NaviServer and OpenACS 5.7/AOLserver? I've checked the usual suspects, including latin-1 vs. UTF-8 encoding.

Please see the wget header output below; it contains a "normal" page ("become"), followed by a "report" page. The difference must be in the HTTP headers, because the same content works OK in /ds/shell.

Thanks!
Frank

---

http://po50dev.project-open.net/become?url=/intranet-reporting/timesheet-customer-project?start_date=2016-02-15
Resolving po50dev.project-open.net... 5.9.16.181
Connecting to po50dev.project-open.net|5.9.16.181|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 302 Found
Server: NaviServer/4.99.9
Date: Mon, 22 Feb 2016 13:27:49 GMT
Set-Cookie: ad_secure_token=""; Expires=Fri, 01-Jan-1980 01:00:00 GMT; Path=/; Secure
Set-Cookie: ad_user_login=""; Expires=Fri, 01-Jan-1980 01:00:00 GMT; Path=/; Secure
Set-Cookie: ad_user_login_secure=""; Expires=Fri, 01-Jan-1980 01:00:00 GMT; Path=/; Secure
Set-Cookie: ad_user_login="624%252c1456147669%252c0367B9B91%252c0%2b%257b168%2b1456176469%2bDE867037EB1888C14A7D00F081447F4C01DBD4D2%257d"; Expires=Fri, 01-Jan-2035 01:00:00 GMT; Path=/; HttpOnly
Set-Cookie: ad_session_id="861760%252c624%252c1%252c1456147669%2b%257b175%2b1456148869%2b18EE6725B51CC999D80319CC3DB61217A34A6F90%257d"; Expires=Fri, 01-Jan-2035 01:00:00 GMT; Path=/; Discard; HttpOnly
Location: http://po50dev.project-open.net/intranet-reporting/timesheet-customer-project?start_date=2016-02-15
Content-Type: text/html; charset=utf-8
Content-Length: 377
Connection: keep-alive
Location: http://po50dev.project-open.net/intranet-reporting/timesheet-customer-project?start_date=2016-02-15 [following]
http://po50dev.project-open.net/intranet-reporting/timesheet-customer-project?start_date=2016-02-15
Connecting to po50dev.project-open.net|5.9.16.181|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Server: NaviServer/4.99
Length: unspecified [text/html]
Saving to: “become?url=%2Fintranet-reporting%2Ftimesheet-customer-project?start_date=2016-02-15.1”

0K .......... .......... .......... .......... .......... 749K
50K .......... .......... .. 1.34M=0.08s

2016-02-22 14:27:49 (871 KB/s) - “become?url=%2Fintranet-reporting%2Ftimesheet-customer-project?start_date=2016-02-15.1” saved [74020]

Sorry, I forgot to say that the original character sequence was "äöü-ÄÖÜ-ß" that got displayed as ���-���-�.
Posted by Gustaf Neumann on
Hi Frank,

maybe something is wrong with your templates and/or setup. The following is supposed to work (and works on openacs.org; try /stream-on).

-g

set title "Some Title"
set context [list $title]
set template [parameter::get -package_id [ad_conn subsite_id] -parameter StreamingHead] 
ad_return_top_of_page [ad_parse_template -params [list context title] $template]

foreach i {1 2 3} {
    ns_write "$i...<äüö> <ÄÜÖ>\n"
    ns_sleep 2
}
ns_write "
DONE\n"

Posted by Neophytos Demetriou on
It seems like an encoding issue while storing the data. Not sure if you have had a chance to try encoding convertfrom (or convertto) utf-8. If that does not work, the following might help (off the top of my head):

binary scan $somebindata a* somevar

along with:

binary format a* $somedata

That said, there is a nice article on charsets and unicode by Joel Spolsky: http://www.joelonsoftware.com/articles/Unicode.html

Posted by Neophytos Demetriou on
Following up on my previous message, you might want to ensure that the page containing the form (if it is a form) that submits the data for storage also includes:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

The link you provided has it (and it also has the right Content-Type in the response headers). If it's not stored from a form (e.g. some offline process inserting data), then see previous message.

Posted by Gustaf Neumann on
actually, i think i know what the problem is:
  • po uses its own function "im_report_write_http_headers" to output headers, rather than the proper OpenACS functions that configure the streams.
  • I would not be surprised if the problem "goes away" when you use the OpenACS API: ad_return_top_of_page [1] rather than the hand-made version.
  • if you need more control, you might consider
    ns_headers 200 text/plain
    ns_write "Results: \n"
    foreach i {1 2 3} {
        ns_write "$i...'äüö' 'ÄÜÖ'<br>\n"
        ns_sleep 2
    }
    ns_write "Done."
    
  • for details on ns_headers, see [2]
all the best
-g

[1] http://openacs.org/api-doc/proc-view?proc=ad_return_top_of_page&source_p=1
[2] http://naviserver.sourceforge.net/n/naviserver/files/ns_write.html

Posted by Neophytos Demetriou on
Not so sure it's a headers issue as the Content-Type header is correct for the given link/page. I would first check to make sure the data stored in the db is correct.

That said, ad_return_top_of_page uses ReturnHeaders that, in turn, uses ns_startcontent, which sets the connection outputEncoding (see Ns_ConnSetEncoding in conn.c). The default connection outputEncoding is supposed to be utf-8 anyway.

FWIW, the outputEncoding is utilized in Ns_ConnWriteVChars (in connio.c) to convert the response bytes using Tcl_UtfToExternalDString (if outputEncoding is not set to utf-8) i.e. converting each byte of a utf-8 character before it is sent to the browser.
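
As a rough illustration of what such a conversion does to the bytes on the wire, here is a plain-Tcl sketch. It is not NaviServer's actual code path: `encoding convertto` is used as the script-level counterpart of Tcl_UtfToExternalDString, the sample string and variable names are made up for the example, and `binary encode hex` assumes Tcl 8.6 or newer.

```tcl
# Sample characters from the thread, written as escapes so the script
# does not depend on the encoding of the source file.
set text "\u00e4\u00f6\u00fc"   ;# the three characters a-umlaut, o-umlaut, u-umlaut

# Encode the same string for the wire in two different ways.
set utf8   [encoding convertto utf-8     $text]
set latin1 [encoding convertto iso8859-1 $text]

# The results are byte strings, so string length counts bytes here.
puts "utf-8     : [string length $utf8] bytes, hex [binary encode hex $utf8]"
puts "iso8859-1 : [string length $latin1] bytes, hex [binary encode hex $latin1]"
```

The ISO-8859-1 form has one byte per character (e4 f6 fc), and those bytes are not valid UTF-8 on their own. If bytes like these are sent while the Content-Type header announces charset=utf-8, a browser will typically show one replacement character per umlaut, which is consistent with the "���" reported above.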

Posted by Gustaf Neumann on
neophytos,

the function used on this page (im_report_write_http_headers) is NOT using ReturnHeaders.

-g

Posted by Neophytos Demetriou on
Thank you Gustaf, we are in agreement. My reply was meant to put more emphasis on the function of ns_startcontent (as opposed to the response headers): it sets the connection outputEncoding, and if that is not set to utf-8 (the default for the outputCharset parameter), NaviServer converts the actual response (via Tcl_UtfToExternalDString) before it is sent back to the browser. Converting a utf-8 string to an external encoding converts each byte of a utf-8 character, thus leading to the awkward result. Finally, I suggest checking the data in the db before delving deeper into encodings.

Posted by Gustaf Neumann on
ns_startcontent is deprecated since 2007 (see [1] for the discussion). -g

[1] https://sourceforge.net/p/naviserver/mailman/message/8350950/

Posted by Frank Bergmann on
Hi!

Thanks for the quick replies first of all.

I spent a few hours producing sample code and unsuccessfully trying to understand character encoding in AOLserver/NaviServer, until I found out that I simply had to replace ns_startcontent with ReturnHeaders. Sorry, this didn't become clear from your discussion.
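
For anyone hitting the same problem, the change can be sketched roughly as follows. This is only an illustration, not the actual ]po[ code: the proc name stream_report and its argument are hypothetical, and the sketch assumes it runs inside an OpenACS request on NaviServer, where ReturnHeaders and ns_write are available and ReturnHeaders can be called without arguments.

```tcl
# Hypothetical helper (not part of ]po[ or OpenACS): stream a report page
# chunk by chunk.
proc stream_report {lines} {
    # Let OpenACS send the status line and headers instead of calling
    # ns_startcontent directly; in this thread that was the change that
    # made the umlauts come out right.
    ReturnHeaders

    foreach line $lines {
        # Each chunk is handed to the connection as soon as it is written.
        ns_write "$line<br>\n"
    }
}
```

A report page would then call something like `stream_report $rows` in place of its hand-made header code followed by ns_write calls.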

- Maybe you want to update the documentation of ns_headers. It doesn't say anything about "configuring the streams", which it should. I understood it to be just a variant of ns_write.
- Why didn't you update ns_startcontent to behave consistently in NaviServer?

Even following the SF discussion on ns_headers I don't understand why you had to break something, even if it is "old" or "inconsistent".

> This is all handled automatically now, and ns_startcontent should not be used.

Breaking stuff means increased costs for us (and others) to update. If you break stuff, then we'll need to fall back to the policy to update only if absolutely necessary. The last time we managed to stay 7 years in that mode...

Cheers,
Frank

Posted by Gustaf Neumann on
It would have taken me much less time digging around in po code if you had submitted a proper minimal bug report.
I have provided you with simple functions showing how streaming pages are supposed to work; using these should be quite straightforward for an experienced developer.

The function "ns_startcontent" was deprecated in NaviServer about 9 years ago, so i am not impressed by the 7 years your code worked for AOLserver :). The function was probably at no time working in NaviServer exactly as in AOLserver (my guess is that the differences might be related to the various strange charset options in AOLserver, which are gone in NaviServer). It would probably have been better to remove the function in NaviServer 10 years ago, so the difference would have popped up sooner.

Anyhow, the right way for OpenACS applications is to use the abstractions provided by OpenACS rather than the low-level functions. These were introduced 15 years ago, not without reason.

all the best
-g

http://openacs.org/forums/message-view?message_id=27658

Posted by Frank Bergmann on
Thanks a lot!!

Frank