I'm working on how Greenpeace Planet sends locale-specific character
encodings, and was hoping for some feedback on one of the
implementation details. Eventually, acs-lang will need to address
the same problem, and I thought a shared solution would be better.
For the impatient: I need to make sure that every call to ns_return
or one of its relatives (such as ns_returnnotice) supplies the correct
MIME type and character encoding for a given locale, and I'm not sure
where the change is best made.
The gory details:
The basic problem is this: If your site only features a single
character set, such as iso-8859-1 for Western European languages, it
is easy enough to configure AOLserver to automatically return the
right character encoding. If your site features a mix of languages
that use different character sets, acs-lang / gp-lang records a
character encoding for each locale and can tell you which one to
use. Now you want to send the right character set to the
client. What does "sending" a given character encoding to the client
mean? You want
- the HTTP headers to specify the character encoding along with the
MIME type in the "Content-Type" line,
- the meta tag in the <head> portion of the HTML to specify the same
MIME type and character encoding, and
- the bytes sent to the browser to be correctly encoded.
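In other words, the declared charset and the actual bytes have to agree. A small plain-Tcl illustration (it uses only the core `encoding` command, nothing from AOLserver, so it runs under tclsh): the same string yields different bytes in ISO-8859-1 and UTF-8, and bytes decoded under the wrong charset come out garbled, which is what a browser does when the Content-Type line and the encoding disagree.

```tcl
# The text the page wants to show, as Tcl holds it internally (Unicode).
set text "caf\u00e9"

# Requirement three: the bytes on the wire depend on the encoding chosen.
set latin1_bytes [encoding convertto iso8859-1 $text]
set utf8_bytes   [encoding convertto utf-8 $text]
puts [string length $latin1_bytes]   ;# 4 bytes: e-acute is one byte
puts [string length $utf8_bytes]     ;# 5 bytes: e-acute is two bytes

# Requirement one: if the header names a different charset than the one the
# bytes were encoded in, the browser decodes them wrongly. Here UTF-8 bytes
# are read as ISO-8859-1, turning the single e-acute into two characters.
set misread [encoding convertfrom iso8859-1 $utf8_bytes]
puts [expr {$misread eq "caf\u00c3\u00a9"}]   ;# 1
```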
Assuming you've written a procedure that will return the
locale-specific charset, number two is easy: you just need your
templates to call the procedure when they write the meta tag. One and
three are a little more complicated. Fortunately, Rob Mayoff wrote a document that
explains the messy details. Unless you want to use ns_write to specify
the headers yourself, what you need to do is specify the character
encoding explicitly when you call ns_return or one of its cousins like
ns_returnnotice. For example,
ns_return 200 "text/html; charset=shift_jis" "bla bla"
will include the character set in the header and tell
AOLserver how to encode the data.
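Putting the per-locale lookup together with that ns_return call might look like the sketch below. The proc names `gp_lang_charset_for_locale` and `gp_lang_mime_type` are hypothetical, the locale table is hard-coded where gp-lang / acs-lang would read their own tables, and the ISO-8859-1 fallback is an assumed site default. The pure-Tcl parts run under tclsh; the ns_return line appears only as a comment.

```tcl
# Hypothetical stand-in for the gp-lang / acs-lang lookup; the real packages
# keep one charset per locale in their own tables.
proc gp_lang_charset_for_locale {locale} {
    set charsets {en_US iso-8859-1 de_DE iso-8859-1 ja_JP shift_jis}
    if {[dict exists $charsets $locale]} {
        return [dict get $charsets $locale]
    }
    return iso-8859-1   ;# assumed site-wide default
}

# Hypothetical: build the mimetype argument for ns_return. Only text/* types
# get a charset, and an explicit charset that is already there is kept.
proc gp_lang_mime_type {mime locale} {
    if {![string match -nocase "text/*" $mime]
        || [regexp -nocase {;\s*charset=} $mime]} {
        return $mime
    }
    return "$mime; charset=[gp_lang_charset_for_locale $locale]"
}

set type [gp_lang_mime_type text/html ja_JP]
puts $type   ;# text/html; charset=shift_jis
# Requirement two: the template writes the same value into the meta tag.
puts "<meta http-equiv=\"Content-Type\" content=\"$type\">"
# Requirements one and three, inside AOLserver:
#   ns_return 200 $type $page
```

Leaving non-text types and already-specified charsets alone keeps the helper safe to call on every response, including images.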
It looked like there was another option: you can access the output
headers as an ns_set through [ns_conn outputheaders] at any point in
the thread before you return something to the browser. The problem is
that ns_return appends its own Content-Type header to the output
headers, so if you set "Content-Type" there beforehand, you'll end up
with a second "Content-Type" line created by ns_return.
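A toy model of that failure, in plain Tcl rather than real AOLserver calls: an ns_set behaves like an ordered list of name/value pairs that may hold the same name twice, and `ns_set put` appends without checking for an existing entry. The `fake_ns_return` proc is a hypothetical simplification of the behaviour described above, not AOLserver's actual implementation.

```tcl
# Output headers modelled as a flat name/value list (duplicates allowed).
set outputheaders {}

# Model of "ns_set put [ns_conn outputheaders] name value": always appends.
proc header_put {listVar name value} {
    upvar 1 $listVar hdrs
    lappend hdrs $name $value
}

# Hypothetical simplification of ns_return's header handling as described
# above: it adds its own Content-Type from the mimetype argument.
proc fake_ns_return {listVar status mimetype body} {
    upvar 1 $listVar hdrs
    header_put hdrs Content-Type $mimetype
    # (the real command would now send the status line, headers and body)
}

header_put outputheaders Content-Type "text/html; charset=shift_jis"
fake_ns_return outputheaders 200 text/html "bla bla"

foreach {name value} $outputheaders {
    puts "$name: $value"
}
# Content-Type: text/html; charset=shift_jis
# Content-Type: text/html
```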
So what's the best way to include the character set? The easy
solution seems to be to include a modified version of doc_return in
the gp-lang package, since doc_return seems to be what the templating
system calls when it returns a normal page. Then, of course, you have
to make sure non-templated pages call doc_return (regrettably an issue
with Planet). At any rate, this seems like a partial solution at best,
since the ACS core returns lots of error pages and so on that don't
call doc_return (and those error pages are hardcoded in English, of
course). The only other thing that comes to mind is trying to hack
ns_return and its cousins. Any thoughts?
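Both options can be sketched in plain Tcl. Everything here is hypothetical: `gp_doc_return` stands in for a gp-lang copy of doc_return, the current locale is passed in (or held in a global) where gp-lang would take it from the user or request, and a recording stub replaces ns_return when the script runs outside AOLserver.

```tcl
# Outside AOLserver there is no ns_return; install a stub that records what
# would have been sent so the sketch runs under tclsh.
if {[info commands ns_return] eq ""} {
    proc ns_return {status type body} {
        set ::sent [list $status $type $body]
    }
}

# Hypothetical lookup, hard-coded for the sketch.
proc gp_lang_charset {locale} {
    array set charsets {en_US iso-8859-1 ja_JP shift_jis}
    if {[info exists charsets($locale)]} { return $charsets($locale) }
    return iso-8859-1
}

# Add the locale's charset to text/* types that don't already carry one.
proc gp_lang_add_charset {type locale} {
    if {[string match -nocase "text/*" $type] && ![regexp -nocase {charset=} $type]} {
        append type "; charset=[gp_lang_charset $locale]"
    }
    return $type
}

# Option 1: a gp-lang version of doc_return that templated and
# non-templated pages would call instead of ns_return.
proc gp_doc_return {status type body locale} {
    ns_return $status [gp_lang_add_charset $type $locale] $body
}
gp_doc_return 200 text/html "bla bla" ja_JP
puts [lindex $::sent 1]   ;# text/html; charset=shift_jis

# Option 2: rename ns_return and install a wrapper under the old name, so
# existing callers pick up the charset without being changed.
set ::gp_lang_current_locale ja_JP
rename ns_return gp_lang_orig_ns_return
proc ns_return {status type body} {
    gp_lang_orig_ns_return $status [gp_lang_add_charset $type $::gp_lang_current_locale] $body
}
ns_return 200 text/html "bla bla"
puts [lindex $::sent 1]   ;# text/html; charset=shift_jis
```

Because `gp_lang_add_charset` leaves an existing charset alone, the two options can coexist: a gp_doc_return call that goes through the wrapper is not tagged twice. Option 2 has limits, as far as I can tell: it only catches Tcl-level calls to ns_return, so ns_returnnotice and error pages the server builds in C would bypass it, this wrapper only handles the three-argument form of ns_return, and whether a rename made at startup reaches every interpreter depends on how the server initializes its Tcl interps.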