Forum OpenACS Development: Has anyone upgraded an ecommerce site to AOLserver4?

I have recently upgraded a couple of ecommerce sites to AOLserver 4 and am having some charset problems.

When we first switched, certain "special" characters like the pound sign (as in British pounds) and the ellipsis (...) and double-dash from Word were displaying with a capital A with a tilde over it in front of them.  I was able to fix some of them by adding

ns_param  OutputCharset      utf-8

to the ns/parameters section of my config file.
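
For anyone unfamiliar with the symptom: a stray accented capital A in front of a character is typically what shows up when UTF-8 bytes are decoded as a single-byte charset such as iso-8859-1. The Python sketch below only illustrates that mechanism. It is not AOLserver code, and the function name and sample characters are made up for illustration.

```python
# Illustration only: what happens when UTF-8 bytes are read as iso-8859-1.
# The function name is hypothetical; nothing here is part of AOLserver.

def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as iso-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    pound = misread_utf8_as_latin1("\u00a3")      # pound sign
    ellipsis = misread_utf8_as_latin1("\u2026")   # Word-style ellipsis

    # The pound sign is two bytes in UTF-8, so it turns into two characters,
    # the first of which is an accented capital A.
    print(repr(pound))
    # The ellipsis is three bytes in UTF-8, so it turns into three characters.
    print(len(ellipsis))

    # The damage is reversible as long as the bytes were not altered.
    print(pound.encode("iso-8859-1").decode("utf-8") == "\u00a3")
```

If the page bytes and the charset announced in the Content-Type header disagree in this way, the browser performs the same wrong decoding on its own.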

However, the ones that are part of ecommerce product descriptions are still messed up.  In IE for Mac, the A is now a solid black box (very attractive :).  In Firefox it is inconsistent - on one of my systems it looks fine now, and the other has an A with a black border around it where IE puts the solid box.  And in Safari it appears to be correct at all times.

Presumably this is all happening due to the way the pages are generated - the product template is run through ns_adp_parse, and then the result is templated by acs-templating.  However, this used to work with the aD patches to nsd 3.3, so something seems to be missing or misconfigured now.

Has anyone figured out a solution to this?  I'm working with Dossy on it but any ideas are welcome at this point.

Posted by Dave Bauer on
Janine,

Check that you have all of these in ns/parameters

# Unicode by default:
# see http://dqd.com/~mayoff/encoding-doc.html
ns_param  HackContentType    1
ns_param  DefaultCharset    utf-8
ns_param  HttpOpenCharset    utf-8
ns_param  OutputCharset      utf-8
ns_param  URLCharset        utf-8

That is from the default config file from OpenACS 5.1.1

Posted by Janine Ohmer on
Thanks, Dave - I just tried that but it didn't help. :(

Dossy says that it's probably caused by pasting Word content into a web form (I'm contacting the client to find out how they are creating their content) and that if I fix the content in the database it should work.  But it worked in the other version of AOLserver, and I'd much rather make some global change to nsd than have to go fix several sites' worth of content and make the client change their workflow.

So the search continues...

Posted by C. R. Oldham on
Janine,

We also see this with HTMLarea.  Not sure if that helps.

--cro

I've seen similar symptoms on the product pages when working with aolserver3.x.  Sometimes caused by funny characters added to the product template, sometimes from data.

For data, our current fix is to:

1. Use a series of regsubs on the data to convert common characters to html entities etc.

2. manually "zap gremlins" (using bbedit) before importing data to ec.

It would be nice to have a filter/converter added to the bulk import and product-edit, product-add pages, but I'm not sure yet of the best approach to handle the various charsets.

Maybe something like this already exists in the forums package?
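
A filter of the kind described above could look roughly like the following. This is a minimal sketch in Python rather than the Tcl regsubs actually in use; the table of characters and the function name are assumptions chosen for illustration, and it presumes the input has already been decoded to Unicode correctly.

```python
# Hypothetical import filter: the character table and names are illustrative
# assumptions, not the regsubs referred to in the post above.

WORD_CHARS = {
    "\u2018": "&lsquo;",   # left single quote
    "\u2019": "&rsquo;",   # right single quote
    "\u201c": "&ldquo;",   # left double quote
    "\u201d": "&rdquo;",   # right double quote
    "\u2013": "&ndash;",   # en dash
    "\u2014": "&mdash;",   # em dash
    "\u2026": "&hellip;",  # ellipsis
    "\u00a3": "&pound;",   # pound sign
}


def to_entities(text: str) -> str:
    """Replace listed characters with named entities.

    Any other non-ASCII character becomes a numeric character reference,
    so the result is pure ASCII and no longer depends on the page charset.
    """
    out = []
    for ch in text:
        if ch in WORD_CHARS:
            out.append(WORD_CHARS[ch])
        elif ord(ch) > 127:
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_entities("Price: \u00a35\u2026"))
```

The sketch deliberately leaves `&`, `<` and `>` alone, on the assumption that product descriptions may already contain HTML markup.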

Posted by Steve Manning on
Janine

I've just replied to you on the AOLS list and then just found the same reply here 😊

I started on AOLS4 and then switched back to AOLS3 (because of the SSL problems). I have the default UTF encoding as per the config.tcl of v5.1 onwards and that seems to have fixed the problem which we had pasting into the HTMLArea.

However, I don't think our client was pasting from Word documents but I'll check this and let you know.

    - Steve

Posted by Andrew Piskorski on
Janine, sounds like you (or somebody...) is going to have to get down and dirty digging through the AOLserver 4.0.x character set implementation. I'd thought that stuff was forward ported more or less verbatim from Rob Mayoff's 3.3+ad13 character encoding work, but it looks like some sort of incompatibility or difference in default settings crept in.

Who did the 4.0.x character set work, or who else really understands its guts, either the 4.0.x or the 3.3+ad13 version? Most rapid fix might be to get those folks involved, if possible...

Posted by Andrew Piskorski on
Here's Janine's AOLserver list thread about this problem.
Posted by Janine Ohmer on
I seem to have solved this, thanks to Dossy.  The key was to set the OutputCharset, which goes in the ns/parameters section, to iso-8859-1.  I thought I had tried this already, but I had used iso8859-1 (note the missing dash), which if I understand this correctly is valid for an encoding but not for a charset.  Oy, talk about potential for confusion!

Thanks to all who replied!

Note that you are _still_ sending out characters that are invalid in iso-8859-1 (the cp1252 additions). The only change is that you are now labelling them as iso-8859-1, so most clients will show them correctly by using the cp1252 charset. I would recommend cleaning up the database if possible, making sure such characters don't get in again, and using Unicode. Each of these will decrease the risk that you are bitten again.
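
To make the point about the cp1252 additions concrete, here is a minimal Python sketch. It assumes the stored text was originally cp1252 bytes that got read as iso-8859-1, which would leave C1 control characters (U+0080 to U+009F) where the Word punctuation used to be. The function name is hypothetical, and a real cleanup would need to be checked against the actual database contents first.

```python
# Sketch of a database cleanup step, under the assumption stated above.
# The function name is hypothetical.

def repair_c1_controls(text: str) -> str:
    """Reinterpret C1 control characters (U+0080..U+009F) as cp1252 bytes."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                out.append(bytes([ord(ch)]).decode("cp1252"))
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in cp1252;
                # leave those characters untouched.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    raw = b"Wait\x85"  # 0x85 is the ellipsis in cp1252

    # Strict iso-8859-1 maps 0x85 to a control character, not an ellipsis.
    as_latin1 = raw.decode("iso-8859-1")
    print(repr(as_latin1))

    # Reading the same byte as cp1252 gives the intended character.
    print(repr(repair_c1_controls(as_latin1)))
```

Once repaired, the text contains real Unicode punctuation and can be served as utf-8 without relying on clients treating iso-8859-1 as cp1252.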