Forum OpenACS Development: ns_xml namespaces and encoding support

Posted by Guan Yang on

I am working on some changes to the news-aggregator package, and we are having problems with RSS feeds that specify encoding="iso-8859-1" in the XML declaration and that actually contain ISO-8859-1-specific characters. The text comes out like this: Det strÃ¸mmer ind med henvendelser i forbindelse med efterforskningen af drabet pÃ¥ den 12-Ã¥rige Mia Teglgaard Sprotte. The Ã¸ is supposed to be ø.

To me this looks like a conversion back and forth from UTF-8. Before I start reading through all of the news-aggregator package to see whether we are using UTF-8-unsafe functions, I would like to ask whether there are known problems with non-UTF encodings in ns_xml. Is this the case?
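
For anyone who wants to check that diagnosis, here is a minimal sketch of the mechanism, written in Python only because it is easy to run anywhere. It is not ns_xml or news-aggregator code, and the function names `garble` and `repair` are made up for illustration. It assumes the damage comes from UTF-8 bytes being decoded as ISO-8859-1 at some step.

```python
# Illustrative sketch, not ns_xml code: shows how UTF-8 bytes that are
# wrongly decoded as ISO-8859-1 produce the garbling seen in the feed.

def garble(text):
    """Encode as UTF-8, then (incorrectly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(text):
    """Undo the mistake above: recover the raw bytes, decode them as UTF-8."""
    return text.encode("iso-8859-1").decode("utf-8")


original = "på den 12-årige"
broken = garble(original)

print(broken)          # pÃ¥ den 12-Ã¥rige
print(repair(broken))  # på den 12-årige
```

The garbled output matches the fragment quoted above, which supports the idea that one extra UTF-8 to ISO-8859-1 reinterpretation happens somewhere between fetching the feed and displaying it.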

Another problem is that I cannot find anything in the docs on how to use namespaces in ns_xml. How do I determine the namespace of an element in ns_xml?

Posted by Dave Bauer on
You probably want to look into tDOM. It seems to have much more complete XML support. As of OpenACS 5.0, ns_xml will no longer be required and tDOM will be required instead.
Posted by Simon Carstensen on
Looks like tDOM will be the solution to our problem:

"[The parsefile command] reads the XML data directly out of the file with the filename filename and parses that data. This is done with low level file operations. The XML data must be in ISO-8859-1, UTF-8 or UTF-16 encoding. If applicable, this is the fastest way, to parse XML data."

And it looks like it supports namespaces as well. Perfect. Thanks!
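
To make concrete what we need from the parser, here is a rough sketch using Python's standard library. This is not tDOM's API, and the tiny feed below is a hypothetical stand-in for a real one. The idea is to hand the parser the raw bytes so it can honour the declared encoding itself, and then read each element's namespace URI from the parsed tree.

```python
# Illustrative sketch in Python's standard library (NOT tDOM): parse raw
# ISO-8859-1 bytes and read the namespace URI of each element.
import xml.etree.ElementTree as ET

# Hypothetical miniature feed, encoded as declared in its XML declaration.
FEED = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns="http://purl.org/rss/1.0/">'
    '<item><title>drabet på den 12-årige</title></item>'
    '</rdf:RDF>'
).encode("iso-8859-1")


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


# Passing bytes lets the parser apply the declared encoding itself.
root = ET.fromstring(FEED)
RSS = "{http://purl.org/rss/1.0/}"
title = root.find(RSS + "item/" + RSS + "title")

print(split_tag(root.tag))
print(split_tag(title.tag))
print(title.text)
```

The point of passing bytes rather than an already-decoded string is that only one decoding step happens, and it uses the encoding the feed itself declares.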

Posted by Håkan Ståby on
I couldn't wait, so I fixed tDOM support. :) I had to modify the PostgreSQL tables to use timestamptz instead of timestamp to get it to work. (Maybe there is another way?)

Where do I send the patch if anyone is interested? I guess it would be useful if someone (Simon?) reviewed the code before adding it, since I have only a little experience coding for OpenACS. Maybe upgrade scripts need to be added as well?

I have tested it with PostgreSQL 7.3.2, parsing two sites that use ISO-8859-1 characters.

Cheers,
Håkan