Forum OpenACS Q&A: Problem with Weblogger in 5.1: ï¾ appearing in entries.

Hi folks,

I'm putting up a personal blog.  This morning I added the first entry, and after I saved it as a draft, almost every period at the end of a sentence was followed by

ï¾

(hope that comes through.  It's a lowercase i with double dots over it, and the fraction 3/4)

I'm using HTMLarea to edit.

Would the encoding of my database (PostgreSQL) make a difference?  I think it's SQL_ASCII.

--cro

I switched back to the standard richtext control and I can post the entry without the offending characters.  I'd like to use HTMLarea, though, because some of my other family members might be posting to the blog, and they are less technical than me.

What version of AOLserver are you running?  This sounds suspiciously similar to problems I've had when using HTMLarea as a form input.

I was experiencing the same thing, because I automatically double-space after a period.  For some reason, the two contiguous spaces produce these mysterious characters (although I was getting different characters than you).  You can experiment by putting in a bunch of spaces in a row.  Usually this will produce a number of these strange characters.
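One possible mechanism, offered as an assumption rather than a confirmed diagnosis: WYSIWYG editors often turn one of two consecutive spaces into a non-breaking space (U+00A0) so the browser does not collapse them. If that character is submitted as UTF-8 but read as a single-byte charset somewhere along the way, each byte shows up as its own stray character. A small sketch in Python (used only for illustration; none of this is HTMLarea or AOLserver code, and the sample text is made up):

```python
# Assumption: the editor submits a non-breaking space for the first of two spaces.
NBSP = "\u00a0"
text = "End of sentence." + NBSP + " Next one."

# In UTF-8, U+00A0 is encoded as the two bytes C2 A0.
utf8_bytes = text.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns each byte into a separate character,
# so one invisible space becomes two characters, one of them visible.
misread = utf8_bytes.decode("iso-8859-1")

print(repr(text))
print(repr(misread))
```

Which stray characters actually appear would depend on the charsets involved at each step, which may be why different people see different debris.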

I originally reported and discussed the problem in this post: https://openacs.org/forums/message-view?message_id=127088

At the time I was using AOLserver 3.4.2, which apparently didn't have the patches to deal with various character set issues, but I have since had similar problems with AOLserver 4.

I've resorted to an extremely ugly workaround, but it does work as a brute force method.  I use a procedure to map those particular characters into empty strings.

However, I've been unable to completely filter the characters out *before* inserting the text into the database.  For some reason I have to pull the text back out, run the filter, and then re-insert it into the database.

It seems that on the first pass some of the characters get mapped into other artifact characters that are then successfully removed on the second pass.  Like I said, it's UGLY, but it's the only thing I've been able to do to get it to work.

This is my current string mapping:

[string map [list \201 {} \215 {} \217 {} \220 {} \235 {} \240 {} \255 {} \263 {} \302 {} \303 {} \305 {} \352 {} \601 {} \605 {} \617 {} \620 {} \635 {} \640 {} \655 {} \702 {} \703 {} \705 {}] $string]

I've had to add additional characters as they cropped up during editing, but at this point I haven't come across a new one in a while.
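Two of the escapes in that map, \302 and \240, are the bytes C2 A0, which is how UTF-8 encodes a non-breaking space (U+00A0). If that is where the debris comes from (an assumption; nothing in this thread confirms it), decoding the submitted bytes as UTF-8 first and then swapping the non-breaking space for a plain one would keep the spacing instead of deleting it, and should only need one pass. A rough sketch in Python rather than Tcl, where `normalize_spaces` is a hypothetical helper name:

```python
def normalize_spaces(raw: bytes) -> str:
    """Decode submitted bytes as UTF-8 and replace non-breaking spaces.

    Hypothetical helper for illustration; assumes the browser sent UTF-8.
    """
    text = raw.decode("utf-8")
    # Swap U+00A0 for an ordinary space so the word spacing is preserved.
    return text.replace("\u00a0", " ")


# Made-up sample input: a non-breaking space followed by a normal space.
submitted = "First sentence.\u00a0 Second sentence.".encode("utf-8")
cleaned = normalize_spaces(submitted)
print(repr(cleaned))
```

Working on decoded text rather than on individual bytes avoids stripping bytes like \303 or \305 that also begin legitimate accented characters in UTF-8.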

Posted by Jeff Lu on
It seems the problem is UTF-8 encoding.
Try enabling it in your config.tcl. What version of AOLserver are you using?

Anyway, try adding these to config.tcl if you are on AOLserver 4.

ns_param HackContentType 1
ns_param DefaultCharset utf-8
ns_param HttpOpenCharset utf-8
ns_param OutputCharset utf-8
ns_param URLCharset utf-8

Otherwise you can go back to AOLserver 3.3ad13, which should also fix the problem.

--Jeff

Posted by Jeff Lu on
Oh, also, I don't think the database encoding will affect it.
I used AOLserver 4 with PostgreSQL 7.3.3 (SQL_ASCII).
With that setup I ran into the same problems you mentioned.
Then I made the config changes I posted above, and that solved it.

Cheers!

Jeff,

Thanks, that solved the problem!

--cro

Well, changing to UTF-8 actually masks the problem.

I have an RSS feed for this blog.  Visiting the RSS feed generates the following:

The XML page cannot be displayed
Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.

--------------------------------------------------------------------------------

An invalid character was found in text content. Error processing resource 'http://ncbt.org/family/abbyblog/rss/rss.xml'. Line 20, Position 25

Now this is actually an IE error.  I can visit the page in Mozilla, but Mozilla shows many of the spaces in the blog as a question mark inside a triangle.  If I look at the rss.xml file directly, a lot of spaces have been replaced by hex character A0, which in Windows-1252 is a non-breaking space.  These characters exist in the database as well.

So this seems to be coming directly from the form post.  Is something odd going on with IE?  Should the content type be hacked to iso-8859-1 instead?
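For what it's worth, a lone A0 byte is a complete character in ISO-8859-1 or Windows-1252, but it is not valid UTF-8 on its own. That would explain a strict XML parser rejecting the feed if it is reading it as UTF-8. A quick Python check of that claim (the sample bytes are made up, not taken from the actual rss.xml):

```python
# Made-up sample: a lone 0xA0 byte between two words.
sample = b"hello\xa0world"

# As ISO-8859-1, the byte 0xA0 decodes to the single character U+00A0.
as_latin1 = sample.decode("iso-8859-1")

# As UTF-8, a lone 0xA0 is a stray continuation byte and is rejected.
try:
    sample.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Re-encoding the decoded text as UTF-8 gives the valid two-byte form C2 A0.
fixed = as_latin1.encode("utf-8")

print(valid_utf8)
print(fixed)
```

If this is what is happening, the stored text and the charset the feed is served or declared with simply disagree, and either one could be changed to match the other.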