Forum OpenACS Development: Re: New Feature: Formbuilder maxlength

Posted by Michael Hinds on
Lars,

I'm not sure why you don't want to use string length. Here's what the manual says about string bytelength:

string bytelength string
Returns a decimal string giving the number of bytes used to represent string in memory. Because UTF-8 uses one to three bytes to represent Unicode characters, the byte length will not be the same as the character length in general. The cases where a script cares about the byte length are rare. In almost all cases, you should use the string length operation. Refer to the Tcl_NumUtfChars manual entry for more details on the UTF-8 representation.

So it seems to me string length works fine. Have you seen evidence otherwise?
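
To make the manual's distinction concrete, here is a minimal sketch in Python rather than Tcl. It assumes Python's len() on a string is a fair stand-in for string length, and that the length of the UTF-8 encoding roughly corresponds to string bytelength. The helper name char_and_byte_length is made up for illustration.

```python
# Compare the character count of a string with its UTF-8 byte count.
# Assumption: len(text) plays the role of Tcl's "string length", and
# len(text.encode("utf-8")) approximates "string bytelength".

def char_and_byte_length(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


samples = [
    "abc",            # plain ASCII: one byte per character
    "M\u00fcller",    # "Mueller" spelled with a u-umlaut: the umlaut needs two bytes
    "\u20ac",         # the euro sign: needs three bytes
]

for s in samples:
    chars, nbytes = char_and_byte_length(s)
    # !a prints an ASCII-safe representation, so this works on any terminal
    print(f"{s!a}: {chars} characters, {nbytes} bytes in UTF-8")
```

Only the ASCII sample gives the same number twice, which is why a maxlength check on user input should normally count characters, not bytes.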

Posted by Tilmann Singer on
Run psql -l to find out the encoding of your PostgreSQL databases:
tils@tp:~$ psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 beta      | tils     | UNICODE
 lari      | tils     | UNICODE
 lari2     | tils     | UNICODE
...
If you see something else there, for example SQL_ASCII, then those are single-byte encoded databases. As far as I understand, creating your database as UNICODE is almost always the right thing to do when you want to be able to store characters from different encodings.

The error that your maxlength procedure catches indicates that something has already gone wrong earlier: in that case you would end up storing a single international character (e.g. a German umlaut) as two characters in the db, which leads to lots of other problems. For example, a query that selects a substring could split the 2-byte character into two pieces. You should have created your database with UNICODE encoding, or with an encoding that understands the characters you need (e.g. LATIN1).
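
The splitting problem can be simulated outside the database. The following is only a sketch in Python, not what PostgreSQL does internally: it assumes that decoding bytes as latin-1 is a reasonable model of a database that treats every byte as one character. The helper names db_substring and is_valid_utf8 are hypothetical.

```python
# Simulate UTF-8 text stored in a database that counts one byte as one character.
original = "M\u00fcller"                # "Mueller" with a u-umlaut, 6 characters
utf8_bytes = original.encode("utf-8")   # 7 bytes, the umlaut takes two of them

# Assumption: latin-1 maps every byte to exactly one character, so it stands
# in for the single-byte view of the stored data.
as_seen_by_db = utf8_bytes.decode("latin-1")


def db_substring(stored, count):
    """Return the raw bytes of the first `count` single-byte characters."""
    return stored[:count].encode("latin-1")


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(original), len(as_seen_by_db))              # 6 7
print(is_valid_utf8(db_substring(as_seen_by_db, 2)))  # False: umlaut cut in half
print(is_valid_utf8(db_substring(as_seen_by_db, 3)))  # True: both umlaut bytes kept
```

In this model the database sees seven characters where the user typed six, and a two-character substring ends in the middle of the umlaut.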