Forum OpenACS Development: New Feature: Formbuilder maxlength

Posted by Lars Pind on
I've added a new feature to form builder elements: "maxlength".

Use like this:

element create myform term_name -label "Term name" -datatype text -maxlength 20

What it'll do is add a maxlength="20" attribute to the input widget.

More importantly, it also validates the value on the server side, with a call to [string bytelength], which correctly handles multibyte characters.

The error message to the user will be "Term name is 3 characters too long". The reason we don't state the limit explicitly is that the effective character limit depends on whether multibyte characters are present. Telling the user to remove 3 characters will always work: if some of the removed characters are multibyte, he could have gotten away with removing fewer, but removing 3 is guaranteed to be safe.
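The arithmetic behind that message can be sketched in shell (a hypothetical example value; UTF-8 input assumed, with `wc -c` counting bytes the way `string bytelength` does):

```shell
# "blåbærgrød" is 10 characters but 13 bytes in UTF-8,
# because å, æ and ø each take two bytes.
s="blåbærgrød"
maxlength=10
bytes=$(printf '%s' "$s" | wc -c)   # byte count, locale-independent
excess=$(( bytes - maxlength ))
echo "Term name is $excess characters too long"
```

Removing any 3 characters frees at least 3 bytes, which is why the count is always safe advice.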

I've also added it to ad_form:

{term_name:text {label "Term name"} {maxlength 20}}

Please use liberally on all your forms, so we can avoid those nasty DB errors causing 500 internal server errors, just because the user typed a few characters too much.

/Lars

Posted by Jeff Davis on
I am not sure bytelength is the right thing to use. I guess in general it will be conservative, but if your db is utf-8 and you have a varchar(20), isn't it 20 characters even if they are multibyte? Conversely, if your db is iso-8859-1 and you enter high-bit characters, bytelength will say 2 bytes but the representation in the DB will be one byte.
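Jeff's two cases can be illustrated in shell (sample string is hypothetical; assumes a UTF-8 source encoding and that `iconv` is available): a UTF-8 varchar(20) counts characters, while bytelength counts UTF-8 bytes, and an ISO-8859-1 database stores each of these accented letters as a single byte.

```shell
s="blåbærgrød"   # 10 characters
printf '%s' "$s" | wc -c                                   # 13 bytes as UTF-8
printf '%s' "$s" | iconv -f UTF-8 -t ISO-8859-1 | wc -c    # 10 bytes as ISO-8859-1
```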
Posted by Lars Pind on
Hm. I did this with my PG installation, and bytelength was what corresponded to the PG interpretation of my varchar(20).

Don't know about the character set of my PG installation. How do I find out?

What would the right way to check for maxlength before the page blows up be?

/Lars

Posted by Michael Hinds on
Lars,

I'm not sure why you don't want to use string length. Here's what the manual says about bytelength:

string bytelength string — Returns a decimal string giving the number of bytes used to represent string in memory. Because UTF-8 uses one to three bytes to represent Unicode characters, the byte length will not be the same as the character length in general. The cases where a script cares about the byte length are rare. In almost all cases, you should use the string length operation. Refer to the Tcl_NumUtfChars manual entry for more details on the UTF-8 representation.

So it seems to me string length works fine. Have you seen evidence otherwise?

Posted by Tilmann Singer on
Type psql -l to find out the encoding of your pg databases:
tils@tp:~$ psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 beta      | tils     | UNICODE
 lari      | tils     | UNICODE
 lari2     | tils     | UNICODE
...
If you have something else in there, for example SQL_ASCII, then those are single-byte encoded databases. As far as I understand, it's almost always the right thing to create your database as UNICODE when you want to be able to store data in different encodings.

The error that your maxlength procedure catches indicates that something else went wrong earlier, because in that case you would end up storing a single international character (e.g. a German umlaut) as two characters in the db, which leads to lots of other problems. For example, a query that selects a substring could split the 2-byte character into two pieces. You should have created your database UNICODE encoded, or in an encoding that understands the characters that you need (e.g. LATIN1).
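The substring hazard is visible at the byte level (UTF-8 assumed): the Danish å is the two-byte sequence C3 A5, so any byte-oriented substring that keeps only the first byte leaves an invalid fragment.

```shell
printf 'å' | wc -c                    # 2 bytes in UTF-8
printf 'å' | head -c 1 | od -An -tx1  # c3 alone: not a valid UTF-8 character
```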

Posted by Lars Pind on
Yes, I had the problem that the Danish letters æ, ø, and å took up two bytes each in the DB row, and this fixed it.

Checking psql -l, indeed all my databases are in SQL_ASCII. How do I fix that now?

Switching from bytelength to length is trivial, thankfully.

/Lars

Posted by Lars Pind on
Looks to me like our documentation is wrong.

https://openacs.org/doc/openacs-4/openacs.html

It doesn't say anything about setting UNICODE encoding, AFAICT.

I don't even see anything in Joel's new doc.

http://aufrecht.org/doc/unix-install.html

/Lars

Posted by Tilmann Singer on
As far as I know there is no way to change the encoding of an existing database in PostgreSQL, apart from pg_dump'ing the contents, recreating the database in the desired new encoding, and importing the data. I've never done that myself, so I don't know whether it's necessary to specify encodings for pg_dump or psql when importing. You probably need to find a trick to tell pg to export the chars that were wrongly saved as 2 characters as one, or run a regexp over the dump file.
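The dump-and-recreate cycle described above might look like this (the database name mydb is hypothetical; take a verified backup first, and note that wrongly double-encoded characters in the dump may still need fixing by hand or with a regexp before reimport). No test is attached since this needs a running PostgreSQL server:

```shell
pg_dump mydb > mydb.dump     # export the existing data
dropdb mydb                  # destructive: make sure the dump is good first
createdb -E UNICODE mydb     # recreate with the desired encoding
psql mydb < mydb.dump        # reimport the data
```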

Regarding the missing documentation, I added a comment to the installation page.