Forum OpenACS Development: Re: Improve user-experience: character limit on form input fields

I guess it depends how the database measures characters. Looks like the browser is counting UTF-8 characters while the database counts a character as one byte.

I suspect an easy solution is to increase the title to varchar(400) but set maxlength to 200.

It does show that automatically handling the maxlength and checking the size is more complex than we thought.

I wonder what [string length] would return on the utf-8 string. Learning this would help us understand if it counts bytes or characters in string length and if checking length in tcl before inserting would be any more effective than maxlength.

Hi guys,

with Oracle, there are a few approaches you can take. One is to change all your VARCHAR2 columns from BYTE to CHAR e.g. declare them as VARCHAR2(200 CHAR) instead of VARCHAR2(200). You would also need to do this with every VARCHAR2 variable in every PL/SQL procedure or function - a lot of work as you can imagine. However, it's worth knowing for a quick fix.

Oracle's recommended approach is to set NLS_LENGTH_SEMANTICS to 'CHAR', which basically tells Oracle to treat the length of every VARCHAR2 (and CHAR) column as being characters and not bytes.

However, the problem with both of these is that you'll still run into the maximum column size 4000 bytes. But now you won't be able to tell when you're going to hit it! I guess it depends on what languages you're using and how many double-byte characters they have. Fortunately for me, English has hardly any, but I did run into this problem once with an Irish-language client. I would say as a rule of thumb, if you think you're going to need more than 2000 characters, then why not use a CLOB?

See here for more detail
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#sthref76
and
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch2charset.htm#i1006683
and a nice introduction
http://www.oracle-base.com/articles/9i/CharacterSemanticsAndGlobalization9i.php
and a good example how-to here:
http://www.oracle.com/technology/oramag/oracle/03-mar/o23sql.html

hope this helps
Brian

Dave,

char in oracle are bytes, so yes, when an utf character uses 2 bytes it uses 2 "char" in oracle.

[string length] will return the number of character in the string according to the encoding and charset used. So even if a character uses 2 or 3 bytes and utf is the charset in use, string length will return 1 for that character. To get the number of bytes, [string bytelength] has to be used. I've used it a lot when working on oracle to check the string before an insert.

I like the idea of increasing the title length to 400 by default.

About setting maxlength to 200, could be done on ad_form?