Forum OpenACS Development: Re: complex survey datamodel suggestion/question

Posted by Stan Kaufman on
Malte, here's a reason to put integers into an integer column, and real numbers into a numeric column: all the db's built-in aggregate functions (avg, stddev, sum, variance, etc) and mathematical functions and operators (add, substract, etc etc) won't work. If we do this, all data extraction procs for data in the Assessment system will have to select data, drag it back into tcl, and do the math there. This violates the "push everything back into the database that can be pushed there" principle.

This would be exactly like relying on the tcl to enforce uniqueness or any other kind of constraint. Sure, it could be done, but that's not considered best practice as far as I understand.

What would be the benefit of storing everything (numbers, strings, etc) as a string and then figuring out after you get the data what it is? Surely the amount of disk space saved by not having empty integer and numeric columns is trivial. And we need the other kinds of *_answer columns for holding references to cr_revisions (content_answer) etc etc anyway.

Seems to me that rather than pushing us to conflate all data into a single omnibus string type, Cathy's requirement just means we should add something to handle the exponential data she mentions. And in PG, it looks to me as if the "real" or "double" data type should work fine, in which case we should use it instead of the hackish "exponential type (stored as a varchar)" I suggested.

From :

"Normally, the real type has a range of at least -1E+37 to +1E+37 with a precision of at least 6 decimal digits. The double precision type normally has a range of around -1E+308 to +1E+308 with a precision of at least 15 digits. Values that are too large or too small will cause an error. Rounding may take place if the precision of an input number is too high. Numbers too close to zero that are not representable as distinct from zero will cause an underflow error."

Cathy, am I wrong here? Can't "real" or "double" work in your situation?

Dunno what Oracle-equivalent there is, but surely there is one. (Though since Oracle still doesn't even have a boolean, maybe not.)

Posted by Cathy Sarisky on
Real or double can satisfactorily hold the student's answer, provided we don't care about being able to echo back the student's answer exactly as typed.  If, however, we want to know whether the student typed 6.02x10<sup>23</sup> or 6.020x10<sup>23</sup>, stuffing it into either type of column will result in it being converted to 602000000000000000000000.  And, in the event that we want to do some math on it in tcl despite plans to push everything to the database:

% set big 6.02e23
% set answer 6.01e23
% expr $answer/$big

% set big 6020000000000000000000
% set answer $big
% expr $big/$answer
integer value too large to represent

Posted by Guan Yang on
Use Mpexpr? ;-)
Posted by Cathy Sarisky on
*blush* ok, ignore the tclsh stuff i just posted.  *IF* I write the numbers as non-integers, I can still do math...
Posted by Malte Sussdorff on
Stan, this is a very good reason. I had this argument with some friends and needed canon fooder which you brilliantly gave. I still think that most of the checking should be done by the check_item functions, as this is the only way to give a nice error message back (or am I mistaken and postgres throws up with a reason why it did not like an insert?).