Forum OpenACS Q&A: String concatenation w/nulls

Posted by David Walker on
If I understand correctly the oracle concatenation will return '123' when concatenating '123' and NULL while Postgres returns NULL for the same arguments.

I know this causes problems in more than one place. It caused me trouble in /photodb/search.tcl. (I'd get no match simply because one of the fields passed to pseudo_contains was null).

My solution is to add a concatenation operator ( I selected ||| ) that acts like the oracle operator.

--drop function string_concat_allow_null (char,char);
create function string_concat_allow_null (char,char) returns char
as 'select coalesce($1,'''') || coalesce($2,'''') as result'
language 'sql'
with (iscachable);

--drop operator ||| (char,char);

create operator ||| (
 leftarg = char,
 rightarg = char,
 procedure = string_concat_allow_null 

Posted by Don Baccus on
The major problem with this approach is that we're not victim of a PG vs. Oracle problem here, but rather an Oracle vs. SQL92 problem.

So hacking a solution like this into our PG port will simply leave the  problem lingering for those who port to the next SQL92-compliant database.

(user operators and a stored, programmatic language are both non-SQL92 PG extensions).

Much better would be an approach that made the Oracle kludge "special"  rather than the code running in the standards-compliant RDBMS.  In other words, it would be preferable to hack the Oracle queries and/or the surrounding Tcl scripts so the PG code runs with 100% standard SQL92.

Then these same PG queries will work perfectly, without change, in other SQL92-compliant RDBMS engines.

Posted by David Walker on
I'll just mention that I have an issue with SQL92 in this matter. It does not make sense to me concatenating something with nothing should yield nothing.

For any concatenation that I'll be doing I need an operator that treats NULLs as empty strings.
Posted by Don Baccus on
NULL in SQL92 doesn't mean "nothing".  It means "unknown".  Expecting 'string' || NULL to return 'string' implies that you know what the value NULL is, in this case - i.e., the empty string.  But ... you don't, the value is "unknown".

'string' || NULL returns NULL because the "unknown" operand means the result of the concatenation is, itself, "unknown".

This is entirely consistent with how the NULL operand is treated with other operators.  "value operator NULL" always returns "NULL".

Oracle itself claims they may treat the empty string and NULL correctly in a future release, so it's not good to have the Oracle code depend on this, either.  While it is hard to imagine them following through on their threat to adopt the standard behavior, we'd  hate to see our code break if they ever did so ...

The problem here is that Oracle is inserting NULL when you do an INSERT ... VALUES ('') rather than the empty string.  This is dead wrong.  The empty string is not an "unknown" value - the value is perfectly known, i.e. the string with no characters.  As I mentioned above, Oracle recognizes that this treatment is incorrect and may change it in a future release.  The reason they haven't, I'm sure, is because so many customers have so much code dependent on this incorrect behavior that changing is impractical.  At the least they'll need to have a "-empty_string_bug_compat" switch or the like if they ever do switch to the standard semantics.

Regardless of your personal feelings, in this project we want to avoid  Oracle dependencies, in particular a dependency on non-standard semantics that Oracle itself recognizes as being incorrect.

Posted by David Walker on
I agree with you about using nonstandard extensions. I'll try to avoid them in any code I write that may become part of OpenACS.

I think "nothing" vs "unknown" is just semantics, so long as it's understood that NULL != 0 and NULL != ''. While returning NULL if one side of an operator is NULL is logical for most operators, I can certainly see the use of having a string concatenation operator (in addition to, not instead of, the one we have now) that treats NULL as empty string.

That being said, perhaps a better solution would be to change the database schema in some places so that certain fields default to '' instead of NULL or to make sure the '' is there in the INSERT statement.
Posted by Don Baccus on
I think "nothing" vs "unknown" is just semantics
Sure, but the fact that one's not equal to zero is also "just semantics".

This doesn't mean the distinction has no meaning, though ...

The difference between "nothing" and "unknown" is significant and meaningful. I see no reason why the "||" operator should treat the "unknown" case as being the empty string, this would be entirely inconsistent with the treatment of NULL elsewhere.

Now, there are many arguments against the adoption of three-valued logic in the first place, and SQL92 itself isn't entirely consistent in its adoption of three-valued logic (read Date & Darwin's "An Introduction to the SQL Standard" for a full discussion).

But given the fact that SQL92 does have three-valued logic, there's no real reason to complain when they *have* been consistent.