Forum OpenACS Development: String manipulation in Postgres

Collapse
Posted by Malte Sussdorff on
I was wondering, what is quicker, String Manipulation in Postgres or in TCL, if you need to retrieve a value out of the database:

a) set id [db_string my_id "select ltrim(name,'csr_contact_') as id from cr_items where latest_revision = (select max(latest_revision) from cr_items where content_type = 'contact')"]

vs.

b) set string [db_string my_id_string "select name from cr_items where lastest_revision = (select max(latest_revision) from cr_items where content_type = 'contact')"]
set id [string trimleft $string "csr_contact_"]

Collapse
Posted by Richard Hamilton on
Malte, whilst I cannot answer your question with factual numerical proof, I would have thought that the philosophically better way is to ask the database to return what you want. Perhaps better to keep the application 'in the database' and keep tcl as much as possible as a substrate for display.

Also from a readability point of view the sql is very clear.

Regards
Richard

Collapse
Posted by Don Baccus on
All things being equal (i.e. both being equally efficient in string operations) doing it in the DB will be faster.

Why? Because only the exact string is returned by the query. In the second form, extra characters (those following 'csr_contact_') are returned into the Tcl string. Transferring those bytes has some cost, no matter how small.

So in essence you're guaranteed more byte-shuffling's going on in the second case.

But, I have no way of knowing if the assumption is correct (the "all things being equal" assumption).

In general I do stuff in the DB, and in general when I've seen people do much Tcl manipulation in db_multirow loops they seem to be relatively inefficient.

Collapse
Posted by russ m on
Malte - as an aside, neither PG's ltrim nor TCL's [string trimleft] are meant to be used the way you are here... they remove from the left of one argument the longest substring made up entirely of characters from the other... so ltrim(name,'_acnorst') will have exacly the same output as ltrim(name,'csr_contact_')... as long as whatever comes after 'csr_contact_' doesn't have any of those letters in it it will work as expected, but it may well lead to strange errors if you start using something like csr_contact_supervisor (which would just return 'upervisor')...
Collapse
Posted by Malte Sussdorff on
Russel, good point. I did not think about that as there comes an ID behind the strings, so I would not run into any problems.

But how is a case like that normally handled (take the csr_contacts_supervisor as an example)? Regsub? But how would I make sure that it does not take out "csr_contacts_" everywhere it finds it?

Collapse
Posted by russ m on
in TCL you can anchor the match pattern to the start of the string (also, regsub only operates on the first match unless you pass the -all flag)
regsub {^csr_contact_} $string {} id

in SQL it's not quite so clean... if you know there'll only be one instance of "csr_contact_" in the string you can use

replace(name,'csr_contact_','')
but in the general case it looks like you'd need to do something like
select CASE WHEN name like 'csr_contact_%' THEN substring(name from 13) ELSE name END
ugghhh...