Forum OpenACS Q&A: Running the psql COPY command from db

1: Running the psql COPY command from db_dml

Posted by Konstantinos Kokkorogiannis on 01/07/05 03:52 PM

I am trying to run a PostgreSQL: COPY statement from a db_dml statement.

http://www.postgresql.org/docs/7.4/static/sql-copy.html

But I get the error:

Database operation "dml" failed

ERROR: must be superuser to COPY to or from a file

Is there a way around this?

Thanks
Konstantinos

2: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Malte Sussdorff on 01/07/05 03:59 PM

Make sure your user (the one that connects OpenACS with the database) has the right to create databases and create new users. Then it should work

3: Re: Re: Running the psql COPY command from db_dml (response to 2)

Posted by Konstantinos Kokkorogiannis on 01/07/05 09:25 PM

Thanks. I knew what a superuser is. Unfortunatelly I don't the the access to have a user with those rights. This is why I was looking for a workaround 😟
So, any ideas for a workaround?

4: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Mark Aufflick on 01/08/05 03:46 AM

The superuser restriction on the server copy command is to prevent remote users from being able to read/write files as the db server user. As an example of why that would be bad, imagine reading /etc/passwd or writing pg_conf

I believe that as a non-super user you can use the copy command with the STDIN or STDOUT parameters, but I don't know how you would write/read the correct file handle from within aolserver

5: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Bruno Mattarollo on 01/10/05 12:31 PM

Use the following in your SQL file:

\copy TABLE_NAME from 'FILE' using delimiters (etc etc)

note the \ and the lack of ; at the end of the line. Hope this helps

6: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Eduardo Pérez on 01/11/05 04:08 PM

I think you shouldn't be using COPY at all, because:
- It's not SQL standard.
- Can break if you add or remove columns.

I'd rather use INSERT instead.
Anyway, why do you need to use COPY instead of INSERT?

7: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Dirk Gomez on 01/11/05 04:23 PM

Because it is much faster?

8: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Konstantinos Kokkorogiannis on 01/12/05 11:32 AM

Yes, I am using it because it is MUCH much faster to do 20000 records than do an insert one by one.

As for the SQL standart: We are only using postgres, and it works fine here.

Is there a way to use INSERT to insert all the records in one go and not record by record?

9: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Don Baccus on 01/12/05 06:09 PM

Check out the ref timezones package for an example such as Bruno gave above. Essentially put the command in a file and run it from psql using "exec" ...

And, yes, it is very much faster. While it is non-standard Oracle does have an equivalent facility and the ref-timezones package makes use of it, too. When we did inserts, even within a transaction, initial install of OpenACS took a long time, 15 minutes or so. Just using COPY for that one set of timezone data cut out 3/4 or more of the install time.

10: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Konstantinos Kokkorogiannis on 04/22/05 12:02 PM

Thank you all for your answers.
Even though the copy works when I run it in psql, when I try to run it using db_dml or db_exec_plsql I get errors:

db_dml:
Database operation "dml" failed
ERROR: syntax error at or near "\" at character 1
SQL: \copy ml_import_records ( record ) from '/tmp/file1sySZZ'

db_exec_plsql:
Database operation "0or1row" failed (exception NSDB, "Query was not a statement returning rows.")
while executing
"ns_db 0or1row $db "select $function_name ()""
invoked from within
"db_exec_plpgsql $db $full_statement_name $sql $statement_name"
invoked from within
"if {[regexp -nocase -- {^\s*select} $test_sql match]} {
ns_log Debug "PLPGSQL: bypassed anon function"
set selection [..."
("uplevel" body line 6)
invoked from within
"uplevel 1 $code_block "
invoked from within
"db_with_handle -dbn $dbn db {
# plsql calls that are simple selects bypass the plpgsql
# mechanism for creating anonymous fun..."
(procedure "db_exec_plsql" line 57)
invoked from within
"db_exec_plsql import_to_db "\\copy ml_import_records ( record ) from '$temp_filename'""

11: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Guan Yang on 04/22/05 12:36 PM

Remember that \copy (with a backslash) is a psql command that won't work in db_dml. You should look at the copy statement in SQL.

12: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Nis Jørgensen on 04/22/05 01:00 PM

Since the file is probably local to the webserver, and thus not necessarily local to the database server, \copy is probably more apropriate than the sql copy command.

Anyway, it seems to me like this is about importing into a temporary table, and we will need to run inserts from that table afterwards anyway. Of course this might be doable using one statement for all rows, removing the overhead of 20000 inserts+constraint validations - but if these imports are going to end up in the users table (just a guess), we probably will have to handle them separately at some point anyway.

/Nis

13: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Guan Yang on 04/22/05 01:27 PM

Nis: That's right, but you can't run psql commands in db_dml. You'd have to write out a temporary SQL script and invoke exec psql.

14: Re: Re: Running the psql COPY command from db_dml (response to 12)

Posted by Konstantinos Kokkorogiannis on 04/22/05 01:41 PM

hmmm... Several things out of this:

a) Since till now in my development environment the database and the webserver are in the same machine, I never met problems with COPY (except the superuser limitation). But, if I continue using it (COPY), do I gather right from your comments that it will not work on my production environment (where db and web are on different machines)?
And this would be solved (as the superuser limitation) with \copy ?

b) Yes, I will handle them seperatelly at some point. But the way I am doing it with this new "import shell", the import itself is one quick step. Then each record I handle, gets "tagged" as "done" or with an error message. When I handle the individual records, I dont' care if the process stops in the middle for any reason (which was happening with the imports till now because of a server down or whatever), because I always know which records have been processed and which not. Also, if for any reason I want to stop an import in the middle, I also can do it easily, and resume etc.

15: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Guan Yang on 04/22/05 02:26 PM

COPY could potentially work even if the database and the web server are on different machines, if the input file is accessible to the database server over NFS. Consider a scenario where you place the input files in /some-place on the web server and also mount that directory at /some-place on the database server. Then, if the user running the database (usually postgres) can access /some-place, you should be able to use COPY without problems.

16: Re: Re: Running the psql COPY command from db_dml (response to 13)

Posted by Konstantinos Kokkorogiannis on 04/25/05 05:31 PM

in order to do that I did the following:

Old way:

db_dml import_to_db " copy ml_import_records ( record ) from :temp_filename"

the new way:
exec "$psqlpath/psql -h $host -p $port -U $user $dbname < /tmp/$file_with_sql"

Where "$file_with_sql" contains actually the following

\copy ml_import_records ( record ) from 'temp_file_name'

The First one works fine, but the second doesnt. It seems that no matter what I do, only commands like [db_dml] or [file copy] can "see" the file I am uploading (using ad_form).
Even if I try to do an "exec cp temp_file_name new_file" I get the same error:
"no such file or directory"

17: Re: Running the psql COPY command from db_dml (response to 1)

Posted by Konstantinos Kokkorogiannis on 07/04/05 11:48 AM

For the record.
After some testing, the way that works at the end is:

exec -- ${psqlpath}psql -h $host -p $port -U $user $dbname -c "\\copy ml_import_records ( record ) from $temp_filename"

Forum OpenACS Q&A: Running the psql COPY command from db_dml