Forum OpenACS Q&A: Running the psql COPY command from db_dml

I am trying to run a PostgreSQL: COPY statement from a db_dml statement.

http://www.postgresql.org/docs/7.4/static/sql-copy.html

But I get the error:

Database operation "dml" failed

ERROR: must be superuser to COPY to or from a file

Is there a way around this?

Thanks
Konstantinos

Collapse
Posted by Malte Sussdorff on
Make sure your user (the one that connects OpenACS with the database) has the right to create databases and create new users. Then it should work
Thanks. I knew what a superuser is. Unfortunatelly I don't the the access to have a user with those rights. This is why I was looking for a workaround 😟
So, any ideas for a workaround?
Collapse
Posted by Mark Aufflick on
The superuser restriction on the server copy command is to prevent remote users from being able to read/write files as the db server user. As an example of why that would be bad, imagine reading /etc/passwd or writing pg_conf

I believe that as a non-super user you can use the copy command with the STDIN or STDOUT parameters, but I don't know how you would write/read the correct file handle from within aolserver

Collapse
Posted by Bruno Mattarollo on

Use the following in your SQL file:

\copy TABLE_NAME from 'FILE' using delimiters (etc etc)
note the \ and the lack of ; at the end of the line.

Hope this helps

Collapse
Posted by Eduardo Pérez on
I think you shouldn't be using COPY at all, because:
- It's not SQL standard.
- Can break if you add or remove columns.

I'd rather use INSERT instead.
Anyway, why do you need to use COPY instead of INSERT?

Collapse
Posted by Dirk Gomez on
Because it is much faster?
Yes, I am using it because it is MUCH much faster to do 20000 records than do an insert one by one.

As for the SQL standart: We are only using postgres, and it works fine here.

Is there a way to use INSERT to insert all the records in one go and not record by record?

Collapse
Posted by Don Baccus on
Check out the ref timezones package for an example such as Bruno gave above. Essentially put the command in a file and run it from psql using "exec" ...

And, yes, it is very much faster. While it is non-standard Oracle does have an equivalent facility and the ref-timezones package makes use of it, too. When we did inserts, even within a transaction, initial install of OpenACS took a long time, 15 minutes or so. Just using COPY for that one set of timezone data cut out 3/4 or more of the install time.

Thank you all for your answers.
Even though the copy works when I run it in psql, when I try to run it using db_dml or db_exec_plsql I get errors:

db_dml:
Database operation "dml" failed
ERROR: syntax error at or near "\" at character 1
SQL: \copy ml_import_records ( record ) from '/tmp/file1sySZZ'

db_exec_plsql:
Database operation "0or1row" failed (exception NSDB, "Query was not a statement returning rows.")
while executing
"ns_db 0or1row $db "select $function_name ()""
invoked from within
"db_exec_plpgsql $db $full_statement_name $sql $statement_name"
invoked from within
"if {[regexp -nocase -- {^\s*select} $test_sql match]} {
ns_log Debug "PLPGSQL: bypassed anon function"
set selection [..."
("uplevel" body line 6)
invoked from within
"uplevel 1 $code_block "
invoked from within
"db_with_handle -dbn $dbn db {
# plsql calls that are simple selects bypass the plpgsql
# mechanism for creating anonymous fun..."
(procedure "db_exec_plsql" line 57)
invoked from within
"db_exec_plsql import_to_db "\\copy ml_import_records ( record ) from '$temp_filename'""

Collapse
Posted by Guan Yang on
Remember that \copy (with a backslash) is a psql command that won't work in db_dml. You should look at the copy statement in SQL.
Collapse
Posted by Nis Jørgensen on
Since the file is probably local to the webserver, and thus not necessarily local to the database server, \copy is probably more apropriate than the sql copy command.

Anyway, it seems to me like this is about importing into a temporary table, and we will need to run inserts from that table afterwards anyway. Of course this might be doable using one statement for all rows, removing the overhead of 20000 inserts+constraint validations - but if these imports are going to end up in the users table (just a guess), we probably will have to handle them separately at some point anyway.

/Nis

Collapse
Posted by Guan Yang on
Nis: That's right, but you can't run psql commands in db_dml. You'd have to write out a temporary SQL script and invoke exec psql.
hmmm... Several things out of this:

a) Since till now in my development environment the database and the webserver are in the same machine, I never met problems with COPY (except the superuser limitation). But, if I continue using it (COPY), do I gather right from your comments that it will not work on my production environment (where db and web are on different machines)?
And this would be solved (as the superuser limitation) with \copy ?

b) Yes, I will handle them seperatelly at some point. But the way I am doing it with this new "import shell", the import itself is one quick step. Then each record I handle, gets "tagged" as "done" or with an error message. When I handle the individual records, I dont' care if the process stops in the middle for any reason (which was happening with the imports till now because of a server down or whatever), because I always know which records have been processed and which not. Also, if for any reason I want to stop an import in the middle, I also can do it easily, and resume etc.

Collapse
Posted by Guan Yang on
COPY could potentially work even if the database and the web server are on different machines, if the input file is accessible to the database server over NFS. Consider a scenario where you place the input files in /some-place on the web server and also mount that directory at /some-place on the database server. Then, if the user running the database (usually postgres) can access /some-place, you should be able to use COPY without problems.
in order to do that I did the following:

Old way:

db_dml import_to_db " copy ml_import_records ( record ) from :temp_filename"

the new way:
exec "$psqlpath/psql -h $host -p $port -U $user $dbname < /tmp/$file_with_sql"

Where "$file_with_sql" contains actually the following

\copy ml_import_records ( record ) from 'temp_file_name'

The First one works fine, but the second doesnt. It seems that no matter what I do, only commands like [db_dml] or [file copy] can "see" the file I am uploading (using ad_form).
Even if I try to do an "exec cp temp_file_name new_file" I get the same error:
"no such file or directory"

For the record.
After some testing, the way that works at the end is:

exec -- ${psqlpath}psql -h $host -p $port -U $user $dbname -c "\\copy ml_import_records ( record ) from $temp_filename"