Forum OpenACS Development: Re: Out of memory in AOLServer

Posted by Tom Jackson on
My suggestion would be to figure out how to minimize memory use, but there are some issues involved. One is that a CSV field can contain embedded newlines (\n), so anything built on the assumption that every record sits on a single line is doomed to fail, and with 400MB of data it probably will fail.

So what to try? Look at ns_getcsv:

http://rmadilo.com/files/nsapi/ns_getcsv.html

This command can be used once a file is open and positioned correctly. It reads one CSV record, which might span more than one physical line.
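
For instance, a single call looks roughly like this (just a sketch; assume fd is a channel already opened with Tcl's open command):

    # Read one CSV record, which may span several physical lines,
    # into the variable "record".
    set ncols [ns_getcsv $fd record]
    # $record is now a Tcl list of field values; $ncols is the column count.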

When you open a file in Tcl, it does not read the entire file into memory. You can use seek to move to the point where you want to begin reading. So look at these Tcl commands (a short example follows the links):

http://junom.com/document/tcl/open.htm
http://junom.com/document/tcl/seek.htm
http://junom.com/document/tcl/tell.htm
http://junom.com/document/tcl/close.htm
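
To make the mechanics concrete, here is a minimal sketch of those commands working together (the file name data.csv is just a placeholder):

    set fd [open data.csv r]   ;# opens a channel; the file is not read into memory
    seek $fd 0                 ;# position at byte 0, the start of the file
    set header [gets $fd]      ;# read one physical line
    set pos [tell $fd]         ;# offset where the next read would begin
    close $fd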

Using open and close is obvious. seek moves the channel to the character position where reading should begin. The first move is seek 0, but once you do an ns_getcsv you can do a tell, and that should be the next value to give to seek (or you might have to add one or two to the value; only testing will tell). Using these commands you avoid the need to parse char-by-char or to know too much about the details of CSV.

The command ns_getcsv exists because CSV is a special format that cannot be handled by a simple loop of open, seek, gets, close, or whatever. Also note that ns_getcsv returns the number of columns found, and you can use the Tcl command [eof] to detect the end of file (like while {![eof $csvfile]}). A sketch of the whole loop follows.
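
Putting it all together, a minimal sketch of the record-at-a-time loop might look like the following. The file name and the process_record proc are placeholders, and as noted above you should test whether the tell value needs a small adjustment before being reused as the seek position:

    set pos 0
    set done 0
    while {!$done} {
        set fd [open /web/data/import.csv r]
        seek $fd $pos                     ;# jump to where the previous pass stopped
        set ncols [ns_getcsv $fd record]  ;# read exactly one CSV record
        if {$ncols > 0} {
            process_record $record        ;# hypothetical per-record work (db insert, etc.)
        }
        if {[eof $fd]} {
            set done 1                    ;# no more records
        } else {
            set pos [tell $fd]            ;# remember where the next record starts
        }
        close $fd
    }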

Yes, you will open and close the file for each record. But you are trading time for money, money you either don't have or don't want to waste. By processing one record at a time you will probably save programming time, because the result will work in every situation. You will lose time processing the data, but you can start the process and take a break; computers work pretty cheap. I think you will find that the file-handling code executes much faster than whatever database code runs for each record, so there will probably be no downside.