Forum OpenACS Q&A: proc like ns_getcsv for char delimited files?

I'm going to be integrating the ecommerce package with a bunch of tab-delimited files (generated from Tcl code).

Is there a proc available to replace ns_getcsv (i.e. one that behaves similarly) but works with single-character delimiters such as tab or |?

If not, please let me know if ns_getcsv[1] has any undocumented behavior I should know about.

1. http://www.aolserver.com/docs/devel/tcl/api/file.html#ns_getcsv

Posted by Tom Jackson on

What would be the reason for not using a standard csv format? One problem is generating the file, but for that you can just follow the rules specified at the link you provided. One nice feature of ns_getcsv is that it returns the number of fields found. If you want to enforce this for every row you can use a while loop:


while {[set fields [ns_getcsv $fileid csv_list]] == $len} ...

Posted by Torben Brosten on
Thanks, Tom.

"What would be the reason for not using a standard csv format?"

1. A bunch of existing Tcl programs use the same existing info (tab-delimited files).

2. Exporting to csv from a client's proprietary program results in noncompliant csv files.

Given the AOLserver documentation, I'm thinking split, llength and eof should about do it.

The proc will be added to tcl/ecommerce-utilities-procs.tcl (unless someone suggests differently). I'll post it once I've tested it... ec_getsdelim

Posted by Tom Jackson on

So I guess the answer is: no, there isn't another proc like that, mostly because ns_getcsv needs to read char by char to understand the context of things like double quotes, commas and newlines.
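
A small sketch of why a plain split is not enough for quoted CSV, while it is adequate for tab- or pipe-delimited data that has no quoting. The sample lines are made up for illustration and are not from the ecommerce package.

```tcl
# Hypothetical sample records carrying the same three fields.
set csv_line {widget,"red, large",9.95}
set tab_line "widget\tred, large\t9.95"

# split knows nothing about quotes, so the comma inside the
# quoted field is treated as a separator too.
set csv_fields [split $csv_line ","]
set tab_fields [split $tab_line "\t"]

puts [llength $csv_fields]   ;# 4, one more than the 3 intended fields
puts [llength $tab_fields]   ;# 3
```

This is the context tracking that a CSV reader has to do and that a single-character split can skip.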

Posted by Torben Brosten on
Right. For the record, I wrote a slow, clumsy Tcl version of ns_getcsv that converts csv files to other delimited forms as a part of learning Tcl. It's rough, but has come in handy as it can also process specific fields (with a custom proc) and insert the results as new fields in the converted file.
Posted by Torben Brosten on

This works in the Tcl interpreter environment:

proc ec_gets_tab_delimited_line {fileId varName} {
    set delimiter "\t"
    if {[eof $fileId]} {
        set return_val -1
    } else {
        gets $fileId line
        set varName [split $line $delimiter]
        set return_val [llength $varName]
    }
    return $return_val
}

Now to figure out how to format it for ad_proc so that the delimiter defaults to "\t" but can be changed.

ad_proc ec_gets_char_delimited_line {
    fileId
    varName
    {delimiter "\t"}
} {
    Reads and parses a line of data from a character-delimited file,
    similar to ns_getcsv. The delimiter defaults to tab.
} {
    if {[eof $fileId]} {
        set return_val -1
    } else {
        gets $fileId line
        set varName [split $line $delimiter]
        set return_val [llength $varName]
    }
    return $return_val
}

Does this appear in "good" form?

Posted by Torben Brosten on
Oops. It doesn't work in the Tcl environment after all. The list doesn't appear to be passed out of the proc. Looking through Tcl for Web Nerds, Practical Programming in Tcl/Tk, 4th ed., and Google, I can't find an exact example of a list being returned through a parameter (which is how I understand ns_getcsv returns the list). Maybe it will have to be passed with global? I'm going to have to sleep on it...
Posted by Alfred Werner on
tcllib has a CSV package - you can specify sepChar ... http://tcllib.sourceforge.net/
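
A minimal sketch of that approach, assuming tcllib is installed so that `package require csv` succeeds. The pipe-delimited record is made up for illustration.

```tcl
package require csv   ;# part of tcllib, installed separately from Tcl itself

# Hypothetical pipe-delimited record.
set line "sku-1|Blue widget|9.95"

# csv::split takes the separator character as an optional argument after the line.
set fields [csv::split $line "|"]

puts [llength $fields]
puts [lindex $fields 1]
```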
Posted by Tom Jackson on

You can change your proc to upvar the passed-in variable. The 'varName' argument is the name of the variable you want the proc to set in the caller. Here is the body of the proc:

    upvar $varName SplitLine

    if {[eof $fileId]} {
        set return_val -1
        set SplitLine [list]
    } else {
        gets $fileId line
        set SplitLine [split $line $delimiter]
        set return_val [llength $SplitLine]
    }
    return $return_val
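
For anyone new to upvar, here is a stripped-down sketch of the mechanism on its own. The proc name `set_in_caller` is hypothetical and only serves the illustration.

```tcl
# Hypothetical helper: stores a value in the caller's variable whose
# name is passed in varName, and returns the number of list elements.
proc set_in_caller {varName value} {
    # Link the local name "target" to the caller's variable (one level up).
    upvar 1 $varName target
    set target $value
    return [llength $value]
}

set n [set_in_caller fields [list a b c]]
puts "$n $fields"
```

A plain `set varName ...` inside the proc would only change the proc's local copy of the name, which is why the list never reached the caller.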

Posted by Torben Brosten on

Thanks Alfred! I should check out the tcllib.

Thanks, Tom! Upvar was one of those strange reserved words, and now it is a friend =)

proc ec_gets_char_delimited_line {fileId varName {delimiter "\t"} } {
    upvar $varName split_line
    if {[eof $fileId]} {
        set return_val -1
        set split_line [list]
    } else {
        gets $fileId line
        set split_line [split $line $delimiter]
        set return_val [llength $split_line]
    }
    return $return_val
}

Translating to ad_proc:

ad_proc ec_gets_char_delimited_line {
    fileId
    varName
    {delimiter "\t"}
} {
    Reads and parses a line of data from a character-delimited file,
    similar to ns_getcsv. The delimiter defaults to tab.
} {
    upvar $varName split_line
    if {[eof $fileId]} {
        set return_val -1
        set split_line [list]
    } else {
        gets $fileId line
        set split_line [split $line $delimiter]
        set return_val [llength $split_line]
    }
    return $return_val
}
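
One caveat worth noting: in Tcl, eof only becomes true after a read has already hit the end of the file, so a proc that checks eof before gets can return 0 (an empty list) once before it returns -1. A possible variant, sketched here as a plain proc so it runs in tclsh, tests the return value of gets instead. The proc name and the sample file are hypothetical.

```tcl
# Sketch only: same arguments as ec_gets_char_delimited_line, hypothetical name.
proc gets_char_delimited_line {fileId varName {delimiter "\t"}} {
    upvar 1 $varName split_line
    # On a blocking channel, gets returns -1 when no further line can be read.
    if {[gets $fileId line] < 0} {
        set split_line [list]
        return -1
    }
    set split_line [split $line $delimiter]
    return [llength $split_line]
}

# Usage with a temporary, made-up data file.
set path [file join [pwd] delimited_demo.txt]
set out [open $path w]
puts $out "sku-1\tBlue widget\t9.95"
puts $out "sku-2\tRed widget\t4.50"
close $out

set in [open $path r]
set rows [list]
while {[gets_char_delimited_line $in fields] != -1} {
    lappend rows $fields
}
close $in
file delete $path

puts [llength $rows]
```

With this variant a blank line in the middle of the file still yields 0 fields, while -1 is reserved for the end of the file.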
Posted by Andrew Piskorski on
Torben, shouldn't parsing more or less any flavor of CSV-like text format be awfully easy in Tcl? Why would you want to use ns_getcsv at all, is it faster and you're worried about performance problems? Or you just thought maybe you could avoid having to write the Tcl code, which you then went ahead and wrote above anyway?
Posted by Tom Jackson on

Andrew, it appears he has a much simpler format than even csv. ns_getcsv more than likely has to parse char by char to maintain state as it comes across special characters that are actually content.

It isn't necessarily difficult, but I doubt it would be faster than ns_getcsv.

Posted by Torben Brosten on
Hi Andrew,

"shouldn't parsing more or less any flavor of CSV-like text format be awfully easy in Tcl?"

Sure. This was more an exercise (for me) in matching the behavior of ns_getcsv.

"Why would you want to use ns_getcsv at all, is it faster and you're worried about performance problems?"

ns_getcsv is hardcoded by default in the ecommerce package for uploading products, categories, etc.

The csv parser I wrote is not directly applicable to uploading. It is more of a data conversion utility (csv-like to single-character delimiter). It parses csv, tracks the pattern (count and type) of fields parsed per line (warning when a record's field-type pattern differs from earlier records, such as a quoted field where previous records had none), and provides an opportunity to insert custom procs that convert data.

Hope this answers your question,

Torben