Forum OpenACS Development: util_httppost

Posted by Antonio Pisano on
Hello everyone,

I have often found myself needing to call web services on the Internet via POST requests.

util_httppost has been useful at times, but nowadays web servers make heavy use of SSL, compression and the like, which renders this simple proc useless.

Under Unix we have curl, a very neat tool for this kind of task, which supports all the advanced features one might need when POSTing to a server.

Inspired by acs/tcl/proxy-procs.tcl, I have created a conditional wrapper for the proc: if curl is present on the system, util_httppost will use curl; otherwise the regular proc will be used.

Using curl's options, I preserved exactly the same behavior as the normal proc, so the change should be transparent to existing code.
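A minimal sketch of such a conditional definition in plain Tcl, assuming curl is looked up on the PATH via `auto_execok`. The names `http_backend` and `http_post_sketch` are hypothetical and do not reproduce the actual wrapper; the native branch is only a placeholder.

```tcl
# Hypothetical helper: pick the HTTP client backend once, at load time.
# auto_execok searches the PATH and returns "" when curl is not installed.
proc http_backend {} {
    if {[auto_execok curl] ne ""} {
        return curl
    }
    return native
}

# Conditional definition: the same proc name gets a different body
# depending on which backend is available.
if {[http_backend] eq "curl"} {
    proc http_post_sketch {url body} {
        # -s: no progress output; "--data-binary @-" makes curl read the
        # body unmodified from stdin, which exec feeds via "<<"
        return [exec curl -s --data-binary @- $url << $body]
    }
} else {
    proc http_post_sketch {url body} {
        # Placeholder: the non-curl implementation is not shown here
        error "native POST implementation not included in this sketch"
    }
}
```

Deciding at load time keeps the per-request path free of PATH lookups; callers only ever see one proc name.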

Is it OK if I commit the change to CVS?

2: Re: util_httppost (response to 1)
Posted by Gustaf Neumann on
In the NaviServer world, ns_http [1] and ns_ssl [2] are the preferred interfaces, since they directly use the existing C-level socket interface (ns_ssl requires the nsssl module to be loaded).

AOLserver and NaviServer also have ns_httppost.

And then there are already places in OpenACS that use wget. See e.g. apm_transfer_file in acs-tcl/tcl/apm-file-procs.tcl
I'm not aware of any place that uses curl so far. Is there a good reason to use curl instead of wget?

... and then there is also ::xo::HttpRequest, which supports SSL + POST.

It would be great to clean up here: collect the http(s) client requests and provide a common interface for them that checks what's available and "does the right thing". Could you help here?

Actually, util_httppost is not a good name at all. The file acs-tcl/tcl/utilities-procs.tcl defines a namespace "::util"; most functions in this file use the prefix "util_", while some functions are in the namespace. It seems that someone left something unfinished here. IMHO, all util_* functions should be deprecated and moved into ::util:: or some better namespace.
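A minimal sketch of the suggested layout, using a hypothetical helper (`util_trim_url` is not an existing OpenACS proc): the implementation moves into the ::util namespace, and the old util_* name becomes a thin forwarder that warns on use.

```tcl
# The implementation lives in the ::util namespace ...
namespace eval ::util {}

proc ::util::trim_url {url} {
    # Remove surrounding whitespace and any trailing slashes
    return [string trimright [string trim $url] /]
}

# ... while the old util_* name survives as a deprecated wrapper, so that
# existing callers keep working but leave a hint in the log.
proc util_trim_url {args} {
    puts stderr "util_trim_url is deprecated, use ::util::trim_url instead"
    return [::util::trim_url {*}$args]
}
```

Forwarding with `{*}$args` keeps the wrapper independent of the new proc's exact signature.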

-gn

[1] https://bitbucket.org/naviserver/naviserver/src/459dfee88af56bdeccbee118013edf2be57bb402/doc/src/naviserver/ns_http.man?at=default
[2] https://bitbucket.org/naviserver/nsssl/src

3: Re: util_httppost (response to 2)
Posted by Antonio Pisano on
The capabilities of wget and curl partly overlap, but wget is mostly oriented toward downloading files, while curl is oriented toward uploading them.

First, curl allows uploading multiple files to web forms, and second, it transparently supports authentication, SSL and compression of the stream. To my knowledge, these are the most important features wget lacks.

On the other hand, curl cannot download recursively.

My thoughts on using curl come from this question: is there currently an interface that allows uploading multiple files and fields, by POST or GET, to a web server that requires authentication and uses SSL and/or compression?

There is indeed some confusion about the HTTP client API, as we have the ns_*, ad_*, util_* and ::xo::* alternatives around, but (correct me if I am wrong) at the moment none of them offers all the features I mentioned.

I can of course help revise the situation, but what route would you suggest?

4: Re: util_httppost (response to 1)
Posted by Steffen Tiedemann Christensen on
For the sake of completeness: There's also the built-in http package in Tcl:
http://www.tcl.tk/man/tcl8.4/TclCmd/http.htm

This does a nice job of handling SSL requests (using the Tcl tls package) and modelling different kinds of requests. It also plays nicely with Tcl threads and async queues (just like ns_http seems to).
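A minimal sketch of a form POST with the Tcl http package, assuming the optional tls package for https URLs; `form_post` is a hypothetical name and error handling is kept to a minimum.

```tcl
package require http

# Optional: route https URLs through the tls package when it is installed.
if {![catch {package require tls}]} {
    http::register https 443 ::tls::socket
}

# POST key/value pairs as application/x-www-form-urlencoded.
# http::formatQuery url-encodes the pairs; passing -query makes
# http::geturl issue a POST instead of a GET.
proc form_post {url pairs {timeout_ms 30000}} {
    set token [http::geturl $url \
                   -query [http::formatQuery {*}$pairs] \
                   -timeout $timeout_ms]
    try {
        if {[http::status $token] ne "ok"} {
            error "request failed: [http::status $token]"
        }
        return [list status [http::ncode $token] page [http::data $token]]
    } finally {
        # Always release the state array associated with the token
        http::cleanup $token
    }
}
```

Usage would look like `form_post https://example.com/api {name foo value 42}`; the result is in the same `array get` style as the procs discussed in this thread.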

Even with an exec proxy in place, there are significant downsides to using curl instead of something more native to the environment. Even if you assume that the external process doesn't leak, you'll find that curl has slightly different quoting notations and that some charsets require explicit handling and [encoding convertto].
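To illustrate the charset point, a minimal sketch assuming the request body is handed to curl through a file (`--data-binary @file`): the text is converted to UTF-8 explicitly and written in binary mode, so no implicit system-encoding conversion happens on the way.

```tcl
# Tcl strings are Unicode; curl only sees bytes ("Grüße" here).
set text "Gr\u00fc\u00dfe"

# Write the body in binary mode after an explicit conversion
set fd [file tempfile body_file]
fconfigure $fd -translation binary
puts -nonewline $fd [encoding convertto utf-8 $text]
close $fd
# curl could now send it unmodified, e.g.:
#   exec curl -s --data-binary @$body_file $url

# Reading the bytes back and decoding them explicitly round-trips the text
set fd [open $body_file r]
fconfigure $fd -translation binary
set bytes [read $fd]
close $fd
set decoded [encoding convertfrom utf-8 $bytes]
file delete $body_file
```

The five characters become seven bytes on disk, since ü and ß each take two bytes in UTF-8; a reply from curl needs the same explicit `encoding convertfrom` on the way back.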

5: Re: util_httppost (response to 4)
Posted by Antonio Pisano on
Thanks for the pointer, Steffen. You're right about the use of external commands, but I couldn't think of a better tool... It was also very tempting, because there would have been a lot less to take care of.

The native Tcl API seems suitable and looks like the best choice to me. Does the AOLserver/NaviServer API have any significant benefit over Tcl's?

We could just deprecate the old procedural API and write a well-named one using Tcl http. The reference implementation could be ::xo::HttpRequest, which could then be wrapped into a procedural set of commands (or vice versa).

What do you think about it?

6: Re: util_httppost (response to 5)
Posted by Steffen Tiedemann Christensen on
Gustaf is probably better suited to answer your question about ns_http, since I haven't had a chance to use it (much) myself. His points about direct use of the C-level API and better integration with AOLserver threading are probably compelling reasons to investigate.

We use a combination of Tcl http and proxied curl calls ourselves, although the latter is reserved for very specific tasks such as file uploads. (In fact, these tasks would probably be better managed through a queueing system that runs the uploads entirely outside of AOLserver.)

7: Re: util_httppost (response to 5)
Posted by Brian Fenton on
I asked a related question a few years ago on the AOLserver mailing list, and received a plethora of responses! http://comments.gmane.org/gmane.comp.web.aolserver/16000

As well as ns_http queue, you have ns_httpspost and friends in the ns_openssl Tcl API (if you want to get really deep into it!). The one thing missing was an ns_https equivalent for the core's ns_http command.

cheers!
Brian

8: Re: util_httppost (response to 5)
Posted by Gustaf Neumann on
The solutions for http client functions might be divided into the following groups:
  • tcl-socket-based
  • tcl-implemented ns_socket-based
  • external
  • built-in
The Tcl-socket-based solutions (e.g. from tcllib) face the problem that they are select() based, and that Tcl uses its own notification management, which is not integrated with AOLserver/NaviServer. If one uses e.g. Tcl threads, then notification management works well there, but that is sometimes more work. One major limitation of select() is that it works only with up to 1024 file descriptors (on Linux; on other systems the limit is sometimes lower). If one issues a select() with an fd above this limit, one will experience a crash. Raising the limit is out of scope for most applications (it requires a custom C library and kernel). Unfortunately, ::xo::http has the same problem; one can say "who cares, we have just a few handles, this is not a real limit", but for scalability, it is. Some sites use e.g. NaviServer with web sockets at large scale, where select() is not an option.

The Tcl-implemented ns_socket-based solutions in AOLserver/NaviServer use the ns_* interface; these are limited to plain socket communication. There is ns_openssl_sockopen, but I can't comment on its state, since I've never tried to use it. AOLserver uses select() in some places, so the benefit over Tcl sockets is somewhat limited.

The external scripts/programs (such as wget/curl/...) have the most features, but they are not well integrated (error handling, encodings, ...), require exec (or better, nsproxy), typically involve file I/O, ...

The native commands ns_http (AOLserver and NaviServer) and ns_ssl (NaviServer module nsssl) are implemented in C and well integrated with the server. I can't say too much about AOLserver (except that it still uses select() in several places and that its development is more or less frozen), since we switched all our installations to NaviServer several years ago. NaviServer is select()-free and sees active development. Also, ns_http on NaviServer has more features than the AOLserver variant: on NaviServer one can e.g. spool the results of HTTP client requests above a certain size to a file, so that it can be used for large files. With that functionality, I implemented some time ago a NaviServer-based reverse proxy, converting https to http, etc. I wouldn't like to write a reverse proxy based on wget/curl.

Concerning your questions: "is there now an interface allowing to upload multiple files and fields, by POST or GET, to a web server requiring authentication and using ssl and/or compression":

  • multiple files/fields: ns_http/ns_ssl work on the protocol layer, not the content layer. One can pass data to a POST request, but it has to be encoded properly first.
  • authentication: if one passes the credentials in the GET/POST/... request, it handles the authentication. I'm not sure how much dialog one wants to have with other servers in a connection thread, since this has potential for a DoS attack.
  • ssl: ns_http handles http, ns_ssl handles https. ns_ssl is part of the nsssl module.
  • compression: a server sends content compressed only if this is requested in the GET/POST request. If the result is compressed, the received content stays compressed. For many applications this is the right thing (e.g. for the mentioned reverse proxy, which passes the content through). It is probably useful to add a flag to ns_http/ns_ssl to decompress content automatically; maybe I could look into this in the next few days.
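For the authentication point, a minimal sketch of what passing the credentials in the request can look like for HTTP Basic authentication (RFC 7617), assuming Tcl 8.6 for `binary encode base64`. The proc name `basic_auth_header` is hypothetical.

```tcl
# Build the value of an "Authorization" header for HTTP Basic auth:
# base64 of "user:password", with the credentials encoded as UTF-8 first.
proc basic_auth_header {user password} {
    set credentials [encoding convertto utf-8 "$user:$password"]
    return "Basic [binary encode base64 $credentials]"
}

# Example taken from the RFC:
#   basic_auth_header Aladdin "open sesame"
#   -> Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

On NaviServer the value would go into the header set passed to the request, e.g. `ns_set put $headers Authorization [basic_auth_header $user $pw]` before `ns_http queue -headers $headers ...`. Note that this is a single request, not a challenge/response dialog.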
A common wrapper for ns_http+ns_ssl looks to me like a good idea.
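A common wrapper mainly has to decide which of the two commands to call. A minimal sketch, assuming the decision is made from the URL scheme alone; `http_api_for_url` is a hypothetical name, and a real wrapper would also check `[info commands ns_ssl]`, since ns_ssl only exists when nsssl is loaded.

```tcl
# Return the name of the command that should handle the given URL.
proc http_api_for_url {url} {
    # Anchored, case-insensitive check of the scheme
    if {![regexp -nocase {^(https?)://} $url -> scheme]} {
        error "invalid url: $url"
    }
    if {[string tolower $scheme] eq "https"} {
        return ns_ssl
    }
    return ns_http
}
```

For example, `http_api_for_url https://example.com/` yields `ns_ssl`, while an `ftp://` URL raises an error.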

My recommendation for NaviServer-based installations is clear. For AOLserver, I would say that any low-effort solution is fine, e.g. switching in the wrapper to xo::HttpRequest or to external programs.

These are just my 5 cents.
all the best
-gustaf neumann

9: Re: util_httppost (response to 8)
Posted by Gustaf Neumann on
ps: concerning gzip: strictly speaking, adding code to NaviServer to support gunzipping is not needed. Tcl 8.6 supports gzip/gunzip natively, and there are several add-on libraries for earlier Tcl versions. However, it is better to combine gzip with the streaming facilities of NaviServer, such that the content can be gunzipped incrementally, similar to what happens when sending content compressed. So adding support sounds useful for symmetry and convenience reasons, and might be useful for other commands as well.
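A minimal sketch of the non-streaming case with the zlib command built into Tcl 8.6, assuming the body is gzip-framed (as announced by `Content-Encoding: gzip`) and raising an error when no decompressor is available. `gunzip_body` is a hypothetical name; for incremental decompression Tcl 8.6 also offers `zlib stream gunzip`, not shown here.

```tcl
# Gunzip a response body (bytes in, bytes out).
proc gunzip_body {data} {
    if {[info commands ::zlib] eq ""} {
        error "no gzip decompressor available"
    }
    # "zlib gunzip" handles the gzip header/trailer; "zlib decompress"
    # would expect zlib-framed data instead
    return [zlib gunzip $data]
}

# Round trip: compress some UTF-8 text and decompress it again
set gz [zlib gzip [encoding convertto utf-8 "hello"]]
set text [encoding convertfrom utf-8 [gunzip_body $gz]]
```

The decompressed result is still bytes, so text content needs an explicit `encoding convertfrom` with the charset announced by the server.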
10: Re: util_httppost (response to 9)
Posted by Antonio Pisano on
This is a stub of an http GET proc based on the conclusions of this discussion.

It switches to SSL (when available on NaviServer) based on the URL issued, or on a flag.

If the response appears to be gzipped, it returns the gunzipped output.

All these procs would go into the util::http:: namespace. The http:: namespace is not a good choice because it conflicts with Tcl's.

For AOLserver, I plan to use a conditional definition of the proc that uses curl.

Here it is. If the direction is right, I will go on with the rest, then collect the whole HTTP procedural API into one file and deprecate the old one.

namespace eval util {}
namespace eval util::http {}

ad_proc util::http::get {
    -url 
    {-headers ""} 
    {-timeout 30}
    {-depth 0}
    {-force_ssl f}
} {
    Issue an http GET request to url.
    Switches to SSL whenever it encounters an 'https' url.
    If force_ssl is set to true, ssl will also be used for 'http://' urls.
    Returns the data in array get form with array elements page, status, and modified.
} {
    # [info level 0] is the whole invocation; its first word is the proc name
    set this_proc [lindex [info level 0] 0]

    if {![regexp "(https|http)://*" $url]} {
        return -code error "${this_proc}: Invalid url: $url"
    }

    set max_depth 10
    if {[incr depth] > $max_depth} {
        return -code error "${this_proc}: Recursive redirection: $url"
    }

    # Check whether we will use ssl or not
    if {$force_ssl || [regexp "https://*" $url]} {
        if {[info commands ns_ssl] eq ""} {
            return -code error "${this_proc}: SSL not enabled: $url"
        }
        set http_api "ns_ssl"
    } else {
        set http_api "ns_http"
    }

    set cmd {$http_api queue -method GET -timeout $timeout}
    # empty header would throw an error
    if {$headers ne ""} {
        lappend cmd -headers $headers
    }

    # Queue call to the url and wait for response
    set resp_headers [ns_set create resp_headers]
    $http_api wait -result page -status status -headers $resp_headers [eval "$cmd $url"]

    # Get values from response headers, then remove them
    set content_encoding [ns_set iget $resp_headers content-encoding]
    set location [ns_set iget $resp_headers location]
    set last_modified [ns_set iget $resp_headers last-modified]
    ns_set free $resp_headers

    # Redirection...
    if {$status == 302 || $status == 301} {
        if {$location ne ""} {
            return [${this_proc} -url $location -force_ssl $force_ssl -headers $headers -timeout $timeout -depth $depth]
        } else {
            return -code error "${this_proc}: Redirection without location: $url"
        }
    } elseif {$status == 304} {
        # Page not modified since date specified...
        set page ""
    }

    # If output is gzipped, try decompression...
    if {$content_encoding eq "gzip"} {
        if {[info commands ns_zlib] ne ""} {
            # ...first using naviserver API...
            set page [ns_zlib uncompress $page]
        } elseif {[info commands zlib] ne ""} {
            # ...then tcl's (from 8.6)
            set page [zlib decompress $page]
        }
    }

    return [list \
                page $page \
                status $status \
                modified $last_modified]
}
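The stub follows redirects by calling itself recursively and counting -depth. The same limit can be expressed as a loop; in this sketch the actual transfer is abstracted into a hypothetical fetch command returning a dict with `status`, `location` and `page`, so it runs without NaviServer. Unlike the stub, it also treats 303, 307 and 308 as redirects.

```tcl
# Follow redirects up to max_depth hops; fetch is a command prefix that
# takes a URL and returns a dict with at least "status" and "location".
proc follow_redirects {fetch url {max_depth 10}} {
    for {set depth 0} {$depth < $max_depth} {incr depth} {
        set reply [{*}$fetch $url]
        if {[dict get $reply status] ni {301 302 303 307 308}} {
            return $reply
        }
        set location [dict get $reply location]
        if {$location eq ""} {
            error "redirection without location: $url"
        }
        set url $location
    }
    error "too many redirects: $url"
}

# Hypothetical fake transport for trying it out
proc fake_fetch {url} {
    switch -- $url {
        http://a/ { return {status 301 location http://b/} }
        http://b/ { return {status 200 location "" page done} }
        default   { return {status 302 location http://loop/} }
    }
}

dict get [follow_redirects fake_fetch http://a/] page   ;# → done
```

A loop keeps the depth counter out of the public argument list, which is one less switch callers can misuse.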

11: Re: util_httppost (response to 10)
Posted by Gustaf Neumann on
Yep, the direction looks right to me!

Why is force_ssl here? Using "http" vs. "https" should be enough. I see no big benefit in using e.g. "-url http://foo -force_ssl t" over "-url https://foo".

For large requests, allowing the response to be spooled to a file makes sense (otherwise 2GB would be the upper limit, since every Tcl variable can be at most 2GB large). Additionally, this keeps the memory footprint small.

12: Re: util_httppost (response to 11)
Posted by Antonio Pisano on
I added the force_ssl option because I clearly remembered web services having an http URL and requiring SSL. Strangely enough...

I enhanced the procs with file spooling, but on my installation it didn't work... Does it require some configuration to be enabled? For now, the feature is disabled by a single line of code that can be commented out.

I have added a util::http::post proc to handle POSTing of form vars and/or files. Many parts of the old util_http_file_upload by Michael Cleverly turned out to be very useful, and I could keep them in the new one. Some time ago I had already enhanced that very proc for my former company, so that it could send more than one file, even for single form file fields allowing multiple values.

This is the new tcl file for http client functionalities. I leave it here for revision and approval.

ad_library {

    Procs for http client communication

    @author Antonio Pisano
    @creation-date 2014-02-13
}

namespace eval util {}
namespace eval util::http {}

ad_proc util::http::get {
    -url 
    {-headers ""} 
    {-timeout 30}
    {-depth 0}
    -force_ssl:boolean
    {-spool_file ""}
} {
    Issue an http GET request to url.
    Switches to SSL whenever it encounters an 'https' url.
    If -force_ssl is given, ssl will also be used for 'http://' urls.
    Returns the data in array get form with array elements page, status, and modified.
} {
    # [info level 0] is the whole invocation; its first word is the proc name
    set this_proc [lindex [info level 0] 0]

    if {![regexp "(https|http)://*" $url]} {
        return -code error "${this_proc}: Invalid url: $url"
    }

    set max_depth 10
    if {[incr depth] > $max_depth} {
        return -code error "${this_proc}: Recursive redirection: $url"
    }

    # Check whether we will use ssl or not
    if {$force_ssl_p || [regexp "https://*" $url]} {
        if {[info commands ns_ssl] eq ""} {
            return -code error "${this_proc}: SSL not enabled: $url"
        }
        set http_api "ns_ssl"
    } else {
        set http_api "ns_http"
    }

    # Spooling to files is disabled for now
    set spool_file ""

    set queue_cmd {$http_api queue -timeout $timeout -method GET}
    # empty header would throw an error
    if {$headers ne ""} {
        append queue_cmd " -headers $headers"
    }
    if {$spool_file ne ""} {
        append queue_cmd " -spoolsize 0 -file $spool_file"
        set page "${this_proc}: response spooled to '$spool_file'"
    }
    set queue [eval "$queue_cmd $url"]

    # Queue call to the url and wait for response
    set resp_headers [ns_set create resp_headers]
    set wait_cmd {$http_api wait -status status -headers $resp_headers}
    if {$spool_file eq ""} {
        append wait_cmd " -result page"
    }
    eval "$wait_cmd $queue"

    # Get values from response headers, then remove them
    set content_encoding [ns_set iget $resp_headers content-encoding]
    set location [ns_set iget $resp_headers location]
    set last_modified [ns_set iget $resp_headers last-modified]
    ns_set free $resp_headers

    # Redirection...
    if {$status == 302 || $status == 301} {
        if {$location ne ""} {
            return [${this_proc} -url $location -force_ssl=$force_ssl_p -headers $headers -timeout $timeout -depth $depth -spool_file $spool_file]
        } else {
            return -code error "${this_proc}: Redirection without location: $url"
        }
    } elseif {$status == 304} {
        # Page not modified since date specified...
        set page ""
    }

    # If output is gzipped, try decompression...
    if {$content_encoding eq "gzip"} {
        if {[info commands ns_zlib] ne ""} {
            # ...first using naviserver API...
            set page [ns_zlib uncompress $page]
        } elseif {[info commands zlib] ne ""} {
            # ...then tcl's (from 8.6)
            set page [zlib decompress $page]
        }
    }

    return [list \
                page $page \
                status $status \
                modified $last_modified]
}

ad_proc util::http::post {
    {-files ""}
    {-datas ""}
    -base64:boolean
    {-filenames {}}
    {-names {}}
    {-mime_types {}}
    {-mode formvars}
    {-headers ""}
    -url
    {-formvars {}}
    {-timeout 30}
    {-depth 0}
    -force_ssl:boolean
    {-spool_file ""}
} {
    Implement client-side HTTP POST with file uploads. When files are specified for upload, the form will be sent as multipart/form-data, otherwise as application/x-www-form-urlencoded. Setting a 'multipart/form-data' Content-type in -headers forces a multipart form to be sent.

The switches -files {/path/to/file /path/to/second-file ... } and -datas {$raw_data_1 $raw_data_2 ...} are mutually exclusive. You can specify one or the other, but not both. NOTE: it is perfectly valid to specify neither, in which case no file is uploaded; form variables are then sent with the usual encoding, unless multipart/form-data is requested via -headers (as noted above).

If you specify either -files or -datas you must supply a value for -names, which is the list of names of the respective <INPUT TYPE="file" NAME="..."> form tag.

Specify the -base64 switch if the file (or data) needs to be base-64 encoded. Not all servers seem to be able to handle this. (For example, http://mol-stage.usps.com/mml.adp, which expects to receive an XML file doesn't seem to grok any kind of Content-Transfer-Encoding.)

If you specify -files, then -filenames is optional (it can be inferred from the file names). However, if you specify -datas, then it is mandatory.

If -mime_types is not specified then ns_guesstype is used to try and find a mime type based on the filename. If ns_guesstype returns */* the generic value of application/octet-stream will be used.

Any form variables may be specified in one of four formats:

  • array (list of key value pairs like what [array get] returns)
  • formvars (list of url encoded formvars, i.e. foo=bar&x=1)
  • ns_set (an ns_set containing key/value pairs)
  • vars (a list of tcl vars to grab from the calling environment)

-headers specifies an ns_set of extra headers to send to the server when doing the POST.

-timeout and -depth are optional. A POST request does not follow redirects itself: when a redirect happens, the new location is fetched with util::http::get, and depth is passed along.
} {
    # [info level 0] is the whole invocation; its first word is the proc name
    set this_proc [lindex [info level 0] 0]

    if {![regexp "(https|http)://*" $url]} {
        return -code error "${this_proc}: Invalid url: $url"
    }

    set max_depth 10
    if {[incr depth] > $max_depth} {
        return -code error "${this_proc}: Recursive redirection: $url"
    }

    # Check whether we will use ssl or not
    if {$force_ssl_p || [regexp "https://*" $url]} {
        if {[info commands ns_ssl] eq ""} {
            return -code error "${this_proc}: SSL not enabled: $url"
        }
        set http_api "ns_ssl"
    } else {
        set http_api "ns_http"
    }

    # sanity checks on switches given
    if {[lsearch -exact {formvars array ns_set vars} $mode] == -1} {
        return -code error "${this_proc}: Invalid mode \"$mode\"; should be one of: formvars, array, ns_set, vars"
    }

    set variables [list]
    switch -- $mode {
        array {
            set variables $formvars
        }
        formvars {
            foreach formvar [split $formvars &] {
                set formvar [split $formvar =]
                set key [lindex $formvar 0]
                set val [join [lrange $formvar 1 end] =]
                lappend variables $key $val
            }
        }
        ns_set {
            for {set i 0} {$i < [ns_set size $formvars]} {incr i} {
                set key [ns_set key $formvars $i]
                set val [ns_set value $formvars $i]
                lappend variables $key $val
            }
        }
        vars {
            foreach key $formvars {
                upvar 1 $key val
                lappend variables $key $val
            }
        }
    }

    if {$headers eq ""} {
        set headers [ns_set create headers]
    }
    set req_content_type [ns_set iget $headers "Content-type"]
    set multipart_p [regexp "multipart/form-data" $req_content_type]

    # We have files to be uploaded (or multipart was requested),
    # this will be a 'multipart/form-data' request
    if {$multipart_p || $datas ne "" || $files ne ""} {
        if {$files ne "" && $datas ne ""} {
            return -code error "${this_proc}: -files and -datas are mutually exclusive; can't use both"
        }
        if {$files ne ""} {
            foreach file $files filename $filenames mime_type $mime_types {
                if {![file exists $file]} {
                    return -code error "${this_proc}: Error reading file: $file not found"
                }
                if {![file readable $file]} {
                    return -code error "${this_proc}: Error reading file: $file permission denied"
                }
                set fp [open $file]
                fconfigure $fp -translation binary
                lappend datas [read $fp]
                close $fp
                if {$filename eq ""} {
                    lappend filenames [file tail $file]
                }
                if {$mime_type eq ""} {
                    lappend mime_types [ns_guesstype $file]
                }
            }
        }

        set boundary [ns_sha1 [list [clock clicks -milliseconds] [clock seconds]]]
        ns_set put $headers "Content-type" "multipart/form-data; boundary=$boundary"

        set payload {}
        if {$datas ne ""} {
            if {[llength $datas] != [llength $names]} {
                return -code error "${this_proc}: Cannot upload file without specifying form variable -name"
            }
            if {[llength $datas] != [llength $filenames]} {
                return -code error "${this_proc}: Cannot upload file without specifying -filename"
            }
            foreach data $datas filename $filenames name $names mime_type $mime_types {
                if {$mime_type eq ""} {
                    set mime_type [ns_guesstype $filename]
                    if {[string equal $mime_type */*] || $mime_type eq ""} {
                        set mime_type application/octet-stream
                    }
                }
                if {$base64_p} {
                    # base64::encode comes from tcllib
                    package require base64
                    set data [base64::encode $data]
                    set transfer_encoding base64
                } else {
                    set transfer_encoding binary
                }
                append payload --$boundary \
                    \r\n \
                    "Content-Disposition: form-data; " \
                    "name=\"$name\"; filename=\"$filename\"" \
                    \r\n \
                    "Content-Type: $mime_type" \
                    \r\n \
                    "Content-transfer-encoding: $transfer_encoding" \
                    \r\n \
                    \r\n \
                    $data \
                    \r\n
            }
        }

        foreach {key val} $variables {
            append payload --$boundary \
                \r\n \
                "Content-Disposition: form-data; name=\"$key\"" \
                \r\n \
                \r\n \
                $val \
                \r\n
        }
        append payload --$boundary-- \r\n

    } else {
        # No files to upload, this will be an 'application/x-www-form-urlencoded' request
        ns_set put $headers "Content-type" "application/x-www-form-urlencoded"
        set exp_vars [list]
        foreach {key val} $variables {
            lappend exp_vars [list $key $val]
        }
        set payload [export_vars $exp_vars]
    }

    # Spooling to files is disabled for now
    set spool_file ""

    set queue_cmd {$http_api queue -timeout $timeout -method POST -body $payload -headers $headers}
    if {$spool_file ne ""} {
        append queue_cmd " -spoolsize 0 -file $spool_file"
        set page "${this_proc}: response spooled to '$spool_file'"
    }
    set queue [eval "$queue_cmd $url"]

    set resp_headers [ns_set create resp_headers]
    set wait_cmd {$http_api wait -status status -headers $resp_headers}
    if {$spool_file eq ""} {
        append wait_cmd " -result page"
    }
    # Queue call to the url and wait for response
    eval "$wait_cmd $queue"

    # Get values from response headers, then remove them
    set content_encoding [ns_set iget $resp_headers content-encoding]
    set location [ns_set iget $resp_headers location]
    set last_modified [ns_set iget $resp_headers last-modified]
    ns_set free $resp_headers

    # Redirection for a POST request is normal, just follow with GET
    if {$status == 302 || $status == 301} {
        if {$location ne ""} {
            return [util::http::get -url $location -force_ssl=$force_ssl_p -headers $headers -timeout $timeout -depth $depth -spool_file $spool_file]
        } else {
            return ""
        }
    } elseif {$status == 304} {
        # Page not modified since date specified...
        set page ""
    }

    # If output is gzipped, try decompression...
    if {$content_encoding eq "gzip"} {
        if {[info commands ns_zlib] ne ""} {
            # ...first using naviserver API...
            set page [ns_zlib uncompress $page]
        } elseif {[info commands zlib] ne ""} {
            # ...then tcl's (from 8.6)
            set page [zlib decompress $page]
        }
    }

    return [list \
                page $page \
                status $status \
                modified $last_modified]
}
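The multipart branch of util::http::post can be tried out in isolation. A minimal sketch in plain Tcl; the proc name `multipart_body` and its argument layout are hypothetical. Field values and file contents are assumed to be bytes already, and, unlike the proc, it sends no Content-transfer-encoding header, which RFC 7578 advises senders not to generate.

```tcl
# Build a multipart/form-data body.
#   fields: flat list of  name value  pairs
#   files:  flat list of  name filename mime_type data  quadruples
# Returns a two-element list: Content-Type header value and body.
proc multipart_body {fields files} {
    # The boundary must not occur inside any part; clock + random number
    # makes a collision unlikely (a stricter version could check for it).
    set boundary "tclboundary[clock clicks][expr {int(rand() * 1000000)}]"
    set body ""
    foreach {name value} $fields {
        append body "--$boundary\r\n"
        append body "Content-Disposition: form-data; name=\"$name\"\r\n\r\n"
        append body $value "\r\n"
    }
    foreach {name filename mime_type data} $files {
        append body "--$boundary\r\n"
        append body "Content-Disposition: form-data; name=\"$name\"; filename=\"$filename\"\r\n"
        append body "Content-Type: $mime_type\r\n\r\n"
        append body $data "\r\n"
    }
    # The closing delimiter carries two trailing dashes
    append body "--$boundary--\r\n"
    return [list "multipart/form-data; boundary=$boundary" $body]
}

lassign [multipart_body {title hello} [list upload a.txt text/plain abc]] ctype body
```

On NaviServer, `$ctype` would go into the request's Content-type header and `$body` into the `-body` of `ns_http queue`.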

13: Re: util_httppost (response to 12)
Posted by Gustaf Neumann on
It is developing nicely. A few comments:
- the spoolsize options were added in August last year, after the release of NaviServer 4.99.5; you can test spooling with the "tip" version of NaviServer from Bitbucket, but for general use one should wait until 4.99.6 is released.
- there is already some redundancy between util::http::get and util::http::post. It would be better to implement a "util::http::request -method GET|POST|..." that does the heavy lifting, and maybe convenience methods for "get", "post" etc. on top of it when needed.
- one should use the Tcl expand operator rather than "eval".
- the result of the queue_cmd is not a queue, but a handle.
- without requesting gzipped content (by adding an "Accept-Encoding: gzip" header), the result will never be gzipped.
- currently, the list of options of post is very long and not orthogonal. The data of the post request is either attribute/value pairs, or the multipart variants "datas" or "files", if I see this correctly. I think it would be conceptually nicer to have a "-data [util::http::data ... ]" which passes the raw data to the request. In many cases, "-data [form_vars -form ....]" will be sufficient, when the default encoding is set depending on the data provided and multipart. Allowing a user to specify raw data is certainly useful (e.g. for PUT requests, DAV*, etc.).
- I am not sure that the many ways of specifying variables are needed (this should not be part of "post" or "request").
- "ns_zlib uncompress" does not do a gunzip; the proper Tcl command is "zlib gunzip". In case no decompressor is available, an error should be raised.
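One way to read the first suggestions: a single request proc assembles the arguments of `ns_http queue`/`ns_ssl queue` as a proper Tcl list (so `{*}` can replace `eval`), and get/post are thin wrappers on top. A minimal sketch with hypothetical names (`request_cmd`, `get_cmd`, `post_cmd`); to keep it runnable in a plain tclsh it only returns the command instead of executing it. The option names mirror those used in the stubs above.

```tcl
# request_cmd does the heavy lifting and returns the queue command as a
# list; nothing is sent here.
proc request_cmd {args} {
    # Defaults, overridden by the caller's -option value pairs
    array set opt {-method GET -headers "" -body "" -timeout 30}
    array set opt $args
    if {![info exists opt(-url)]} {
        error "missing -url"
    }
    # The URL scheme alone selects the command
    set api [expr {[string match -nocase https://* $opt(-url)] ? "ns_ssl" : "ns_http"}]
    set cmd [list $api queue -method $opt(-method) -timeout $opt(-timeout)]
    # Only add optional arguments when they carry a value
    if {$opt(-headers) ne ""} { lappend cmd -headers $opt(-headers) }
    if {$opt(-body) ne ""}    { lappend cmd -body $opt(-body) }
    lappend cmd $opt(-url)
    return $cmd
}

# Thin convenience wrappers
proc get_cmd {url args} {
    request_cmd -method GET -url $url {*}$args
}
proc post_cmd {url body args} {
    request_cmd -method POST -url $url -body $body {*}$args
}

# Inside NaviServer, the list could be run with the expand operator
# instead of eval, e.g.:  set handle [{*}[get_cmd $url]]
```

Because the command is built with `list`/`lappend`, URLs or bodies containing spaces, brackets or semicolons stay single words, which is exactly what string-based `eval` cannot guarantee.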
15: Re: util_httppost (response to 13)
Posted by Antonio Pisano on
I've sent you the new version by mail because I don't want to clutter the whole forum...

Let me know

All the best

Antonio

14: Re: util_httppost (response to 8)
Posted by Gustaf Neumann on
"ns_http wait" and "ns_ssl wait" have now a "-decompress" option in the tip versions on bitbucket.

-gn