- Publicity: Public Only All
http-client-procs.tcl
Procs for HTTP client communication
- Location:
- packages/acs-tcl/tcl/http-client-procs.tcl
- Created:
- 2014-02-13
- Author:
- Antonio Pisano
Procedures in this file
- util::get_http_status (public)
- util::http::available (public)
- util::http::basic_auth (public)
- util::http::cookie_auth (public)
- util::http::get (public)
- util::http::post (public)
- util::http::post_payload (public)
- util::http::set_cookies (public)
- util::link_responding_p (public)
Detailed information
util::get_http_status (public)
util::get_http_status [ -url url ] [ -use_get_p use_get_p ] \ [ -timeout timeout ]
- Switches:
- -url (optional)
- -use_get_p (optional, defaults to
"1"
)- -timeout (optional, defaults to
"30"
)- Returns:
- the HTTP status code, e.g., 200 for a normal response or 500 for an error, of a URL. By default this uses the GET method instead of HEAD since not all servers will respond properly to a HEAD request even when the URL is perfectly valid. Note that this means that the server may be sucking down a lot of bits that it doesn't need.
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- No testcase defined.
util::http::available (public)
util::http::available [ -preference preference ] [ args... ]
Return the preferred HTTP API among those available based on preference and OpenACS installation capabilities.
- Switches:
- -preference (optional, defaults to
"native curl"
)- decides which available implementation prefer in respective order. Choice is between 'native', based on ns_http api, available for NaviServer only and giving the best performances and 'curl', which wraps the command line utility (available on every system with curl installed).
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- No testcase defined.
util::http::basic_auth (public)
util::http::basic_auth [ -headers headers ] -username username \ -password password
Builds BASIC authentication header for an HTTP request
- Switches:
- -headers (optional)
- ns_set of request headers that will be populated with auth header. If not specified, a new ns_set will be created. Existing header for BASIC authentication will be overwtitten.
- -username (required)
- Username for authentication
- -password (required)
- Password for authentication
- Returns:
- ns_set of headers containing authentication data
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- No testcase defined.
util::http::cookie_auth (public)
util::http::cookie_auth [ -headers headers ] [ -auth_vars auth_vars ] \ [ -auth_url auth_url ] [ -auth_form auth_form ] \ [ -auth_cookies auth_cookies ] [ -preference preference ]
This proc implements the generic pattern for cookie-based authentication: user logs in a webpage providing username, password and optionally other information in a form, page replies generating one or more authentication cookies by which user will be recognized on subsequent interaction with the server. By this method was possible, for example, to authenticate on a remote OpenACS installation providing 'email' and 'password' as credentials to the /register/ page, and using 'ad_session_id' and 'ad_user_login' as 'auth_cookies'. This proc is a bit hacky and is nowadays not clear if it makes sense anymore... This proc takes care to submit to the login form also every other formfield on the login page. This is important because this (often hidden) formfields can contain tokens necessary for the authentication process.
- Switches:
- -headers (optional)
- ns_set of request headers that will be populated with auth headers. If not specified, a new ns_set will be created. Existing cookies will be overwritten.
- -auth_vars (optional)
- Variables issued to the login page in 'export_vars -url' form.
- -auth_url (optional)
- Login url
- -auth_form (optional)
- Form to put our data into. If not specified, there must be only one form on the login page, otherwise proc will throw an error.
- -auth_cookies (optional)
- Cookies we should look for in the response from the login page to obtain authentication data. If not specified, this will refer to every cookie received into 'set-cookie' response headers.
- -preference (optional, defaults to
"native curl"
)- Returns:
- ns_set of headers containing authentication data
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- No testcase defined.
util::http::get (public)
util::http::get [ -url url ] [ -headers headers ] [ -timeout timeout ] \ [ -max_depth max_depth ] [ -force_ssl ] [ -gzip_response ] \ [ -spool ] [ -preference preference ]
Issue an HTTP GET request to 'url'.
- Switches:
- -url (optional)
- -headers (optional)
- specifies an ns_set of extra headers to send to the server when doing the request. Some options exist that allow one to avoid the need to specify headers manually, but headers will always take precedence over options.
- -timeout (optional, defaults to
"30"
)- Timeout in seconds. The value can be an integer, a floating point number or an ns_time value.
- -max_depth (optional, defaults to
"10"
)- -force_ssl (optional, boolean)
- specifies whether we want to use SSL despite the url being in http:// form. Default behavior is to use SSL on https:// URLs only.
- -gzip_response (optional, boolean)
- informs the server that we are capable of receiving gzipped responses. If server complies to our indication, the result will be automatically decompressed.
- -spool (optional, boolean)
- enables file spooling of the request on the file specified. It is useful when we expect large responses from the server. The result is spooled to a temporary file, the name is returned in the file component of the result.
- -preference (optional, defaults to
"native curl"
)- decides which available implementation prefer in respective order. Choice is between 'native', based on ns_ api, available for NaviServer only and giving the best performances and 'curl', which wraps the command line utility (available on every system with curl installed).
- Returns:
- the data as dict with elements 'headers', 'page', 'file', 'status', 'time' (elapsed request time in ns_time format), and 'modified'.
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- util_http_json_encoding, postman_echo
util::http::post (public)
util::http::post [ -url url ] [ -files files ] [ -base64 ] \ [ -formvars formvars ] [ -formvars_list formvars_list ] \ [ -body body ] [ -max_body_size max_body_size ] \ [ -headers headers ] [ -timeout timeout ] [ -max_depth max_depth ] \ [ -force_ssl ] [ -multipart ] [ -gzip_request ] [ -gzip_response ] \ [ -post_redirect ] [ -spool ] [ -preference preference ]
Implement client-side HTTP POST request.
- Switches:
- -url (optional)
- -files (optional)
- File upload can be specified using actual files on the filesystem or binary strings of data using the '-files' parameter. '-files' must be a dict (flat list of key value pairs). Keys of '-files' parameter are: - data: binary data to be sent. If set, has precedence on 'file' key - file: path for the actual file on filesystem - filename: name the form will receive for this file - fieldname: name the field this file will be sent as - mime_type: mime_type the form will receive for this file If 'filename' is missing and an actual file is being sent, it will be set as the same name as the file. If 'mime_type' is missing, it will be guessed from 'filename'. If result is */* or an empty mime_type, 'application/octet-stream' will be used If '-base64' flag is set, files will be base64 encoded (useful for some kind of form).
- -base64 (optional, boolean)
- -formvars (optional)
- These are additional form variables already in URLencoded format, for instance, by using 'export_vars -url'. They will be translated for the proper type of form (URLencoded or multipart) depending on the presence of 'files' or the 'multipart' flag. Variables specified this way will be appended to those supplied via the 'formvars_list' parameter.
- -formvars_list (optional)
- These are additional form variables in list format. They will be translated for the proper type of form (URLencoded or multipart) depending on the presence of files or the multipart flag. The payload will be made by the sum of data coming from 'formvars', 'formvars_list' and 'files' arguments. Default behavior is to build payload as an 'application/x-www-form-urlencoded' payload if no files are specified, and 'multipart/form-data' otherwise. If '-multipart' flag is set, format will be forced to multipart.
- -body (optional)
- is the payload for the request and will be passed as is (useful for many purposes, such as webDav). A convenient way to specify form variables through this argument is passing a string obtained by 'export_vars -url'.
- -max_body_size (optional, defaults to
"25000000"
)- this value in number of characters will tell how big can the whole body payload get before we start spooling its content to a file. This is important in case of big file uploads, when keeping the entire request in memory is just not feasible. The handling of the spooling is taken care of in the API. This value takes into account also the encoding required by the content type, so its value could not reflect the exact length of body's string representation.
- -headers (optional)
- specifies an ns_set of extra headers to send to the server when doing the request. Some options exist that allow one to avoid the need to specify headers manually, but headers will always take precedence over options.
- -timeout (optional, defaults to
"30"
)- Timeout in seconds. The value can be an integer, a floating point number or an ns_time value.
- -max_depth (optional, defaults to
"10"
)- is the maximum number of redirects the proc is allowed to follow. A value of 0 disables redirection. When max depth for redirection has been reached, proc will return response from the last page we were redirected to. This is important if redirection response contains data such as cookies we need to obtain anyway. Be aware that when following redirects, unless it is a code 303 redirect, url and POST urlencoded variables will be sent again to the redirected host. Multipart variables won't be sent again. Sending to the redirected host can be dangerous, if such host is not trusted or uses a lower level of security.
- -force_ssl (optional, boolean)
- specifies whether we want to use SSL despite the url being in http:// form. Default behavior is to use SSL on https:// URLs only.
- -multipart (optional, boolean)
- -gzip_request (optional, boolean)
- informs the server that we are sending data in gzip format. Data will be automatically compressed. Notice that not all servers can treat gzipped requests properly, and in such cases response will likely be an error.
- -gzip_response (optional, boolean)
- informs the server that we are capable of receiving gzipped responses. If server complies to our indication, the result will be automatically decompressed.
- -post_redirect (optional, boolean)
- decides what happens when we are POSTing and server replies with 301, 302 or 303 redirects. RFC 2616/10.3.2 states that method should not change when 301 or 302 are returned, and that GET should be used on a 303 response, but most HTTP clients fail in respecting this and switch to a GET request independently. This option forces this kinds of redirect to conserve their original method.
- -spool (optional, boolean)
- enables file spooling of the request on the file specified. It is useful when we expect large responses from the server. The result is spooled to a temporary file, the name is returned in the file component of the result.
- -preference (optional, defaults to
"native curl"
)- decides which available implementation prefer in respective order. Choice is between 'native', based on ns_ api, available for NaviServer only and giving the best performances and 'curl', which wraps the command line utility (available on every system with curl installed).
- Returns:
- the data as dict with elements 'headers', 'page', 'file', 'status', 'time' (elapsed request time in ns_time format), and 'modified'.
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- util_http_json_encoding, postman_echo, util_http_post_vars
util::http::post_payload (public)
util::http::post_payload [ -url url ] [ -files files ] [ -base64 ] \ [ -formvars formvars ] [ -formvars_list formvars_list ] \ [ -body body ] [ -max_body_size max_body_size ] \ [ -headers headers ] [ -multipart ]
Build the payload for a POST request
- Switches:
- -url (optional)
- does not affect the payload directly, but is used to check that variables specified via the URL do not conflict with those coming from other parameters. In such case, an error is returned.
- -files (optional)
- File upload can be specified using actual files on the filesystem or binary strings of data using the '-files' parameter. '-files' must be a dict (flat list of key value pairs). Keys of '-files' parameter are: - data: binary data to be sent. If set, has precedence on 'file' key - file: path for the actual file on filesystem - filename: name the form will receive for this file - fieldname: name the field this file will be sent as - mime_type: mime_type the form will receive for this file If 'filename' is missing and an actual file is being sent, it will be set as the same name as the file. If 'mime_type' is missing, it will be guessed from 'filename'. If result is */* or an empty mime_type, 'application/octet-stream' will be used If '-base64' flag is set, files will be base64 encoded (useful for some kind of form).
- -base64 (optional, boolean)
- -formvars (optional)
- These are additional form variables already in URLencoded format, for instance, by using 'export_vars -url'. They will be translated for the proper type of form (URLencoded or multipart) depending on the presence of 'files' or the 'multipart' flag. Variables specified this way will be appended to those supplied via the 'formvars_list' parameter.
- -formvars_list (optional)
- These are additional form variables in list format. They will be translated for the proper type of form (URLencoded or multipart) depending on the presence of files or the multipart flag. The payload will be made by the sum of data coming from 'formvars', 'formvars_list' and 'files' arguments. Default behavior is to build payload as an 'application/x-www-form-urlencoded' payload if no files are specified, and 'multipart/form-data' otherwise. If '-multipart' flag is set, format will be forced to multipart.
- -body (optional)
- is the payload for the request and will be passed as is (useful for many purposes, such as webDav). A convenient way to specify form variables through this argument is passing a string obtained by 'export_vars -url'.
- -max_body_size (optional, defaults to
"25000000"
)- this value in number of characters will tell how big can the whole body payload get before we start spooling its content to a file. This is important in case of big file uploads, when keeping the entire request in memory is just not feasible. The handling of the spooling is taken care of in the API. This value takes into account also the encoding required by the content type, so its value could not reflect the exact length of body's string representation.
- -headers (optional)
- Processing the payload might set some request headers. Provide yours to either override the default behavior, or to merge your headers with those from the payload. The resulting headers will be returned in the dict.
- -multipart (optional, boolean)
- Returns:
- a dict with fields 'payload', 'payload_file' and 'headers'
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- util_http_json_encoding, postman_echo, util_http_post_vars, template_widget_file
util::http::set_cookies (public)
util::http::set_cookies -resp_headers resp_headers \ [ -headers headers ] [ -cookie_names cookie_names ] \ [ -pattern pattern ]
Extracts cookies from response headers. This is done reading every 'set-cookie' header and populating an ns_set of request headers suitable for issuing 'util::http' requests.
- Switches:
- -resp_headers (required)
- Response headers, in a list form as returned by 'util::http' API.
- -headers (optional)
- ns_set of request headers that will be populated with extracted cookies. If not specified, a new ns_set will be created. Existing cookies will be overwritten.
- -cookie_names (optional)
- Cookie names we want to retrieve. Other cookies will be ignored. If omitted together with '-pattern' proc will include every cookie.
- -pattern (optional)
- Cookies which name respects this pattern as in 'string match' will be included. If omitted together with '-cookie_names' proc will include every cookie.
- Returns:
- ns_set of headers containing received cookies
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- No testcase defined.
util::link_responding_p (public)
util::link_responding_p [ -url url ] \ [ -list_of_bad_codes list_of_bad_codes ]
- Switches:
- -url (optional)
- -list_of_bad_codes (optional, defaults to
"404"
)- Returns:
- 1 if the URL is responding (generally we think that anything other than 404 (not found) is okay).
- See Also:
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- No testcase defined.
Content File Source
ad_library { Procs for HTTP client communication @author Antonio Pisano @creation-date 2014-02-13 } #################################### ## New HTTP client implementation ## #################################### namespace eval util {} namespace eval util::http {} d_proc -public util::http::set_cookies { -resp_headers:required {-headers ""} {-cookie_names ""} {-pattern ""} } { Extracts cookies from response headers. This is done reading every 'set-cookie' header and populating an ns_set of request headers suitable for issuing 'util::http' requests. @param resp_headers Response headers, in a list form as returned by 'util::http' API. @param headers ns_set of request headers that will be populated with extracted cookies. If not specified, a new ns_set will be created. Existing cookies will be overwritten. @param cookie_names Cookie names we want to retrieve. Other cookies will be ignored. If omitted together with '-pattern' proc will include every cookie. @param pattern Cookies which name respects this pattern as in 'string match' will be included. If omitted together with '-cookie_names' proc will include every cookie. @return ns_set of headers containing received cookies } { if {$headers eq ""} { set headers [ns_set create headers] } set cookies [list] foreach {name value} $resp_headers { # get only set-cookie headers, ignoring case set name [string tolower $name] if {$name ne "set-cookie"} continue # keep only relevant part of the cookie set cookie [lindex [split $value ";"] 0] set cookie_name [lindex [split $cookie "="] 0] if {($cookie_names eq "" || $cookie_name in $cookie_names) && ($pattern eq "" || [string match $pattern $cookie_name])} { lappend cookies $cookie } } ns_set idelkey $headers "cookie" set cookies [join $cookies "; "] ns_set put $headers "cookie" $cookies return $headers } d_proc -public util::http::basic_auth { {-headers ""} -username:required -password:required } { Builds BASIC authentication header for an HTTP request @param headers ns_set of request headers that will be populated with auth header. If not specified, a new ns_set will be created. Existing header for BASIC authentication will be overwtitten. @param username Username for authentication @param password Password for authentication @return ns_set of headers containing authentication data } { if {$headers eq ""} { set headers [ns_set create headers] } set h "Basic [ns_base64encode ${username}:$password]" ns_set idelkey $headers "Authorization" ns_set put $headers "Authorization" $h return $headers } d_proc -public util::http::cookie_auth { {-headers ""} {-auth_vars ""} {-auth_url ""} {-auth_form ""} {-auth_cookies ""} {-preference {native curl}} } { This proc implements the generic pattern for cookie-based authentication: user logs in a webpage providing username, password and optionally other information in a form, page replies generating one or more authentication cookies by which user will be recognized on subsequent interaction with the server. By this method was possible, for example, to authenticate on a remote OpenACS installation providing 'email' and 'password' as credentials to the /register/ page, and using 'ad_session_id' and 'ad_user_login' as 'auth_cookies'. This proc is a bit hacky and is nowadays not clear if it makes sense anymore... This proc takes care to submit to the login form also every other formfield on the login page. This is important because this (often hidden) formfields can contain tokens necessary for the authentication process. @param headers ns_set of request headers that will be populated with auth headers. If not specified, a new ns_set will be created. Existing cookies will be overwritten. @param auth_vars Variables issued to the login page in 'export_vars -url' form. @param auth_url Login url @param auth_cookies Cookies we should look for in the response from the login page to obtain authentication data. If not specified, this will refer to every cookie received into 'set-cookie' response headers. @param auth_form Form to put our data into. If not specified, there must be only one form on the login page, otherwise proc will throw an error. @return ns_set of headers containing authentication data } { if {$headers eq ""} { set headers [ns_set create headers] } # Normalize url. Slashes at the end can make the same url don't # look the same for the server, if we retrieve the same url from # the 'action' attribute of the form. set auth_url [string trimright $auth_url "/"] set base_url [split $auth_url "/"] set base_url [lindex $base_url 0]//[lindex $base_url 2] # Call login url to obtain login form set r [util::http::get -url $auth_url -preference $preference] # Get cookies from response util::http::set_cookies \ -resp_headers [dict get $r headers] \ -headers $headers \ -cookie_names $auth_cookies # Obtain and export form vars not provided explicitly set form [util::html::get_forms -html [dict get $r page]] set form [util::html::get_form -forms $form -id $auth_form] set a [dict get $form attributes] # Action could be different from original login url I take that # from form attributes. if {[dict exists $a action]} { set auth_url ${base_url}[dict get $a action] set auth_url [string trimright $auth_url "/"] } set formvars [util::html::get_form_vars -form $form] set formvars [export_vars -exclude $auth_vars $formvars] # Export vars provided explicitly in caller scope set auth_vars [uplevel [list export_vars -url $auth_vars]] # Join form vars with our vars set formvars [join [list $formvars $auth_vars] "&"] # Call login url with authentication parameters. Just retrieve the # first response, as it is common for login pages to redirect # somewhere, but we just need to steal the cookies. set r [util::http::post \ -url $auth_url \ -body $formvars \ -headers $headers \ -max_depth 0 \ -preference $preference] # Get cookies from response util::http::set_cookies \ -resp_headers [dict get $r headers] \ -headers $headers \ -cookie_names $auth_cookies return $headers } d_proc -public util::http::available { {-preference {native curl}} args } { Return the preferred HTTP API among those available based on preference and OpenACS installation capabilities. @param preference decides which available implementation prefer in respective order. Choice is between 'native', based on ns_http api, available for NaviServer only and giving the best performances and 'curl', which wraps the command line utility (available on every system with curl installed). } { if {[llength $args] > 0} { ns_log warning "util::http::available: possible deprecated arguments specified ($args)" } set preferred [lindex $preference 0] if {$preferred eq "native" && [acs::icanuse "ns_http results dict"]} { return "native" } elseif {[util::which curl] ne ""} { return "curl" } else { return "" } } # ## Procs common to both implementations # d_proc -private util::http::get_channel_settings { content_type } { Helper proc to get encoding based on content_type (From xotcl/tcl/http-client-procs) } { # In the following, I realize an IANA/MIME charset resolution # scheme which is compliant with RFC 3023 which deals with # treating XML media types properly. # # see http://tools.ietf.org/html/rfc3023 # # This makes the use of [ns_encodingfortype] obsolete as this # helper proc does not consider RFC 3023 at all. In the future, # RFC 3023 support should enter a revised [ns_encodingfortype], # for now, we fork. # # The mappings between Tcl encoding names (as shown by [encoding # names]) and IANA/MIME charset names (i.e., names and aliases in # the sense of http://www.iana.org/assignments/character-sets) is # provided by ... # # i. a static, built-in correspondence map: see nsd/encoding.c # ii. an extensible correspondence map (i.e., the ns/charsets # section in config.tcl). # # For mapping charset to encoding names, I use # [ns_encodingforcharset]. # # Note, there are also alternatives for resolving IANA/MIME # charset names to Tcl encoding names, however, they all have # issues (non-extensibility from standard configuration sites, # incompleteness, redundant thread-local storing, scripted # implementation): # 1. tcllib/mime package: ::mime::reversemapencoding() # 2. tdom: tDOM::IANAEncoding2TclEncoding(); see lib/tdom.tcl # # RFC 3023 support (at least in my reading) demands the following # resolution order (see also Section 3.6 in RFC 3023), when # applied along with RFC 2616 (see especially Section 3.7.1 in RFC 2616) # # (A) Check for the "charset" parameter on certain (!) media types: # an explicitly stated, yet optional "charset" parameter is # permitted for all text/* media subtypes (RFC 2616) and selected # the XML media type classes listed by RFC 3023 (beyond the text/* # media type; e.g. "application/xml*", "*/*+xml", etc.). # # (B) If the "charset" is omitted, certain default values apply (!): # # (B.1) RFC 3023 text/* registrations default to us-ascii (!), # and not iso-8859-1 (overruling RFC 2616). # # (B.2) RFC 3023 application/* and non-text "+xml" registrations # are to be left untreated (in our context, no encoding # filtering is to be applied -> "binary") # # (B.3) RFC 2616 text/* registration (if not covered by B.1) # default to iso-8859-1 # # (B.4) RFC 4627 json defaults to utf-8 # # (C) If neither A or B apply (e.g., because an invalid charset # name was given to the charset parameter), we default to # "binary". This corresponds to the behavior of # [ns_encodingfortype]. Also note that the RFCs 3023 and 2616 do # not state any procedure when "invalid" charsets etc. are # identified. I assume, RFC-compliant clients have to ignore them # which means keep the channel in- and output unfiltered (encoding # = "binary"). This requires the client of the *HttpRequest* to # treat the data accordingly. # set enc "" if {[regexp {^text/.*$|^.*/json.*$|^.*/xml.*$|^.*\+xml.*$} $content_type]} { # Case (A): Check for an explicitly provided charset parameter if {[regexp {;\s*charset\s*=([^;]*)} $content_type _ charset]} { set enc [ns_encodingforcharset [string trim $charset]] } # Case (B.1) if {$enc eq "" && [regexp {^text/xml.*$|text/.*\+xml.*$} $content_type]} { set enc [ns_encodingforcharset us-ascii] } # Case (B.3) if {$enc eq "" && [string match "text/*" $content_type]} { set enc [ns_encodingforcharset iso-8859-1] } # Case (B.4) if {$enc eq "" && $content_type eq "application/json"} { set enc [ns_encodingforcharset utf-8] } } # Cases (C) and (B.2) are covered by the [expr] below. set enc [expr {$enc eq "" ? "binary" : $enc}] return $enc } d_proc util::http::get { -url {-headers ""} {-timeout 30} {-max_depth 10} -force_ssl:boolean -gzip_response:boolean -spool:boolean {-preference {native curl}} } { Issue an HTTP GET request to 'url'. @param headers specifies an ns_set of extra headers to send to the server when doing the request. Some options exist that allow one to avoid the need to specify headers manually, but headers will always take precedence over options. @param gzip_response informs the server that we are capable of receiving gzipped responses. If server complies to our indication, the result will be automatically decompressed. @param force_ssl specifies whether we want to use SSL despite the url being in http:// form. Default behavior is to use SSL on https:// URLs only. @param spool enables file spooling of the request on the file specified. It is useful when we expect large responses from the server. The result is spooled to a temporary file, the name is returned in the file component of the result. @param preference decides which available implementation prefer in respective order. Choice is between 'native', based on ns_ api, available for NaviServer only and giving the best performances and 'curl', which wraps the command line utility (available on every system with curl installed). @param timeout Timeout in seconds. The value can be an integer, a floating point number or an ns_time value. @return the data as dict with elements 'headers', 'page', 'file', 'status', 'time' (elapsed request time in ns_time format), and 'modified'. } { return [util::http::request \ -url $url \ -method GET \ -headers $headers \ -timeout $timeout \ -max_depth $max_depth \ -preference $preference \ -force_ssl=$force_ssl_p \ -gzip_response=$gzip_response_p \ -spool=$spool_p] } d_proc util::http::post_payload { {-url ""} {-files {}} -base64:boolean {-formvars ""} {-formvars_list ""} {-body ""} {-max_body_size 25000000} {-headers ""} -multipart:boolean } { Build the payload for a POST request @param url does not affect the payload directly, but is used to check that variables specified via the URL do not conflict with those coming from other parameters. In such case, an error is returned. @param body is the payload for the request and will be passed as is (useful for many purposes, such as webDav). A convenient way to specify form variables through this argument is passing a string obtained by 'export_vars -url'. @param max_body_size this value in number of characters will tell how big can the whole body payload get before we start spooling its content to a file. This is important in case of big file uploads, when keeping the entire request in memory is just not feasible. The handling of the spooling is taken care of in the API. This value takes into account also the encoding required by the content type, so its value could not reflect the exact length of body's string representation. @param files File upload can be specified using actual files on the filesystem or binary strings of data using the '-files' parameter. '-files' must be a dict (flat list of key value pairs). Keys of '-files' parameter are: - data: binary data to be sent. If set, has precedence on 'file' key - file: path for the actual file on filesystem - filename: name the form will receive for this file - fieldname: name the field this file will be sent as - mime_type: mime_type the form will receive for this file If 'filename' is missing and an actual file is being sent, it will be set as the same name as the file. If 'mime_type' is missing, it will be guessed from 'filename'. If result is */* or an empty mime_type, 'application/octet-stream' will be used If '-base64' flag is set, files will be base64 encoded (useful for some kind of form). @param formvars These are additional form variables already in URLencoded format, for instance, by using 'export_vars -url'. They will be translated for the proper type of form (URLencoded or multipart) depending on the presence of 'files' or the 'multipart' flag. Variables specified this way will be appended to those supplied via the 'formvars_list' parameter. @param formvars_list These are additional form variables in list format. They will be translated for the proper type of form (URLencoded or multipart) depending on the presence of files or the multipart flag. The payload will be made by the sum of data coming from 'formvars', 'formvars_list' and 'files' arguments. Default behavior is to build payload as an 'application/x-www-form-urlencoded' payload if no files are specified, and 'multipart/form-data' otherwise. If '-multipart' flag is set, format will be forced to multipart. @param headers Processing the payload might set some request headers. Provide yours to either override the default behavior, or to merge your headers with those from the payload. The resulting headers will be returned in the dict. @return a dict with fields 'payload', 'payload_file' and 'headers' } { set this_proc [lindex [info level 0] 0] # Retrieve variables sent by the URL... set parsed [ns_parseurl $url] if {[dict exists $parsed query]} { array set urlvars [ns_set array [ns_parsequery [dict get $parsed query]]] } if {[llength $formvars_list] % 2 == 1} { error "'formvars_list' must have an even number of elements" } if {$formvars ne ""} { foreach {key val} [ns_set array [ns_parsequery $formvars]] { lappend formvars_list $key $val } } # Check whether we don't have multiple variable definition in url # and payload. foreach {key value} $formvars_list { if {[info exists urlvars($key)]} { return -code error "${this_proc}: Variable '$key' already specified as url variable" } } if {$headers eq ""} { set headers [ns_set create headers] } set req_content_type [ns_set iget $headers "content-type"] set payload {} set payload_file {} set payload_file_fd {} # Request will be multipart if required by the flag, if we have # files or if set up manually by the headers if {$multipart_p || [llength $files] != 0 || [string match -nocase "*multipart/form-data*" $req_content_type]} { # delete every manually set content-type header... while {[ns_set ifind $headers "Content-type"] >= 0} { ns_set idelkey $headers "Content-type" } # ...replace it with our own... set boundary [ns_sha1 [list [clock clicks -milliseconds] [clock seconds]]] set req_content_type "multipart/form-data; boundary=$boundary" ns_set put $headers "Content-type" $req_content_type # ...and get the proper encoding for the content. set enc [util::http::get_channel_settings $req_content_type] # Transform files into binaries foreach f $files { if {![dict exists $f data]} { if {![dict exists $f file]} { return -code error "${this_proc}: No file specified" } set file [dict get $f file] if {![ad_file exists $file]} { return -code error "${this_proc}: Error reading file: $file not found" } if {![ad_file readable $file]} { return -code error "${this_proc}: Error reading file: $file permission denied" } dict set f filename [expr {[dict exists $f filename] ? [dict get $f filename] : [ad_file tail $file]}] } # Filename and fieldname must be in the file dict at this # point foreach key {filename fieldname} { if {![dict exists $f $key]} { return -code error "${this_proc}: '$key' missing for file POST" } set $key [dict get $f $key] } # Check that we don't already have this var specified in # the url if {[info exists urlvars($fieldname)]} { return -code error "${this_proc}: file field '$fieldname' already specified as url variable" } # Track form variables sent as files set filevars($fieldname) 1 if {![dict exists $f mime_type]} { set mime_type [ns_guesstype $filename] if {$mime_type in {"*/*" ""}} { set mime_type "application/octet-stream" } } else { set mime_type [dict get $f mime_type] } set transfer_encoding [expr {$base64_p ? "base64" : "binary"}] set content [list --$boundary \ \r\n \ "Content-Disposition: form-data; " \ "name=\"$fieldname\"; filename=\"$filename\"" \ \r\n \ "Content-Type: $mime_type" \ \r\n \ "Content-transfer-encoding: $transfer_encoding" \ \r\n \ \r\n] set app [append_to_payload \ -content [join $content ""] \ $enc \ $max_body_size \ $payload \ $payload_file \ $payload_file_fd] lassign $app payload payload_file payload_file_fd if {[dict exists $f data]} { set app [append_to_payload \ -content [dict get $f data] \ $enc \ $max_body_size \ $payload \ $payload_file \ $payload_file_fd] } else { set app [append_to_payload \ -file $file \ $enc \ $max_body_size \ $payload \ $payload_file \ $payload_file_fd] } lassign $app payload payload_file payload_file_fd set app [append_to_payload \ -content \r\n \ $enc \ $max_body_size \ $payload \ $payload_file \ $payload_file_fd] lassign $app payload payload_file payload_file_fd } # Translate urlencoded vars into multipart variables foreach {key val} $formvars_list { if {[info exists filevars($key)]} { return -code error "${this_proc}: Variable '$key' already specified as file variable" } set content [list --$boundary \ \r\n \ "Content-Disposition: form-data; name=\"$key\"" \ \r\n \ \r\n \ $val \ \r\n] set app [append_to_payload \ -content [join $content ""] \ $enc \ $max_body_size \ $payload \ $payload_file \ $payload_file_fd] lassign $app payload payload_file payload_file_fd } set content "--$boundary--\r\n" set app [append_to_payload \ -content $content \ $enc \ $max_body_size \ $payload \ $payload_file \ $payload_file_fd] lassign $app payload payload_file payload_file_fd } else { # If people specified a content type we won't overwrite it, # otherwise this will be a 'application/x-www-form-urlencoded' # payload if {$req_content_type eq ""} { set req_content_type "application/x-www-form-urlencoded" ns_set put $headers "Content-type" $req_content_type } set enc [util::http::get_channel_settings $req_content_type] set payload {} foreach {key val} $formvars_list { lappend payload [ad_urlencode_query $key]=[ad_urlencode_query $val] } set payload [join $payload &] } # Body will be appended as is to the payload set app [append_to_payload \ -content $body \ $enc \ $max_body_size \ $payload \ $payload_file \ $payload_file_fd] lassign $app payload payload_file payload_file_fd if {$payload_file_fd ne ""} { close $payload_file_fd } return [list \ payload $payload \ payload_file $payload_file \ headers $headers] } d_proc util::http::post { -url {-files {}} -base64:boolean {-formvars ""} {-formvars_list ""} {-body ""} {-max_body_size 25000000} {-headers ""} {-timeout 30} {-max_depth 10} -force_ssl:boolean -multipart:boolean -gzip_request:boolean -gzip_response:boolean -post_redirect:boolean -spool:boolean {-preference {native curl}} } { Implement client-side HTTP POST request. @param body is the payload for the request and will be passed as is (useful for many purposes, such as webDav). A convenient way to specify form variables through this argument is passing a string obtained by 'export_vars -url'. @param max_body_size this value in number of characters will tell how big can the whole body payload get before we start spooling its content to a file. This is important in case of big file uploads, when keeping the entire request in memory is just not feasible. The handling of the spooling is taken care of in the API. This value takes into account also the encoding required by the content type, so its value could not reflect the exact length of body's string representation. @param files File upload can be specified using actual files on the filesystem or binary strings of data using the '-files' parameter. '-files' must be a dict (flat list of key value pairs). Keys of '-files' parameter are: - data: binary data to be sent. If set, has precedence on 'file' key - file: path for the actual file on filesystem - filename: name the form will receive for this file - fieldname: name the field this file will be sent as - mime_type: mime_type the form will receive for this file If 'filename' is missing and an actual file is being sent, it will be set as the same name as the file. If 'mime_type' is missing, it will be guessed from 'filename'. If result is */* or an empty mime_type, 'application/octet-stream' will be used If '-base64' flag is set, files will be base64 encoded (useful for some kind of form). @param formvars These are additional form variables already in URLencoded format, for instance, by using 'export_vars -url'. They will be translated for the proper type of form (URLencoded or multipart) depending on the presence of 'files' or the 'multipart' flag. Variables specified this way will be appended to those supplied via the 'formvars_list' parameter. @param formvars_list These are additional form variables in list format. They will be translated for the proper type of form (URLencoded or multipart) depending on the presence of files or the multipart flag. The payload will be made by the sum of data coming from 'formvars', 'formvars_list' and 'files' arguments. Default behavior is to build payload as an 'application/x-www-form-urlencoded' payload if no files are specified, and 'multipart/form-data' otherwise. If '-multipart' flag is set, format will be forced to multipart. @param headers specifies an ns_set of extra headers to send to the server when doing the request. Some options exist that allow one to avoid the need to specify headers manually, but headers will always take precedence over options. @param gzip_request informs the server that we are sending data in gzip format. Data will be automatically compressed. Notice that not all servers can treat gzipped requests properly, and in such cases response will likely be an error. @param gzip_response informs the server that we are capable of receiving gzipped responses. If server complies to our indication, the result will be automatically decompressed. @param force_ssl specifies whether we want to use SSL despite the url being in http:// form. Default behavior is to use SSL on https:// URLs only. @param spool enables file spooling of the request on the file specified. It is useful when we expect large responses from the server. The result is spooled to a temporary file, the name is returned in the file component of the result. @param post_redirect decides what happens when we are POSTing and server replies with 301, 302 or 303 redirects. RFC 2616/10.3.2 states that method should not change when 301 or 302 are returned, and that GET should be used on a 303 response, but most HTTP clients fail in respecting this and switch to a GET request independently. This option forces this kinds of redirect to conserve their original method. @param max_depth is the maximum number of redirects the proc is allowed to follow. A value of 0 disables redirection. When max depth for redirection has been reached, proc will return response from the last page we were redirected to. This is important if redirection response contains data such as cookies we need to obtain anyway. Be aware that when following redirects, unless it is a code 303 redirect, url and POST urlencoded variables will be sent again to the redirected host. Multipart variables won't be sent again. Sending to the redirected host can be dangerous, if such host is not trusted or uses a lower level of security. @param preference decides which available implementation prefer in respective order. Choice is between 'native', based on ns_ api, available for NaviServer only and giving the best performances and 'curl', which wraps the command line utility (available on every system with curl installed). @param timeout Timeout in seconds. The value can be an integer, a floating point number or an ns_time value. @return the data as dict with elements 'headers', 'page', 'file', 'status', 'time' (elapsed request time in ns_time format), and 'modified'. } { set payload_data [util::http::post_payload \ -url $url \ -files $files \ -base64=$base64_p \ -formvars $formvars \ -formvars_list $formvars_list \ -body $body \ -max_body_size $max_body_size \ -headers $headers \ -multipart=$multipart_p] set payload [dict get $payload_data payload] set payload_file [dict get $payload_data payload_file] set headers [dict get $payload_data headers] return [util::http::request \ -method POST \ -body $payload \ -body_file $payload_file \ -delete_body_file \ -headers $headers \ -url $url \ -timeout $timeout \ -max_depth $max_depth \ -preference $preference \ -force_ssl=$force_ssl_p \ -gzip_request=$gzip_request_p \ -gzip_response=$gzip_response_p \ -post_redirect=$post_redirect_p \ -spool=$spool_p] } d_proc -private util::http::append_to_payload { {-content ""} {-file ""} -base64:boolean encoding max_size payload spool_file wfd } { Appends content to a POST payload making sure this doesn't exceed given max size. When this happens, proc creates a spool file and writes there the content. @return a list in the format {total_payload spooling_file spooling_file_handle} } { set encode_p [expr {$encoding ni [list "binary" [encoding system]]}] set payload_size [string length $payload] # Get content size if {$file eq ""} { set content_size [string length $content] } else { set content_size [ad_file size $file] } # Content size seems ok. Now try applying encoding if {$spool_file eq "" && $payload_size + $content_size <= $max_size} { if {$file ne ""} { set rfd [open $file r] fconfigure $rfd -translation binary set content [read $rfd] close $rfd } if {$base64_p} { set content [ns_base64encode $content] } if {$encode_p} { set content [encoding convertto $encoding $content] } set content_size [string length $content] } if {$spool_file eq "" && $payload_size + $content_size <= $max_size} { ## Payload small enough: # just append new content return [list ${payload}${content} {} {}] } ## Payload is too big: if {$spool_file eq ""} { # create the spool file set wfd [ad_opentmpfile spool_file] fconfigure $wfd -translation binary # flush currently collected payload puts -nonewline $wfd $payload # set required encoding for next content if {$encode_p} { fconfigure $wfd -encoding $encoding } } # output content to spool file if {$file ne ""} { if {$base64_p} { # TODO: it's tricky to base64 encode without slurping # the whole file (exec + pipes?) error "Base64 encoding currently supported only for in-memory file POSTing" } set rfd [open $file r] fconfigure $rfd -translation binary fconfigure $wfd -translation binary fcopy $rfd $wfd fconfigure $wfd -translation auto close $rfd } else { puts -nonewline $wfd $content } return [list {} $spool_file $wfd] } d_proc -private util::http::follow_redirects { -url -method -status -location {-body ""} {-body_file ""} -delete_body_file:boolean {-headers ""} {-timeout 30} {-depth 0} {-max_depth 10} -force_ssl:boolean -multipart:boolean -gzip_request:boolean -gzip_response:boolean -post_redirect:boolean -spool:boolean -preference {native curl} } { Follow redirects. This proc is required because we want to be able to follow a redirect until a certain depth and then stop without throwing an error. Happens at times that even a redirect page contains very important information we want to be able to reach. An example could be authentication headers. By putting redirection handling here we can force a common behavior between the two implementations, that otherwise would not be possible. @param body is the payload for the request and will be passed as is (useful for many purposes, such as webDav). A convenient way to specify form variables through this argument is passing a string obtained by 'export_vars -url'. Default behavior is to build payload as an 'application/x-www-form-urlencoded' payload if no files are specified, and 'multipart/form-data' otherwise. If '-multipart' flag is set, format will be forced to multipart. @param body_file is an alternative way to specify the payload, useful in cases such as the upload of big files by POST. If specified, will have precedence over the 'body' parameter. Content of the file won't be encoded according with the content type of the request as happen with 'body' @param delete_body_file decides whether remove body payload file once the request is over. @param headers specifies an ns_set of extra headers to send to the server when doing the request. Some options exist that allow one to avoid the need to specify headers manually, but headers will always take precedence over options. @param gzip_request informs the server that we are sending data in gzip format. Data will be automatically compressed. Notice that not all servers can treat gzipped requests properly, and in such cases response will likely be an error. @param gzip_response informs the server that we are capable of receiving gzipped responses. If server complies to our indication, the result will be automatically decompressed. @param force_ssl specifies whether we want to use SSL despite the url being in http:// form. Default behavior is to use SSL on https:// URLs only. @param spool enables file spooling of the request on the file specified. It is useful when we expect large responses from the server. The result is spooled to a temporary file, the name is returned in the file component of the result. @param post_redirect decides what happens when we are POSTing and server replies with 301, 302 or 303 redirects. RFC 2616/10.3.2 states that method should not change when 301 or 302 are returned, and that GET should be used on a 303 response, but most HTTP clients fail in respecting this and switch to a GET request independently. This option forces this kinds of redirect to conserve their original method. @param max_depth is the maximum number of redirects the proc is allowed to follow. A value of 0 disables redirection. When max depth for redirection has been reached, proc will return response from the last page we were redirected to. This is important if redirection response contains data such as cookies we need to obtain anyway. Be aware that when following redirects, unless it is a code 303 redirect, url and POST urlencoded variables will be sent again to the redirected host. Multipart variables won't be sent again. Sending to the redirected host can be dangerous, if such host is not trusted or uses a lower level of security. @param preference decides which available implementation prefer in respective order. Choice is between 'native', based on ns_ api, available for NaviServer only and giving the best performances and 'curl', which wraps the command line utility (available on every system with curl installed). @param timeout Timeout in seconds. The value can be an integer, a floating point number or an ns_time value. @return the data as dict with elements 'headers', 'page', 'file', 'status', 'time' (elapsed request time in ns_time format), and 'modified' from the last followed redirect, or an empty string if request was not a redirection. } { ## Redirection management ## # Don't follow if page was not modified or this was not a proper redirect: # not the right status code, missing location. if {$status == 304 || ![string match "3??" $status] || $location eq ""} { return "" } # Other kinds of redirection... # Decide by which method follow the redirect if {$method eq "POST"} { if {$status in {301 302 303} && !$post_redirect_p} { set method "GET" } } # # A redirect from HTTP might point to HTTPS, which in turn # might not be configured. So we have to go through # util::http::request again. # set this_proc ::util::http::request set urlvars [list] # ...retrieve redirect location variables... set locvars [lindex [split $location ?] 1] if {$locvars ne ""} { lappend urlvars $locvars } lappend urlvars [lindex [split $url ?] 1] # If we have POST payload and we are following by GET, put the payload into url vars. if {$method eq "GET" && $body ne ""} { set req_content_type [ns_set iget $headers "content-type"] set multipart_p [string match -nocase "*multipart/form-data*" $req_content_type] # I decided to don't translate into urlvars a multipart payload. # This makes sense if we think that in a multipart payload we have # some information, such as mime_type, which cannot be put into url. # Receiving a GET redirect after a POST is very common, so I won't throw an error if {!$multipart_p} { if {$gzip_request_p} { set body [zlib gunzip $body] } lappend urlvars $body } } # Unite all variables into location URL set urlvars [join $urlvars &] if {$urlvars ne ""} { set location ${location}?${urlvars} } # # According to # https://www.rfc-editor.org/rfc/rfc7231#section-7.1.2, the # location header may return a relative URL as well. # set location [ns_absoluteurl $location $url] if {$method eq "GET"} { return [$this_proc \ -method GET \ -url $location \ -headers $headers \ -timeout $timeout \ -depth $depth \ -max_depth $max_depth \ -force_ssl=$force_ssl_p \ -gzip_response=$gzip_response_p \ -post_redirect=$post_redirect_p \ -spool=$spool_p \ -preference $preference] } else { return [$this_proc \ -method POST \ -url $location \ -body $body \ -body_file $body_file \ -delete_body_file=$delete_body_file_p \ -headers $headers \ -timeout $timeout \ -depth $depth \ -max_depth $max_depth \ -force_ssl=$force_ssl_p \ -gzip_request=$gzip_request_p \ -gzip_response=$gzip_response_p \ -post_redirect=$post_redirect_p \ -spool=$spool_p \ -preference $preference] } } d_proc -private util::http::request { -url {-method GET} {-headers ""} {-body ""} {-body_file ""} -delete_body_file:boolean {-timeout 30} {-depth 0} {-max_depth 10} -force_ssl:boolean -gzip_request:boolean -gzip_response:boolean -post_redirect:boolean -spool:boolean {-preference {native curl}} } { Issue an HTTP request either GET or POST to the url specified. @param headers specifies an ns_set of extra headers to send to the server when doing the request. Some options exist that allow one to avoid the need to specify headers manually, but headers will always take precedence over options. @param body is the payload for the request and will be passed as is (useful for many purposes, such as webDav). A convenient way to specify form variables for POST payloads through this argument is passing a string obtained by 'export_vars -url'. @param body_file is an alternative way to specify the payload, useful in cases such as the upload of big files by POST. If specified, will have precedence over the 'body' parameter. Content of the file won't be encoded according with the content type of the request as happen with 'body' @param delete_body_file decides whether remove body payload file once the request is over. @param gzip_request informs the server that we are sending data in gzip format. Data will be automatically compressed. Notice that not all servers can treat gzipped requests properly, and in such cases response will likely be an error. @param gzip_response informs the server that we are capable of receiving gzipped responses. If server complies to our indication, the result will be automatically decompressed. @param force_ssl specifies whether we want to use SSL despite the url being in http:// form. Default behavior is to use SSL on https:// URLs only. @param spool enables file spooling of the request on the file specified. It is useful when we expect large responses from the server. The result is spooled to a temporary file, the name is returned in the file component of the result. @param post_redirect decides what happens when we are POSTing and server replies with 301, 302 or 303 redirects. RFC 2616/10.3.2 states that method should not change when 301 or 302 are returned, and that GET should be used on a 303 response, but most HTTP clients fail in respecting this and switch to a GET request independently. This option forces this kinds of redirect to conserve their original method. Notice that, as from RFC, a 303 redirect won't send again any data to the server, as specification says we can assume variables to have been received. @param max_depth is the maximum number of redirects the proc is allowed to follow. A value of 0 disables redirection. When max depth for redirection has been reached, proc will return response from the last page we were redirected to. This is important if redirection response contains data such as cookies we need to obtain anyway. Be aware that when following redirects, unless it is a code 303 redirect, url and POST urlencoded variables will be sent again to the redirected host. Multipart variables won't be sent again. Sending to the redirected host can be dangerous, if such host is not trusted or uses a lower level of security. @param preference decides which available implementation prefer in respective order. Choice is between 'native', based on ns_ api, available for NaviServer only and giving the best performances and 'curl', which wraps the command line utility (available on every system with curl installed). @param timeout Timeout in seconds. The value can be an integer, a floating point number or an ns_time value. @return the data as dict with elements 'headers', 'page', 'file', 'status', 'time' (elapsed request time in ns_time format), and 'modified'. } { set this_proc [lindex [info level 0] 0] set impl [util::http::available -preference $preference] if {$impl eq ""} { return -code error "${this_proc}: HTTP client functionalities for this protocol are not available with current system configuration." } return [util::http::${impl}::request \ -method $method \ -body $body \ -body_file $body_file \ -delete_body_file=$delete_body_file_p \ -headers $headers \ -url $url \ -timeout $timeout \ -depth $depth \ -max_depth $max_depth \ -force_ssl=$force_ssl_p \ -gzip_request=$gzip_request_p \ -gzip_response=$gzip_response_p \ -post_redirect=$post_redirect_p \ -spool=$spool_p] } # ## Native NaviServer implementation # namespace eval util::http::native {} d_proc -private util::http::native::request { -url {-method GET} {-headers ""} {-body ""} {-body_file ""} -delete_body_file:boolean {-timeout 30} {-depth 0} {-max_depth 10} -force_ssl:boolean -gzip_request:boolean -gzip_response:boolean -post_redirect:boolean -spool:boolean } { Issue an HTTP request either GET or POST to the url specified. This is the native implementation based on NaviServer HTTP API. @param headers specifies an ns_set of extra headers to send to the server when doing the request. Some options exist that allow one to avoid the need to specify headers manually, but headers will always take precedence over options. @param body is the payload for the request and will be passed as is (useful for many purposes, such as webDav). A convenient way to specify form variables for POST payloads through this argument is passing a string obtained by 'export_vars -url'. @param body_file is an alternative way to specify the payload, useful in cases such as the upload of big files by POST. If specified, will have precedence over the 'body' parameter. Content of the file won't be encoded according with the content type of the request as happen with 'body' @param delete_body_file decides whether remove body payload file once the request is over. @param gzip_request informs the server that we are sending data in gzip format. Data will be automatically compressed. Notice that not all servers can treat gzipped requests properly, and in such cases response will likely be an error. @param gzip_response informs the server that we are capable of receiving gzipped responses. If server complies to our indication, the result will be automatically decompressed. @param force_ssl specifies whether we want to use SSL despite the url being in http:// form. Default behavior is to use SSL on https:// URLs only. @param spool enables file spooling of the request on the file specified. It is useful when we expect large responses from the server. The result is spooled to a temporary file, the name is returned in the file component of the result. @param post_redirect decides what happens when we are POSTing and server replies with 301, 302 or 303 redirects. RFC 2616/10.3.2 states that method should not change when 301 or 302 are returned, and that GET should be used on a 303 response, but most HTTP clients fail in respecting this and switch to a GET request independently. This option forces this kinds of redirect to conserve their original method. Notice that, as from RFC, a 303 redirect won't send again any data to the server, as specification says we can assume variables to have been received. @param max_depth is the maximum number of redirects the proc is allowed to follow. A value of 0 disables redirection. When max depth for redirection has been reached, proc will return response from the last page we were redirected to. This is important if redirection response contains data such as cookies we need to obtain anyway. Be aware that when following redirects, unless it is a code 303 redirect, url and POST urlencoded variables will be sent again to the redirected host. Multipart variables won't be sent again. Sending to the redirected host can be dangerous, if such host is not trusted or uses a lower level of security. @param timeout Timeout in seconds. The value can be an integer, a floating point number or an ns_time value. @return the data as dict with elements 'headers', 'page', 'file', 'status', 'time' (elapsed request time in ns_time format), and 'modified'. } { set this_proc [lindex [info level 0] 0] set parsed_url [ns_parseurl $url] if {![dict exists $parsed_url proto] || [dict get $parsed_url proto] ni {"http" "https"}} { return -code error "${this_proc}: Invalid url: $url" } if {$headers eq ""} { set headers [ns_set create headers] } # Determine whether we want to gzip the request. # Servers uncapable of treating such requests will likely throw an error... set req_content_encoding [ns_set iget $headers "content-encoding"] if {$req_content_encoding ne ""} { set gzip_request_p [string match "*gzip*" $req_content_encoding] } elseif {$gzip_request_p} { ns_set put $headers "Content-Encoding" "gzip" } # See if we want the response to be gzipped by headers or options # Server can decide to ignore this and serve the encoding he desires. # I also say to server that whatever he can give me will do, in case. set req_accept_encoding [ns_set iget $headers "accept-encoding"] if {$req_accept_encoding ne ""} { set gzip_response_p [string match "*gzip*" $req_accept_encoding] } elseif {$gzip_response_p} { ns_set put $headers "Accept-Encoding" "gzip, */*" } # zlib is mandatory when requiring compression if {$gzip_request_p || $gzip_response_p} { if {[namespace which zlib] eq ""} { return -code error "${this_proc}: zlib support not enabled" } } ## Encoding of the request # Any conversion or encoding of the payload should happen only at # the first request and not on redirects if {$depth == 0} { set content_type [ns_set iget $headers "content-type"] if {$content_type eq ""} { set content_type "text/plain; charset=[ns_config ns/parameters OutputCharset iso-8859-1]" } set enc [util::http::get_channel_settings $content_type] if {$enc ni [list "binary" [encoding system]]} { set body [encoding convertto $enc $body] } if {$gzip_request_p} { set body [zlib gzip $body] } } ## Issuing of the request set cmd [list ns_http run \ -timeout $timeout \ -method $method \ -headers $headers \ -hostname [dict get $parsed_url host]] if {$body_file ne ""} { lappend cmd -body_file $body_file } elseif {$body ne ""} { lappend cmd -body $body } if {$spool_p} { lappend cmd -spoolsize 0 } lappend cmd $url #ns_log notice "NS_HTTP $cmd" set r [{*}$cmd] set resp_headers [dict get $r headers] set status [dict get $r status] set time [dict get $r time] if {[dict exists $r file]} { set spool_file [dict get $r file] set page "${this_proc}: response spooled to '$spool_file'" } else { set spool_file "" set page [dict get $r body] } # Get values from response headers, then remove them set content_type [ns_set iget $resp_headers content-type] set content_encoding [ns_set iget $resp_headers content-encoding] set location [ns_set iget $resp_headers location] set last_modified [ns_set iget $resp_headers last-modified] # Move in a list to be returned to the caller set r_headers [ns_set array $resp_headers] ns_set free $resp_headers # Redirection handling if {$depth < $max_depth} { incr depth set redirection [util::http::follow_redirects \ -url $url \ -method $method \ -status $status \ -location $location \ -body $body \ -body_file $body_file \ -delete_body_file=$delete_body_file_p \ -headers $headers \ -timeout $timeout \ -depth $depth \ -max_depth $max_depth \ -force_ssl=$force_ssl_p \ -gzip_request=$gzip_request_p \ -gzip_response=$gzip_response_p \ -post_redirect=$post_redirect_p \ -spool=$spool_p \ -preference "native"] if {$redirection ne ""} { return $redirection } } if {$delete_body_file_p} { file delete -force -- $body_file } ## Decoding of the response # Translate into proper encoding set enc [util::http::get_channel_settings $content_type] if {$enc ni [list "binary" [encoding system]]} { set page [encoding convertfrom $enc $page] } return [list \ headers $r_headers \ page $page \ file $spool_file \ status $status \ time $time \ modified $last_modified] } # ## Curl wrapper implementation # namespace eval util::http::curl {} d_proc -private util::http::curl::version_not_cached { } { Gets Curl's version number. } { set version [lindex [exec [::util::which curl] --version] 1] } d_proc -private util::http::curl::version { } { Gets Curl's version number. } { set key ::util::http::curl::version if {[info exists $key]} { return [set $key] } else { return [set $key [util::http::curl::version_not_cached]] } } ad_proc -private util::http::curl::timeout {input} { Convert the provided timeout value to a format suitable for curl. Since curl versions before 7.32.0 just accept integer, the granularity is set to seconds. On doubt, the value is rounded up. } { if {[string is integer -strict $input]} { return $input } elseif {[string is double -strict $input]} { set secs [expr {int($input)}] set secfrac [expr {$input - $secs}] if {$secfrac < 0.001} { return [expr {$secs + 1}] } return $secs } elseif {[regexp {^([0-9]+):([0-9]*)$} $input _ secs microsecs]} { if {$microsecs > 1000} { return [expr {$secs + 1}] } return $secs } return $input } d_proc -private util::http::curl::request { -url {-method GET} {-headers ""} {-body ""} {-body_file ""} -delete_body_file:boolean {-files {}} {-timeout 30} {-depth 0} {-max_depth 10} -force_ssl:boolean -gzip_request:boolean -gzip_response:boolean -post_redirect:boolean -spool:boolean } { Issue an HTTP request either GET or POST to the url specified. This is the curl wrapper implementation, used on AOLserver and when ssl native capabilities are not available. @param headers specifies an ns_set of extra headers to send to the server when doing the request. Some options exist that allow one to avoid the need to specify headers manually, but headers will always take precedence over options. @param body is the payload for the request and will be passed as is (useful for many purposes, such as webDav). A convenient way to specify form variables for POST payloads through this argument is passing a string obtained by 'export_vars -url'. @param body_file is an alternative way to specify the payload, useful in cases such as the upload of big files by POST. If specified, will have precedence over the 'body' parameter. Content of the file won't be encoded according with the content type of the request as happen with 'body' @param delete_body_file decides whether remove body payload file once the request is over. @param gzip_request informs the server that we are sending data in gzip format. Data will be automatically compressed. Notice that not all servers can treat gzipped requests properly, and in such cases response will likely be an error. @param files curl is natively capable to send files via POST requests, and exploiting it can be desirable to send very large files via POST, because no extra space will be required on the disk to prepare the request payload using this feature. Files by this parameter are couples in the form '{ form_field_name file_path_on_filesystem }' @param gzip_response informs the server that we are capable of receiving gzipped responses. If server complies to our indication, the result will be automatically decompressed. @param force_ssl is ignored when using curl HTTP client implementation and is only kept for cross compatibility. @param spool enables file spooling of the request on the file specified. It is useful when we expect large responses from the server. The result is spooled to a temporary file, the name is returned in the file component of the result. @param post_redirect decides what happens when we are POSTing and server replies with 301, 302 or 303 redirects. RFC 2616/10.3.2 states that method should not change when 301 or 302 are returned, and that GET should be used on a 303 response, but most HTTP clients fail in respecting this and switch to a GET request independently. This option forces this kinds of redirect to conserve their original method. Be aware that curl allows the POSTing of 303 requests only since version 7.26. Versions prior than this will follow 303 redirects by GET method. If following by POST is a requirement, please consider switching to the native HTTP client implementation, or update curl. @param max_depth is the maximum number of redirects the proc is allowed to follow. A value of 0 disables redirection. When max depth for redirection has been reached, proc will return response from the last page we were redirected to. This is important if redirection response contains data such as cookies we need to obtain anyway. Be aware that when following redirects, unless it is a code 303 redirect, url and POST urlencoded variables will be sent again to the redirected host. Multipart variables won't be sent again. Sending to the redirected host can be dangerous, if such host is not trusted or uses a lower level of security. @param timeout Timeout in seconds. The value can be an integer, a floating point number or an ns_time value. Since curl versions before 7.32.0 just accept integer, the granularity is set to seconds. @return the data as dict with elements 'headers', 'page', 'file', 'status', 'time' (elapsed request time in ns_time format), and 'modified'. } { set this_proc [lindex [info level 0] 0] if {![regexp "^(https|http)://*" $url]} { return -code error "${this_proc}: Invalid url: $url" } if {$headers eq ""} { set headers [ns_set create headers] } # Determine whether we want to gzip the request. # Default is no, can't know whether the server accepts it. # We could at the HTTP API level (TODO?) set req_content_encoding [ns_set iget $headers "content-encoding"] if {$req_content_encoding ne ""} { set gzip_request_p [string match "*gzip*" $req_content_encoding] } elseif {$gzip_request_p} { ns_set put $headers "Content-Encoding" "gzip" } # Curls accepts gzip by default, so if gzip response is not required # we have to ask explicitly for a plain text encoding set req_accept_encoding [ns_set iget $headers "accept-encoding"] if {$req_accept_encoding ne ""} { set gzip_response_p [string match "*gzip*" $req_accept_encoding] } elseif {!$gzip_response_p} { ns_set put $headers "Accept-Encoding" "utf-8" } # zlib is mandatory when compressing the input if {$gzip_request_p} { if {[namespace which zlib] eq ""} { return -code error "${this_proc}: zlib support not enabled" } } ## Encoding of the request # Any conversion or encoding of the payload should happen only at # the first request and not on redirects if {$depth == 0} { set content_type [ns_set iget $headers "content-type"] if {$content_type eq ""} { set content_type "text/plain; charset=[ns_config ns/parameters OutputCharset iso-8859-1]" } set enc [util::http::get_channel_settings $content_type] if {$enc ne "binary"} { set body [encoding convertto $enc $body] } if {$gzip_request_p} { set body [zlib gzip $body] } } ## Issuing of the request set cmd [list exec [::util::which curl] -s -k] if {$spool_p} { set spool_file [ad_tmpnam] lappend cmd -o $spool_file } else { set spool_file "" } if {$timeout ne ""} { lappend cmd --connect-timeout [timeout $timeout] } # Antonio Pisano 2015-09-28: curl can follow redirects # out of the box, but its behavior is to throw an error # when maximum depth has been reached. I want it to # return even a 3** page without complaining. # # Set redirection up to max_depth # if {$max_depth ne ""} { # lappend cmd -L --max-redirs $max_depth # } if {$method eq "GET"} { lappend cmd -G } # Files to be sent natively by curl by the -F option foreach f $files { if {[llength $f] != 2} { return -code error "${this_proc}: invalid -files parameter: $files" } set f [join $f "=@"] lappend cmd -F $f } # If required, we'll follow POST request redirections by GET if {!$post_redirect_p} { lappend cmd --post301 --post302 if {[apm_version_names_compare [version] "7.26"] >= 0} { lappend cmd --post303 } } # Curl can decompress response transparently if {$gzip_response_p} { lappend cmd --compressed } # Unfortunately, as we are interacting with a shell, there is no # way to escape content easily and safely. Even when body is # passed as a Tcl variable, we just write its content to a file # and let it be read by curl. set create_body_file_p [expr {$body_file eq ""}] if {$create_body_file_p} { set wfd [ad_opentmpfile body_file http-spool] fconfigure $wfd -translation binary puts -nonewline $wfd $body close $wfd } lappend cmd --data-binary "@${body_file}" # Return response code together with webpage lappend cmd -w " %\{http_code\}" # Add headers to the command line foreach {key value} [ns_set array $headers] { if {$value eq ""} { set value ";" } else { set value ": $value" } set header "${key}${value}" lappend cmd -H "$header" } # Dump response headers into a tempfile to get them set resp_headers_tmpfile [ad_tmpnam] lappend cmd -D $resp_headers_tmpfile lappend cmd $url #ns_log notice "running CURL cmd\n$cmd" set start_time [ns_time get] set response [{*}$cmd] set end_time [ns_time get] # elapsed time set time [ns_time diff $end_time $start_time] # Parse headers from dump file set resp_headers [ns_set create resp_headers] set rfd [open $resp_headers_tmpfile r] while {[gets $rfd line] >= 0} { set line [split $line ":"] set key [lindex $line 0] set value [join [lrange $line 1 end] ":"] ns_set put $resp_headers $key [string trim $value] } close $rfd # Get values from response headers, then remove them set content_type [ns_set iget $resp_headers content-type] set last_modified [ns_set iget $resp_headers last-modified] set location [ns_set iget $resp_headers location] # Move in a list to be returned to the caller set r_headers [ns_set array $resp_headers] ns_set free $resp_headers set status [string range $response end-2 end] set page [string range $response 0 end-4] # Redirection handling if {$depth < $max_depth} { incr depth set redirection [util::http::follow_redirects \ -url $url \ -method $method \ -status $status \ -location $location \ -body $body \ -body_file $body_file \ -delete_body_file=$delete_body_file_p \ -headers $headers \ -timeout $timeout \ -depth $depth \ -max_depth $max_depth \ -force_ssl=$force_ssl_p \ -gzip_request=$gzip_request_p \ -gzip_response=$gzip_response_p \ -post_redirect=$post_redirect_p \ -spool=$spool_p \ -preference "curl"] if {$redirection ne ""} { return $redirection } } if {$spool_file ne ""} { set page "${this_proc}: response spooled to '$spool_file'" } # Translate into proper encoding set enc [util::http::get_channel_settings $content_type] if {$enc ni [list "binary" [encoding system]]} { set page [encoding convertfrom $enc $page] } # Delete temp files file delete -- $resp_headers_tmpfile if {$create_body_file_p || $delete_body_file_p} { file delete -force -- $body_file } return [list \ headers $r_headers \ page $page \ file $spool_file \ status $status \ time $time \ modified $last_modified] } d_proc -public util::get_http_status { -url {-use_get_p 1} {-timeout 30} } { @return the HTTP status code, e.g., 200 for a normal response or 500 for an error, of a URL. By default this uses the GET method instead of HEAD since not all servers will respond properly to a HEAD request even when the URL is perfectly valid. Note that this means that the server may be sucking down a lot of bits that it doesn't need. } { set result [util::http::request \ -url $url \ -method [expr {$use_get_p ? "GET" : "HEAD"}] \ -timeout $timeout] return [dict get $result status] } d_proc -public util::link_responding_p { -url {-list_of_bad_codes "404"} } { @return 1 if the URL is responding (generally we think that anything other than 404 (not found) is okay). @see util::get_http_status } { if { [catch { set status [util::get_http_status -url $url] } errmsg] } { # got an error; definitely not valid return 0 } else { # we got the page but it might have been a 404 or something if { $status in $list_of_bad_codes } { return 0 } else { return 1 } } } # # Local variables: # mode: tcl # tcl-indent-level: 4 # indent-tabs-mode: nil # End: