ad_parse_html_attributes_upvar (private)
ad_parse_html_attributes_upvar [ -attribute_array attribute_array ] \ html_varname pos_varname
Defined in packages/acs-tcl/tcl/text-html-procs.tcl
Parse attributes in an HTML fragment and return them as a list of lists.
Each element of that list is either a single element, if the attribute had no value, or a two-tuple, with the first element being the name of the attribute and the second being the value. The attribute names are all converted to lowercase.
If you don't really care what happens when the same attribute is present twice, you can also use the
attribute_array
argument, and the attributes will be set there. For attributes without any value, we'll use the empty string.Example:
set html {<tag foo = bar baz greble=""hello you sucker"" foo='blah' Heres = ' something for you to = "consider" '>} set pos 5 ; # the 'f' in the first 'foo' set attribute_list [ad_parse_html_attributes_upvar -attribute_array attribute_array html pos]attribute_list
will contain the following:{foo bar} baz {greble {"hello you sucker"}} {foo blah} {heres { something for you to = "consider" }}attribute_array
will contain:attribute_array(foo)='blah' attribute_array(greble)='"hello you sucker"' attribute_array(baz)='' attribute_array(heres)=' something for you to = "consider" 'Won't alter the string passed in .. promise! We will modify pos_var. Pos_var should point to the first character inside the tag, after the tag name (we don't care if you let if there's some whitespace before the first attribute)
- Switches:
- -attribute_array (optional)
- This is an alternate way of returning the attributes, if you don't care about what happens when the same attribute name is defined twice.
- Parameters:
- html_varname (required)
- the name of the variable holding the HTML fragment. We promise that we won't change the contents of this variable.
- pos_varname (required)
- the name of the variable holding the position within the
html_varname
string from which we should start. This should point to a character inside the tag, just after the tag name, and before the first attribute. Note that we will modify this variable. When this proc is done, this variable will point to the tag-closing>
. Example: if the tag is <img src="foo">,pos_varname
should point to either the space betweenimg
andsrc
, or thes
insrc
.- Returns:
- A list of list holding the attribute names and values. Each element of that list is either a single element, if the attribute had no value, or a two-tuple, with the first element being the name of the attribute and the second being the value. The attribute names are all converted to lowercase.
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- November 10, 2000
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- ad_html_to_text_bold, ad_html_to_text_anchor, ad_html_to_text_image, ad_html_security_check_href_allowed, ad_html_security_check_forbidden_protolcols, ad_html_text_convert
Source code: upvar $html_varname html upvar $pos_varname i if { [info exists attribute_array] } { upvar $attribute_array attribute_array_var } # This is where we're going to return the result set attributes {} # Loop over the attributes. # We maintain counter is so that we don't accidentally enter an infinite loop set count 0 while { $i < [string length $html] && [string index $html $i] ne ">" } { if { [incr count] > 3000 } { error "There appears to be a programming bug in ad_parse_html_attributes_upvar: We've entered an infinite loop. We are here: \noffset $i: [string range $html $i $i+60]" } if { [string range $html $i $i+1] eq "/>" } { # This is an XML-style tag ending: <... /> break } # This regexp matches an attribute name and an equal sign, if # present. Also eats whitespace before or after. The \A # corresponds to ^, except it matches the position we're # starting from, not the start of the string. if { ![regexp -indices -start $i {\A\s*([^\s=>]+)\s*(=?)\s*} $html match attr_name_idx equal_sign_idx] } { # # Apparently, there's no attribute name here. # Let's eat all whitespace and lonely equal signs. # regexp -indices -start $i {\A[\s=]*} $html match set i [expr { [lindex $match 1] + 1 }] } { set attr_name [string tolower [string range $html [lindex $attr_name_idx 0] [lindex $attr_name_idx 1]]] # Move past the attribute name just found set i [expr { [lindex $match 1] + 1}] # If there is an equal sign, we're expecting the next token to be a value if { [lindex $equal_sign_idx 1] - [lindex $equal_sign_idx 0] < 0 } { # No equal sign, no value lappend attributes [list $attr_name] if { [info exists attribute_array] } { set attribute_array_var($attr_name) {} } } else { # is there a single or double quote sign as the first character? switch -- [string index $html $i] { \" { set exp {\A\"([^\"]*)\"\s*} } ' { set exp {\A'([^']*)'\s*} } default { set exp {\A([^\s>]*)\s*} } } if { ![regexp -indices -start $i $exp $html match attr_value_idx] } { # No end quote. set attr_value [string range $html $i+1 end] set i [string length $html] } else { set attr_value [string range $html [lindex $attr_value_idx 0] [lindex $attr_value_idx 1]] set i [expr { [lindex $match 1] + 1}] } set attr_value [ns_unquotehtml $attr_value] lappend attributes [list $attr_name $attr_value] if { [info exists attribute_array] } { set attribute_array_var($attr_name) $attr_value } } } } return $attributesXQL Not present: Generic, PostgreSQL, Oracle