ad_parse_html_attributes

ad_parse_html_attributes_upvar (private)

 ad_parse_html_attributes_upvar [ -attribute_array attribute_array ] \
    html_varname pos_varname

Defined in packages/acs-tcl/tcl/text-html-procs.tcl

Parse attributes in an HTML fragment and return them as a list of lists.

Each element of that list is either a single element, if the attribute had no value, or a two-tuple, with the first element being the name of the attribute and the second being the value. The attribute names are all converted to lowercase.

If you don't really care what happens when the same attribute is present twice, you can also use the attribute_array argument, and the attributes will be set there. For attributes without any value, we'll use the empty string.

Example:

set html {<tag foo = bar baz greble="&quot;hello you sucker&quot;" foo='blah' Heres = '  something for   you to = "consider" '>}
    set pos 5 ; # the 'f' in the first 'foo'

    set attribute_list [ad_parse_html_attributes_upvar -attribute_array attribute_array html pos]

attribute_list will contain the following:

{foo bar} baz {greble {"hello you sucker"}} {foo blah} {heres {  something for   you to = "consider" }}

attribute_array will contain:

attribute_array(foo)='blah'
    attribute_array(greble)='"hello you sucker"'
    attribute_array(baz)=''
    attribute_array(heres)='  something for   you to = "consider" '

Won't alter the string passed in .. promise! We will modify pos_var. Pos_var should point to the first character inside the tag, after the tag name (we don't care if you let if there's some whitespace before the first attribute)

Switches:

-attribute_array (optional): This is an alternate way of returning the attributes, if you don't care about what happens when the same attribute name is defined twice.

Parameters:

html_varname (required): the name of the variable holding the HTML fragment. We promise that we won't change the contents of this variable.
pos_varname (required): the name of the variable holding the position within the html_varname string from which we should start. This should point to a character inside the tag, just after the tag name, and before the first attribute. Note that we will modify this variable. When this proc is done, this variable will point to the tag-closing >. Example: if the tag is <img src="foo">, pos_varname should point to either the space between img and src, or the s in src.

Returns:

A list of list holding the attribute names and values. Each element of that list is either a single element, if the attribute had no value, or a two-tuple, with the first element being the name of the attribute and the second being the value. The attribute names are all converted to lowercase.

Author:

Lars Pind <lars@pinds.com>

Created:

November 10, 2000

Partial Call Graph (max 5 caller/called nodes):

Testcases:

ad_html_to_text_bold, ad_html_to_text_anchor, ad_html_to_text_image, ad_html_security_check_href_allowed, ad_html_security_check_forbidden_protolcols, ad_html_text_convert

Source code:

    upvar $html_varname html
    upvar $pos_varname i
    if { [info exists attribute_array] } {
        upvar $attribute_array attribute_array_var
    }

    # This is where we're going to return the result
    set attributes {}

    # Loop over the attributes.
    # We maintain counter is so that we don't accidentally enter an infinite loop
    set count 0
    while { $i < [string length $html] && [string index $html $i] ne ">" } {
        if { [incr count] > 3000 } {
            error "There appears to be a programming bug in ad_parse_html_attributes_upvar:  We've entered an infinite loop. We are here: \noffset $i: [string range $html $i $i+60]"
        }
        if { [string range $html $i $i+1] eq "/>" } {
            # This is an XML-style tag ending: <... />
            break
        }

        # This regexp matches an attribute name and an equal sign, if
        # present.  Also eats whitespace before or after.  The \A
        # corresponds to ^, except it matches the position we're
        # starting from, not the start of the string.
        if { ![regexp -indices -start $i {\A\s*([^\s=>]+)\s*(=?)\s*} $html match attr_name_idx equal_sign_idx] } {
            #
            # Apparently, there's no attribute name here.
            # Let's eat all whitespace and lonely equal signs.
            #
            regexp -indices -start $i {\A[\s=]*} $html match
            set i [expr { [lindex $match 1] + 1 }]
        } {
            set attr_name [string tolower [string range $html [lindex $attr_name_idx 0] [lindex $attr_name_idx 1]]]

            # Move past the attribute name just found
            set i [expr { [lindex $match 1] + 1}]

            # If there is an equal sign, we're expecting the next token to be a value
            if { [lindex $equal_sign_idx 1] - [lindex $equal_sign_idx 0] < 0 } {
                # No equal sign, no value
                lappend attributes [list $attr_name]
                if { [info exists attribute_array] } {
                    set attribute_array_var($attr_name) {}
                }
            } else {

                # is there a single or double quote sign as the first character?
                switch -- [string index $html $i] {
                    \" { set exp {\A\"([^\"]*)\"\s*} }
                    '  { set exp {\A'([^']*)'\s*} }
                    default { set exp {\A([^\s>]*)\s*} }
                }
                if { ![regexp -indices -start $i $exp $html match attr_value_idx] } {
                    # No end quote.
                    set attr_value [string range $html $i+1 end]
                    set i [string length $html]
                } else {
                    set attr_value [string range $html [lindex $attr_value_idx 0] [lindex $attr_value_idx 1]]
                    set i [expr { [lindex $match 1] + 1}]
                }

                set attr_value [ns_unquotehtml $attr_value]

                lappend attributes [list $attr_name $attr_value]
                if { [info exists attribute_array] } {
                    set attribute_array_var($attr_name) $attr_value
                }
            }
        }
    }
    return $attributes

XQL Not present:

Generic, PostgreSQL, Oracle

[ hide source ] | [ make this the default ]