util_expand_entities_ie_style (public)

 util_expand_entities_ie_style html

Defined in packages/acs-tcl/tcl/text-html-procs.tcl

Replaces all occurrences of &#111; and &#x0f; type HTML character entities with their ASCII equivalents. It also handles lt, gt, quot, ob, cb and amp.

This proc does the expansion in the style of IE and Netscape, which is to say that it doesn't require the trailing semicolon on the entity to replace it with something else. The reason is that this proc was designed for checking HTML for security issues, and since entities can be used to hide malicious code, we had better simulate the liberal interpretation that browsers apply, even though it complicates matters.
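
To illustrate why the missing semicolon matters for a security check, here is a minimal standalone sketch. It is not the OpenACS implementation: the proc name expand_decimal_lenient_sketch is hypothetical, and it handles only decimal entities, in a single left-to-right pass.

```tcl
# Hypothetical helper (not part of OpenACS): expand decimal entities such as
# "&#106;" or "&#106" in one pass, treating the trailing semicolon as optional.
proc expand_decimal_lenient_sketch {html} {
    set out ""
    set pos 0
    # -indices reports positions relative to the start of the whole string,
    # even when -start is used.
    while {[regexp -indices -start $pos {&#([0-9]+);?} $html match digits]} {
        lassign $match mstart mend
        # Copy the text between the previous match and this one unchanged.
        append out [string range $html $pos [expr {$mstart - 1}]]
        # scan %d reads the digits as decimal, so leading zeros are harmless.
        scan [string range $html {*}$digits] %d code
        append out [format %c $code]
        # Continue after the entity; expanded text is never re-scanned.
        set pos [expr {$mend + 1}]
    }
    append out [string range $html $pos end]
    return $out
}

puts [expand_decimal_lenient_sketch {<a href="&#106avascript:alert(1)">}]
# prints: <a href="javascript:alert(1)">
```

An expander that insisted on the semicolon would leave "&#106avascript" untouched, so a filter looking for "javascript:" would miss what a lenient browser goes on to interpret.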

Unlike its sister proc, util_expand_entities, it also expands numeric entities (#999 or #xff style).
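
The two numeric styles can be decoded roughly as follows. This is a sketch under assumptions, not the code of this proc: decode_numeric_entity_sketch is a hypothetical helper that takes the entity body without the leading "&" or any trailing semicolon.

```tcl
# Hypothetical helper (not part of OpenACS): decode one numeric entity body,
# e.g. "#0038" (decimal) or "#x3c" (hexadecimal).
proc decode_numeric_entity_sketch {body} {
    if {[regexp {^#[xX]([0-9a-fA-F]+)$} $body -> hex]} {
        # scan %x parses hexadecimal digits into an integer.
        scan $hex %x code
    } elseif {[regexp {^#([0-9]+)$} $body -> dec]} {
        # Strip leading zeros so the value is not read as octal.
        set dec [string trimleft $dec 0]
        if {$dec eq ""} {set dec 0}
        set code $dec
    } else {
        return -code error "not a numeric entity: $body"
    }
    return [format %c $code]
}

puts [decode_numeric_entity_sketch "#0038"]   ;# prints: &
puts [decode_numeric_entity_sketch "#x3c"]    ;# prints: <
```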

Parameters:
html
Author:
Lars Pind <lars@pinds.com>
Created:
October 17, 2000

Partial Call Graph (max 5 caller/called nodes):
ad_parse_html_attributes_upvar (private) -> util_expand_entities_ie_style

Testcases:
No testcase defined.
Source code:
        array set entities { lt < gt > quot \" ob \{ cb \} amp & }

        # Expand HTML entities on the value
        for { set i [string first & $html] } { $i != -1 } { set i [string first & $html $i] } {

            set match_p 0
            switch -regexp -- [string index $html $i+1] {
                # {
                    switch -regexp -- [string index $html $i+2] {
                        [xX] {
                            regexp -indices -start [expr {$i+3}] {[0-9a-fA-F]*} $html hex_idx
                            set hex [string range $html [lindex $hex_idx 0] [lindex $hex_idx 1]]
                            set html [string replace $html $i [lindex $hex_idx 1]  [subst -nocommands -novariables "\\x$hex"]]
                            set match_p 1
                        }
                        [0-9] {
                            regexp -indices -start [expr {$i+2}] {[0-9]*} $html dec_idx
                            set dec [string range $html [lindex $dec_idx 0] [lindex $dec_idx 1]]
                            # $dec might contain leading 0s. Since format evaluates $dec as expr
                            # leading 0s cause octal interpretation and therefore errors on e.g. &#0038;
                            set dec [string trimleft $dec 0]
                            if {$dec eq ""} {set dec 0}
                            set html [string replace $html $i [lindex $dec_idx 1]  [format "%c" $dec]]
                            set match_p 1
                        }
                    }
                }
                [a-zA-Z] {
                    if { [regexp -indices -start $i {\A&([^\s;]+)} $html match entity_idx] } {
                        set entity [string tolower [string range $html [lindex $entity_idx 0] [lindex $entity_idx 1]]]
                        if { [info exists entities($entity)] } {
                            set html [string replace $html $i [lindex $match 1] $entities($entity)]
                        }
                        set match_p 1
                    }
                }
            }
            incr i
            if { $match_p } {
                # remove trailing semicolon
                if {[string index $html $i] eq ";"} {
                    set html [string replace $html $i $i]
                }
            }
        }
        return $html
XQL Not present:
Generic, PostgreSQL, Oracle