ad_dom_fix_html (private)

 ad_dom_fix_html -html html [ -marker marker ] [ -dom ]

Defined in packages/acs-tcl/tcl/text-html-procs.tcl

Similar in spirit to the famous Tidy command line utility, this proc takes a piece of possibly invalid markup and returns a 'fixed' version where unopened tags have been closed and attribute specifications have been normalized by transforming them in the form attribute-name="attribute value". All attributes with an invalid (non-alphanumeric) name will be stripped.

Be aware that every comment and also the possibly present DOCTYPE declaration will be stripped from the markup. Also, most of tag's internal whitespace will be trimmed. This behavior comes from the htmlparse library used in this implementation.

Switches:
-html
(required)
Markup to process
-marker
(defaults to "root") (optional)
Root element use to enforce a single root of the DOM tree.
-dom
(boolean) (optional)
When this flag is set, instead of returning markup, the proc will return the tDOM object built during the operation. Useful when the result should be used by tDOM anyway, so we can avoid superfluous parsing.
Returns:
markup or a tDOM document object if the -dom flag is specified
Author:
Antonio Pisano

Partial Call Graph (max 5 caller/called nodes):
%3 ad_dom_sanitize_html ad_dom_sanitize_html (public) ad_dom_fix_html ad_dom_fix_html ad_dom_sanitize_html->ad_dom_fix_html dom dom ad_dom_fix_html->dom

Testcases:
No testcase defined.
Source code:
        if {[catch {package require struct}]} {
            error "Package struct non found on the system"
        }
        if {[catch {package require htmlparse}]} {
            error "Package htmlparse non found on the system"
        }

        set tree [::struct::tree]


        catch {::htmlparse::tags destroy}

        ::struct::stack ::htmlparse::tags
        ::htmlparse::tags push root
        $tree set root type root

        ::htmlparse::parse  -cmd [list ::htmlparse::2treeCallback $tree]  -incvar errs $html

        $tree walk root -order post n {
            ::htmlparse::Reorder $tree $n
        }

        ::htmlparse::tags destroy


        set lmarker "<$marker>"
        set rmarker "</$marker>"
        if {[package vsatisfies [package require tdom] 0.9.3]} {
            # tDOM 0.9.3 expects HTML DOM trees to be wrapped by an
            # HTML element, if they are to be serialized properly.
            set doc [dom createDocument html]
            set root [[$doc documentElement] appendChild  [$doc createElement $marker]]
        } else {
            set doc [dom createDocument $marker]
            set root [$doc documentElement]
        }

        set queue {}
        lappend queue [list $root [$tree children [$tree children root]]]
        while {$queue ne {}} {
            lassign [lindex $queue 0] domparent treechildren
            set queue [lrange $queue 1 end]

            foreach child $treechildren {
                set type [$tree get $child type]
                set data [$tree get $child data]
                if {$type eq "PCDATA"} {
                    set el [$doc createTextNode $data]
                } else {
                    set el [$doc createElement $type]

                    # parse element attributes
                    while {$data ne ""} {
                        set data [string trim $data]
                        # attribute with a value, optionally surrounded by double or single quotes
                        if {[regexp "^(\[^= \]+)=(\"\[^\"\]*\"|'\[^'\].*'|\[^ \]*)" $data m attname attvalue]} {
                            if {[string match "\"*\"" $attvalue] ||
                                [string match "'*'" $attvalue]} {
                                set attvalue [string range $attvalue 1 end-1]
                            }
                            # attribute with no value
                        } elseif {[regexp {^([^\s]+)} $data m attname]} {
                            set attvalue ""
                        } else {
                            error "Unrecoverable attribute spec in supplied markup"
                        }

                        # skip bogus attribute names
                        if {[string is alnum -strict $attname]} {
                            $el setAttribute $attname $attvalue
                        }

                        set data [string range $data [string length $m] end]
                    }
                }

                $domparent appendChild $el

                set elchildren [$tree children $child]
                if {$elchildren ne {}} {
                    lappend queue [list $el $elchildren]
                }
            }
        }

        $tree destroy

        if {$dom_p} {
            return $doc
        } else {
            set html [$doc asHTML]
            $doc delete
            set html [string range $html [string length $lmarker] end-[string length $rmarker]]
        }

        return [string trim $html]
XQL Not present:
Generic, PostgreSQL, Oracle
[ hide source ] | [ make this the default ]
Show another procedure: