ad_dom_fix_html (private)
ad_dom_fix_html -html html [ -marker marker ] [ -dom ]
Defined in packages/acs-tcl/tcl/text-html-procs.tcl
Similar in spirit to the famous Tidy command line utility, this proc takes a piece of possibly invalid markup and returns a 'fixed' version where unopened tags have been closed and attribute specifications have been normalized by transforming them in the form
attribute-name="attribute value"
. All attributes with an invalid (non-alphanumeric) name will be stripped.
Be aware that every comment and also the possibly present DOCTYPE declaration will be stripped from the markup. Also, most of tag's internal whitespace will be trimmed. This behavior comes from the htmlparse library used in this implementation.
- Switches:
- -html (required)
- Markup to process
- -marker (optional, defaults to
"root"
)- Root element use to enforce a single root of the DOM tree.
- -dom (optional, boolean)
- When this flag is set, instead of returning markup, the proc will return the tDOM object built during the operation. Useful when the result should be used by tDOM anyway, so we can avoid superfluous parsing.
- Returns:
- markup or a tDOM document object if the -dom flag is specified
- Author:
- Antonio Pisano
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- No testcase defined.
Source code: if {[catch {package require struct}]} { error "Package struct non found on the system" } if {[catch {package require htmlparse}]} { error "Package htmlparse non found on the system" } set tree [::struct::tree] catch {::htmlparse::tags destroy} ::struct::stack ::htmlparse::tags ::htmlparse::tags push root $tree set root type root ::htmlparse::parse -cmd [list ::htmlparse::2treeCallback $tree] -incvar errs $html $tree walk root -order post n { ::htmlparse::Reorder $tree $n } ::htmlparse::tags destroy set lmarker "<$marker>" set rmarker "</$marker>" if {[package vsatisfies [package require tdom] 0.9.3]} { # tDOM 0.9.3 expects HTML DOM trees to be wrapped by an # HTML element, if they are to be serialized properly. set doc [dom createDocument html] set root [[$doc documentElement] appendChild [$doc createElement $marker]] } else { set doc [dom createDocument $marker] set root [$doc documentElement] } set queue {} lappend queue [list $root [$tree children [$tree children root]]] try { while {$queue ne {}} { lassign [lindex $queue 0] domparent treechildren set queue [lrange $queue 1 end] foreach child $treechildren { set type [$tree get $child type] set data [$tree get $child data] if {$type eq "PCDATA"} { set el [$doc createTextNode $data] } else { set el [$doc createElement $type] # parse element attributes while {$data ne ""} { set data [string trim $data] # attribute with a value, optionally surrounded by double or single quotes if {[regexp "^(\[^= \]+)=(\"\[^\"\]*\"|'\[^'\].*'|\[^ \]*)" $data m attname attvalue]} { if {[string match "\"*\"" $attvalue] || [string match "'*'" $attvalue]} { set attvalue [string range $attvalue 1 end-1] } # attribute with no value } elseif {[regexp {^([^\s]+)} $data m attname]} { set attvalue "" } else { error "Unrecoverable attribute spec in supplied markup" } # skip bogus attribute names if {[string is alnum -strict $attname]} { $el setAttribute $attname $attvalue } set data [string range $data [string length $m] end] } } $domparent appendChild $el set elchildren [$tree children $child] if {$elchildren ne {}} { lappend queue [list $el $elchildren] } } } } on error {errorMsg} { $doc delete throw $::errorInfo $errorMsg } finally { $tree destroy } if {$dom_p} { return $doc } else { set html [$doc asHTML] $doc delete set html [string range $html [string length $lmarker] end-[string length $rmarker]] } return [string trim $html]XQL Not present: Generic, PostgreSQL, Oracle