We have extended these procedures to cover the most important characters we need in Spain.
We scan for new static pages to load into the database, and then filter the content of each file with:
ad_proc util_condense_entities { html }
Then we write the filtered file back out.
Maybe this could serve as a template for other languages.
---------------
#packages/acs-tcl/tcl/text-html-procs.tcl
ad_proc util_expand_entities { html } {
Replaces all occurrences of common HTML entities with their plaintext equivalents
in a way that's appropriate for pretty-printing.
This proc is more suitable for pretty-printing than its
sister proc, <a href="/api-doc/proc-view?proc=util_expand_entities_ie_style"><code>util_expand_entities_ie_style</code></a>.
The two differences are that this one is stricter: it requires
proper entities, i.e., both the opening ampersand and the closing semicolon,
and it doesn't do numeric entities, because they're generally not safe to send to browsers.
If we want to do numeric entities in general, we should also
consider how they interact with character encodings.
} {
regsub -all {&lt;} $html {<} html
regsub -all {&gt;} $html {>} html
regsub -all {&quot;} $html {"} html
regsub -all {&mdash;} $html {--} html
regsub -all {&#151;} $html {--} html
regsub -all {&aacute;} $html {á} html
regsub -all {&eacute;} $html {é} html
regsub -all {&iacute;} $html {í} html
regsub -all {&oacute;} $html {ó} html
regsub -all {&uacute;} $html {ú} html
regsub -all {&Aacute;} $html {Á} html
regsub -all {&Eacute;} $html {É} html
regsub -all {&Iacute;} $html {Í} html
regsub -all {&Oacute;} $html {Ó} html
regsub -all {&Uacute;} $html {Ú} html
regsub -all {&ntilde;} $html {ñ} html
regsub -all {&Ntilde;} $html {Ñ} html
regsub -all {&iquest;} $html {¿} html
regsub -all {&iexcl;} $html {¡} html
regsub -all {&ccedil;} $html {ç} html
regsub -all {&Ccedil;} $html {Ç} html
regsub -all {&uuml;} $html {ü} html
regsub -all {&Uuml;} $html {Ü} html
# &amp; is expanded last, so that e.g. &amp;lt; becomes &lt; and not <
regsub -all {&amp;} $html {\&} html
return $html
}
ad_proc util_condense_entities { html } {
Replaces plaintext extended characters with their HTML entity equivalents.
} {
# & is escaped first, so the ampersands introduced below are not escaped again.
# In a regsub replacement, \& is a literal ampersand (a bare & means "the match").
regsub -all {&} $html {\&amp;} html
regsub -all {<} $html {\&lt;} html
regsub -all {>} $html {\&gt;} html
regsub -all {"} $html {\&quot;} html
# One rule for "--" is enough: a second one would never find anything left to match.
regsub -all {\-\-} $html {\&mdash;} html
regsub -all {á} $html {\&aacute;} html
regsub -all {é} $html {\&eacute;} html
regsub -all {í} $html {\&iacute;} html
regsub -all {ó} $html {\&oacute;} html
regsub -all {ú} $html {\&uacute;} html
regsub -all {Á} $html {\&Aacute;} html
regsub -all {É} $html {\&Eacute;} html
regsub -all {Í} $html {\&Iacute;} html
regsub -all {Ó} $html {\&Oacute;} html
regsub -all {Ú} $html {\&Uacute;} html
regsub -all {ñ} $html {\&ntilde;} html
regsub -all {Ñ} $html {\&Ntilde;} html
regsub -all {¿} $html {\&iquest;} html
regsub -all {¡} $html {\&iexcl;} html
regsub -all {ç} $html {\&ccedil;} html
regsub -all {Ç} $html {\&Ccedil;} html
regsub -all {ü} $html {\&uuml;} html
regsub -all {Ü} $html {\&Uuml;} html
return $html
}
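
For anyone adapting this to another language, here is a rough sketch of the same idea driven by a single table and Tcl's built-in string map, so that adding a character means adding one pair. The names entity_table, condense_entities_sketch and expand_entities_sketch are made up for this example and are not part of OpenACS. The sketch covers only the accented characters and leaves the markup characters (&, <, >, ") alone.

```tcl
# Hypothetical table of character/entity pairs; extend it for other languages.
set entity_table {
    á &aacute;  é &eacute;  í &iacute;  ó &oacute;  ú &uacute;
    Á &Aacute;  É &Eacute;  Í &Iacute;  Ó &Oacute;  Ú &Uacute;
    ñ &ntilde;  Ñ &Ntilde;  ¿ &iquest;  ¡ &iexcl;
    ç &ccedil;  Ç &Ccedil;  ü &uuml;    Ü &Uuml;
}

# Hypothetical helper: replace each listed character with its entity.
proc condense_entities_sketch { text } {
    global entity_table
    # string map makes one pass over the text, so a replacement
    # never sees the output of another replacement.
    return [string map $entity_table $text]
}

# Hypothetical helper: the inverse, built from the same table.
proc expand_entities_sketch { text } {
    global entity_table
    set reverse {}
    foreach {char entity} $entity_table {
        lappend reverse $entity $char
    }
    return [string map $reverse $text]
}

puts [condense_entities_sketch "¿Dónde está el niño?"]
# prints: &iquest;D&oacute;nde est&aacute; el ni&ntilde;o?
```

Because both directions come from the same table, the two procs cannot drift apart the way two hand-written lists of regsub calls can.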