Forum OpenACS Q&A: Solution:

4: Solution: (response to 1)
Posted by Jorge Garcia on
We have improved those procedures to include the most important characters we need to use in Spain.

We scan for new static pages to load into the database, and then we filter the content of each file with:

ad_proc util_condense_entities { html }

Then we write the modified and filtered file.

Maybe it could serve as a template for other languages.

---------------
#packages/acs-tcl/tcl/text-html-procs.tcl

ad_proc util_expand_entities { html } {

    Replaces all occurrences of common HTML entities with their plaintext equivalents
    in a way that's appropriate for pretty-printing.


    This proc is more suitable for pretty-printing than its
    sister-proc, <a href="/api-doc/proc-view?proc=util_expand_entities_ie_style"><code>util_expand_entities_ie_style</code></a>.
    The two differences are that this one is more strict: it requires
    proper entities i.e., both opening ampersand and closing semicolon,
    and it doesn't do numeric entities, because they're generally not safe to send to browsers.
    If we want to do numeric entities in general, we should also
    consider how they interact with character encodings.

} {
    regsub -all {&lt;} $html {<} html
    regsub -all {&gt;} $html {>} html
    regsub -all {&quot;} $html {"} html
    regsub -all {&mdash;} $html {--} html
    regsub -all {&#151;} $html {--} html
    regsub -all {&aacute;} $html {á} html
    regsub -all {&eacute;} $html {é} html
    regsub -all {&iacute;} $html {í} html
    regsub -all {&oacute;} $html {ó} html
    regsub -all {&uacute;} $html {ú} html
    regsub -all {&Aacute;} $html {Á} html
    regsub -all {&Eacute;} $html {É} html
    regsub -all {&Iacute;} $html {Í} html
    regsub -all {&Oacute;} $html {Ó} html
    regsub -all {&Uacute;} $html {Ú} html
    regsub -all {&ntilde;} $html {ñ} html
    regsub -all {&Ntilde;} $html {Ñ} html
    regsub -all {&iquest;} $html {¿} html
    regsub -all {&iexcl;} $html {¡} html
    regsub -all {&ccedil;} $html {ç} html
    regsub -all {&Ccedil;} $html {Ç} html
    regsub -all {&uuml;} $html {ü} html
    regsub -all {&Uuml;} $html {Ü} html
    regsub -all {&amp;} $html {\&} html
    return $html
}

ad_proc util_condense_entities { html } {

    Replaces plaintext extended characters with their HTML entity equivalents.

} {
    regsub -all {&} $html {\&amp;} html
    regsub -all {<} $html {\&lt;} html
    regsub -all {>} $html {\&gt;} html
    regsub -all {"} $html {\&quot;} html
    # A double hyphen becomes &mdash;. A second pass mapping "--" to &#151;
    # could never match after this one, so only one pass is needed.
    regsub -all {\-\-} $html {\&mdash;} html
    regsub -all {á} $html {\&aacute;} html
    regsub -all {é} $html {\&eacute;} html
    regsub -all {í} $html {\&iacute;} html
    regsub -all {ó} $html {\&oacute;} html
    regsub -all {ú} $html {\&uacute;} html
    regsub -all {Á} $html {\&Aacute;} html
    regsub -all {É} $html {\&Eacute;} html
    regsub -all {Í} $html {\&Iacute;} html
    regsub -all {Ó} $html {\&Oacute;} html
    regsub -all {Ú} $html {\&Uacute;} html
    regsub -all {ñ} $html {\&ntilde;} html
    regsub -all {Ñ} $html {\&Ntilde;} html
    regsub -all {¿} $html {\&iquest;} html
    regsub -all {¡} $html {\&iexcl;} html
    regsub -all {ç} $html {\&ccedil;} html
    regsub -all {Ç} $html {\&Ccedil;} html
    regsub -all {ü} $html {\&uuml;} html
    regsub -all {Ü} $html {\&Uuml;} html
    return $html
}
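
One detail worth spelling out is the order of the ampersand step: condensing has to escape & first and expanding has to unescape &amp; last, otherwise entities get escaped or unescaped twice. Here is a minimal sketch of that, assuming cut-down stand-ins for the two procs (the names condense_demo, expand_demo and condense_wrong_order are made up for this example, and plain proc is used so it runs in a bare tclsh without ad_proc):

```tcl
# Cut-down stand-ins for the two procs above (three entities only).
# The names are hypothetical; plain proc replaces the OpenACS ad_proc.
proc condense_demo {html} {
    regsub -all {&} $html {\&amp;} html            ;# ampersand first
    regsub -all {<} $html {\&lt;} html
    regsub -all {\u00f1} $html {\&ntilde;} html    ;# \u00f1 is n with tilde
    return $html
}

proc expand_demo {html} {
    regsub -all {&lt;} $html {<} html
    regsub -all {&ntilde;} $html "\u00f1" html
    regsub -all {&amp;} $html {\&} html            ;# ampersand last
    return $html
}

# Hypothetical variant with the ampersand step moved to the end, to show
# the double escaping that the ordering above avoids.
proc condense_wrong_order {html} {
    regsub -all {<} $html {\&lt;} html
    regsub -all {&} $html {\&amp;} html
    return $html
}

set original  "a < b & ma\u00f1ana"
set condensed [condense_demo $original]         ;# a &lt; b &amp; ma&ntilde;ana
set restored  [expand_demo $condensed]          ;# same as $original
set broken    [condense_wrong_order "a < b"]    ;# a &amp;lt; b
```

With the ampersand handled last in the condensing direction, the & of the freshly written &lt; is escaped again, which is why the real proc starts with it.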

5: Re: Solution: (response to 4)
Posted by Jeff Davis on
It would probably be faster to do this with string map. For robustness, I think you would want to use the numeric codes for the characters rather than ISO-8859-1 encoded characters, since otherwise this will not behave as expected if the Tcl file encoding is not ISO-8859-1.

Here's an example of what I am talking about:

set text [string map { \u00e4 ae \u00f6 oe \u00fc ue \u00df ss} $text]

We could definitely use this in OpenACS.
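
To make the suggestion concrete, here is a rough sketch of what a string map version of the entity condensing could look like. The proc name condense_entities_map and the shortened table are assumptions for illustration only, not existing OpenACS code; the characters are written as numeric \u escapes so the result does not depend on the encoding of the .tcl file.

```tcl
# Hypothetical string map counterpart of util_condense_entities; the proc
# name and the shortened table are assumptions made for illustration.
proc condense_entities_map {html} {
    # Double-quoted \u escapes keep the table independent of the encoding
    # of the source file.
    set map [list \
        &        "&amp;"    \
        <        "&lt;"     \
        >        "&gt;"     \
        \"       "&quot;"   \
        "\u00e1" "&aacute;" \
        "\u00e9" "&eacute;" \
        "\u00f1" "&ntilde;" \
        "\u00d1" "&Ntilde;" \
        "\u00bf" "&iquest;" \
        "\u00a1" "&iexcl;"  \
        "\u00fc" "&uuml;"]
    return [string map $map $html]
}

set sample [condense_entities_map "\u00bfQu\u00e9 tal? a & b"]
# sample is now: &iquest;Qu&eacute; tal? a &amp; b
```

Because string map walks the input only once, the & written as part of a replacement is never looked at again, so the ordering concern of the chained regsub calls goes away.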

6: Re: Solution: (response to 5)
Posted by Tilmann Singer on
This is also partially done in the util_text_to_url proc: http://dev.openacs.org:8000/cvs/openacs-4/packages/acs-tcl/tcl/utilities-procs.tcl?rev=1.28&content-type=text/x-cvsweb-markup

Maybe we should take this string map call out and put it in its own proc, e.g. util_to_safe_ascii? Better suggestions for the name?
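
As a starting point for discussion, such a proc might look roughly like the sketch below. This is only an assumption of how it could be factored out: the mapping table is an illustrative subset that I made up, not the one currently in util_text_to_url, and plain proc is used so it can be tried in a bare tclsh.

```tcl
# Hypothetical proc using the name floated above; the mapping table is an
# illustrative subset, not the table from util_text_to_url.
proc util_to_safe_ascii {text} {
    return [string map [list \
        "\u00e4" ae "\u00f6" oe "\u00fc" ue "\u00df" ss \
        "\u00c4" Ae "\u00d6" Oe "\u00dc" Ue \
        "\u00e1" a  "\u00e9" e  "\u00ed" i  "\u00f3" o  "\u00fa" u \
        "\u00f1" n  "\u00e7" c] $text]
}

set ascii [util_to_safe_ascii "Stra\u00dfe nach M\u00fcnchen"]
# ascii is now: Strasse nach Muenchen
```

Characters that are not in the table pass through unchanged, so callers that need strictly ASCII output would still have to deal with the remainder themselves.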