Forum OpenACS Q&A: Another solution to Problems with Microsoft char set (eg 'smart quotes') in form input

Rich, thanks for pointing to your code. I like your use of Tcl's string map command rather than regsub (it's more elegant, and besides, Brent Welch uses it in a similar example for getting rid of "smart quotes" in his Tcl book).

We've found it useful to include a few more mappings for Microsoft's "smart fractions" etc. Here's our version of this "decrufting" proc:

proc_doc decruft { cruft } { 
Takes a string and removes all the cruft introduced by Microsoft apps,
such as their 'smart quotes'. Brute-force approach suggested by John
Walker's Demoronizer, a Perl script which does a few other things that
aren't germane here.

This proc could get called lots of places, but to make it
automatically run against all user input, we call it from
ad_page_variables and (for backward compatibility since this 
still lurks in the code) set_the_usual_form_variables. It 
should be trivial to add it to page_contract or whatever OACS 4.5+ uses.
} {    
#    ns_log Notice "Before De-Cruft: $cruft"

    set cruft [ string map [ list \
        \x82 , \x83 f \x84 ,, \x85 ... \x86 t \x87 I \x88 ^ \x89 { */**} \
        \x8a S \x8b < \x8c Oe \x8d {} \x8e Z \x8f {} \x90 {} \x91 ` \x92 ' \
        \x93 {"} \x94 {"} \x95 * \x96 - \x97 -- \x98 ~ \x99 tm \x9a S \x9b > \
        \x9c oe \x9d {} \x9e Z \x9f Y \xbd 1/2 \xbc 1/4 \xbe 3/4 ] $cruft ]

#    ns_log Notice "After De-Cruft: $cruft"

    return $cruft
}
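
In case anyone wants to try the idea outside AOLserver, here's a minimal sketch that runs in a plain tclsh. It uses proc instead of proc_doc (which is ACS-specific), the name decruft_demo is made up for illustration, and it carries only a handful of the mappings above. It also assumes the Windows-1252 bytes reach Tcl unconverted, i.e. as the code points \x91 through \x97:

```tcl
# Hypothetical cut-down version of decruft, for illustration only.
proc decruft_demo {text} {
    # string map takes a flat list of from/to pairs and replaces
    # every occurrence of each "from" in a single pass.
    return [string map [list \
        \x85 ... \
        \x91 ` \x92 ' \
        \x93 {"} \x94 {"} \
        \x96 - \x97 -- ] $text]
}

# Sample input with smart quotes, an em dash byte and an ellipsis byte.
set sample "\x93It\x92s done\x94 \x97 almost\x85"
puts [decruft_demo $sample]
# prints: "It's done" -- almost...
```

Because string map does all the replacements in one pass, the order of the pairs doesn't matter here, which is part of why it reads more cleanly than a stack of regsub calls.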

In addition, instead of calling this proc within modules like bboard and news, we find it useful to push the call back into ad_page_variables (and set_the_usual_form_variables, since that still gets called in some places). That way it always gets called regardless of the destination of the form data. FWIW, here's how we do it:

proc_doc ad_page_variables {variable_specs} {

Current syntax:

ad_page_variables {var_spec1 [varspec2] ... }

    This proc handles translating form inputs into Tcl variables, and checking to see that the correct set of inputs was supplied.  Note that this is mostly a check on the proper programming of a set of pages.

Here are the recognized var_specs:

variable; means it's required
{variable default-value}
      Optional, with default value.  If the value is supplied but is null, and the
      default-value is present, that value is used.
{variable -multiple-list}
      The value of the Tcl variable will be a list containing all of the values (in order) supplied for that form variable.  Particularly useful for collecting checkboxes or select multiples.
      Note that if required or optional variables are specified more than once, the first (leftmost) value is used, and the rest are ignored.

{variable -array}
      This syntax supports the idiom of supplying multiple form variables of the
      same name but ending with a "_[0-9]", e.g., foo_1, foo_2.... Each value will be
      stored in the array variable variable with the index being whatever follows the
      underscore.

There is an optional third element in the var_spec.  If it is "QQ", "qq", or some variant, a variable named "QQvariable" will be created and given the same value, but with single quotes escaped suitable for handing to SQL.

Other elements of the var_spec are ignored, so a documentation string
describing the variable can be supplied.

Note that the default value form will become the value form in a "set"

Note that the default values are filled in from left to right, and can depend on values of variables to their left:
ad_page_variables {
    file
    {start 0}
    {end {[expr $start + 20]}}
}

} {
#   ns_log Notice "ad_page_variables"
    set exception_list [list]
    set form [ns_getform]
    if { $form != "" } {
        set form_size [ns_set size $form]
        set form_counter_i 0
        
        # first pass -- go through all the variables supplied in the form
        while {$form_counter_i<$form_size} {
            set variable [ns_set key $form $form_counter_i]
            set value [ns_set value $form $form_counter_i]
            check_for_form_variable_naughtiness $variable $value
            set found "not"
            # find the matching variable spec, if any
            foreach variable_spec $variable_specs {
                if { [llength $variable_spec] >= 2 } {
                    switch -- [lindex $variable_spec 1] {
                        -multiple-list {
                            if { [lindex $variable_spec 0] == $variable } {
                                # variable gets a list of all the values
                                upvar 1 $variable var
                                lappend var $value
                                set found "done"
                                break
                            }
                        }
                        -array {
                            set varname [lindex $variable_spec 0]
                            set pattern "($varname)_(.+)"
                            if { [regexp $pattern $variable match array index] } {
                                if { ![empty_string_p $array] } {
                                    upvar 1 $array arr
                                    set arr($index) [ns_set value $form $form_counter_i]
                                }
                                set found "done"
                                break
                            }
                        }
                        default {
                            if { [lindex $variable_spec 0] == $variable } {
                                set found "set"
                                break
                            }
                        }
                    }
                } elseif { $variable_spec == $variable } {
                    set found "set"
                    break
                }
            }
            if { $found == "set" } {
                upvar 1 $variable var
                if { ![info exists var] } {
                    # take the leftmost value, if there are multiple ones
                    set var [ns_set value $form $form_counter_i]
                }
            }
            incr form_counter_i
        }
    }
    
    # now make a pass over each variable spec, making sure everything required is there
    # and doing defaulting for unsupplied things that aren't required
    foreach variable_spec $variable_specs {
        set variable [lindex $variable_spec 0]
        upvar 1 $variable var
        
        if { [llength $variable_spec] >= 2 } {
            if { ![info exists var] } {
                set default_value_or_flag [lindex $variable_spec 1]
                
                switch -- $default_value_or_flag {
                    -array {
                        # don't set anything
                    }
                    -multiple-list {
                        set var [list]
                    }
                    default {
                        # Needs to be set.
                        uplevel [list eval set $variable "\[subst [list $default_value_or_flag]\]"]
                        # This used to be:
                        #
                        #   uplevel [list eval [list set $variable "$default_value_or_flag"]]
                        #
                        # But it wasn't properly performing substitutions.
                    }
                }
            }
            
            
        } else {
            if { ![info exists var] } {
                lappend exception_list "\"$variable\" required but not supplied. Bummer."
            }
        }
        # modified by rhs@mit.edu on 1/31/2000
        # to QQ everything by default (but not arrays)
        if {[info exists var] && ![array exists var]} {
            # Begin De-Cruft stuff here
#           ns_log Notice "Before De-Cruft: $var"
            set var [decruft $var]
#           ns_log Notice "After De-Cruft: $var"
            # End De-Cruft stuff here
            upvar QQ$variable QQvar
            set QQvar [DoubleApos $var]
        }
        
    }
    
    set n_exceptions [llength $exception_list]
    # this is an error in the HTML form
    if { $n_exceptions == 1 } {
        ns_returnerror 500 [lindex $exception_list 0]
        return -code return
    } elseif { $n_exceptions > 1 } {
        ns_returnerror 500 "<li>[join $exception_list "\n<li>"]\n"
        return -code return
    }
}
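
The order of the two calls near the end matters: the value is decrufted first and only then quoted for SQL, so a "smart" apostrophe ends up as a plain one that gets doubled like any other. Here's a rough standalone sketch of that step. Both procs are hypothetical stand-ins (decruft_mini for decruft, double_apos_demo for ACS's DoubleApos, which I'm assuming simply doubles single quotes), so treat it as an illustration rather than the real thing:

```tcl
# Hypothetical stand-in for decruft with just the quote mappings.
proc decruft_mini {text} {
    return [string map [list \x91 ` \x92 ' \x93 {"} \x94 {"}] $text]
}

# Hypothetical stand-in for DoubleApos: double each single quote
# so the value can sit inside a quoted SQL literal.
proc double_apos_demo {text} {
    return [string map [list ' ''] $text]
}

# A form value typed in a Microsoft app, with a "smart" apostrophe.
set raw "O\x92Reilly"

# Same order as in ad_page_variables: decruft first, then quote.
set clean   [decruft_mini $raw]
set QQclean [double_apos_demo $clean]

puts $clean     ;# O'Reilly
puts $QQclean   ;# O''Reilly
```

If the quoting ran first, the \x92 would pass through untouched and the later decruft would introduce an unescaped apostrophe into the QQ value.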

For amusement value, here's a demo we created that shows the problem and the fix: http://www.epimetrics.com/demos/decrufter?demo_id=7