V.2 Basic String Operations

If your program receives data from a Web client, it comes in as a string. If your program sends an HTML page back to a Web client, it goes out as a string. This puts the string data type at the heart of Web page development:
set whole_page "some stuff for the top of the page\n\n"
append whole_page "some stuff for the middle of the page\n\n"
append whole_page "some stuff for the bottom of the page\n\n"
# done composing the page, let's write it back to the user
ns_return 200 text/html $whole_page
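ns_return exists only inside AOLserver. The following minimal sketch builds the same kind of page with append and runs in a plain tclsh, printing the result with puts instead. The proc name build_page and the HTML fragments are illustrative assumptions, not part of any toolkit.

```tcl
# Hypothetical helper: assemble a complete page from a title and a body
# fragment, growing one string variable with append.
proc build_page {title body} {
    set whole_page "<html><head><title>$title</title></head>\n"
    # append accepts any number of values and concatenates them in order
    append whole_page "<body>\n" $body "\n"
    append whole_page "</body></html>\n"
    return $whole_page
}

# Inside AOLserver you would hand this string to ns_return;
# in a plain tclsh we just print it.
puts [build_page "Welcome" "<p>Hello, world.</p>"]
```

Because append modifies the variable in place, there is no need to write `set whole_page "$whole_page..."` over and over.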
If you're processing data from the user, typically entered into an HTML form, you'll be using a rich variety of built-in string-handling procedures. Suppose that a user is registering at your site with the form variables first_names, last_name, email, and password. Here's how we might build up a list of exceptions (using the Tcl lappend command, described in the chapter on lists):
# compare the first_names value to the empty string
if { [string compare $first_names ""] == 0 } {
    lappend exception_list "You forgot to type your first name"
}

# see if their email address has the form
#   something at-sign something
if { ![regexp {.+@.+} $email] } {
    lappend exception_list "Your email address doesn't look valid."
}

if { [string length $password] > 20 } {
    lappend exception_list "The password you selected is too long."
}
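To try these checks outside a Web server, one option is to wrap them in a proc that returns the exception list. This is a sketch: the proc name registration_exceptions is hypothetical, and trimming the first name before testing it is an added assumption, so that a name consisting only of spaces is also rejected.

```tcl
# Hypothetical proc wrapping the three checks above.
# Returns a (possibly empty) list of error messages.
proc registration_exceptions {first_names email password} {
    set exception_list [list]
    # an empty or all-whitespace first name is an error (trim is an assumption)
    if { [string compare [string trim $first_names] ""] == 0 } {
        lappend exception_list "You forgot to type your first name"
    }
    # something, an at-sign, something
    if { ![regexp {.+@.+} $email] } {
        lappend exception_list "Your email address doesn't look valid."
    }
    if { [string length $password] > 20 } {
        lappend exception_list "The password you selected is too long."
    }
    return $exception_list
}

set problems [registration_exceptions "" "nobody" "secret"]
puts "[llength $problems] problem(s):"
foreach problem $problems {
    puts "  $problem"
}
```

An empty return value means the registration can proceed to the database step.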
If there aren't any exceptions, we have to get these data ready for insertion into the database:
# remove whitespace from ends of input (if any)
set last_name_trimmed [string trim $last_name]

# escape any single quotes with an extra one (since the SQL
# string literal quoting system uses single quotes)
regsub -all ' $last_name_trimmed '' last_name_final

set sql_insert "insert into users (..., last_name, ...) 
values 
(..., '$last_name_final', ...)"
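The regsub call above doubles every single quote. For a fixed substitution like this, string map does the same job without a regular expression. Here is a minimal sketch; the proc name sql_quote and the one-column users table are illustrative assumptions.

```tcl
# Hypothetical helper: trim the value, then double every single quote so
# it can sit inside a SQL string literal. Same effect as
#   regsub -all ' $value '' result
proc sql_quote {value} {
    return [string map {' ''} [string trim $value]]
}

set last_name "  O'Brien "
set last_name_final [sql_quote $last_name]
set sql_insert "insert into users (last_name) values ('$last_name_final')"
puts $sql_insert
```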

Looking for stuff in a string

The simplest way to look for a substring within a string is with the string first command. Some users of photo.net complained that they didn't like seeing classified ads that were simply pointers to the eBay auction site. Here's a simplified snippet from http://software.arsdigita.com/www/gc/place-ad-3.tcl:

if { [string first "ebay" [string tolower $full_ad]] != -1 } {
    # return an exception
    ...
}
An alternative formulation would be:
if { [regexp -nocase {ebay} $full_ad] } {
    # return an exception
    ...
}
Both implementations will catch any capitalization variant of "eBAY". Both implementations will miss "e-bay", but it doesn't matter: if the poster of the ad includes a link with a URL, the hyperlink will contain "ebay". What about false positives? If you visit www.m-w.com and search for "*ebay*", you'll find that both implementations might bite someone selling rhododendrons or a water-powered mill. That's why the toolkit code checks a "DisalloweBay" parameter, set by the publisher, before declaring this an exception.
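Here are the two formulations side by side as small procs, gated by a publisher switch. This is a sketch only: the proc names and the disallow_ebay variable are hypothetical stand-ins for the toolkit's code and its DisalloweBay parameter.

```tcl
# Hypothetical stand-in for the publisher's DisalloweBay parameter.
set disallow_ebay 1

# Formulation 1: lowercase the ad, then look for the substring.
proc mentions_ebay_first {full_ad} {
    return [expr {[string first "ebay" [string tolower $full_ad]] != -1}]
}

# Formulation 2: a case-insensitive regular expression.
proc mentions_ebay_regexp {full_ad} {
    return [regexp -nocase {ebay} $full_ad]
}

set full_ad "Mint-condition widget, see http://www.eBay.com/ for details"
if { $disallow_ebay && [mentions_ebay_first $full_ad] } {
    puts "Sorry, ads that merely point to eBay are not allowed."
}
```

Both procs return 1 for "EBAY" or "eBay" and 0 for "e-bay", matching the behavior described above.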

If you're just trying to find a substring, you can use either string first or regexp. If you're trying to do something more subtle, you'll need regexp (described more fully in the chapter "Pattern Matching"):

if { ![regexp {[a-z]} $full_ad] } {
    # no lowercase letters in the ad!
    append exception_text "<li>Your ad appears to be all uppercase. ON THE INTERNET THIS IS CONSIDERED SHOUTING. IT IS ALSO MUCH HARDER TO READ THAN MIXED CASE TEXT. So we don't allow it, out of decorum and consideration for people who may be visually impaired."
    incr exception_count
}
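One side effect of testing only for the absence of [a-z] is that an ad containing no letters at all (say, just a phone number) would also be flagged. The sketch below adds an [A-Z] test so that only ads with capitals and no lowercase letters count as shouting; that extra test and the proc name looks_like_shouting are assumptions, not the toolkit's code.

```tcl
# Hypothetical proc: 1 if the ad has capital letters but no lowercase ones.
# The [A-Z] test (an assumption) keeps digit-only ads from being flagged.
proc looks_like_shouting {full_ad} {
    return [expr {[regexp {[A-Z]} $full_ad] && ![regexp {[a-z]} $full_ad]}]
}

set exception_count 0
set exception_text ""
set full_ad "FOR SALE: 1995 NIKON F4, BARELY USED"
if { [looks_like_shouting $full_ad] } {
    append exception_text "<li>Your ad appears to be all uppercase."
    incr exception_count
}
puts "$exception_count exception(s)"
```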
Using only part of a string

    In the ArsDigita Community System, we have a page that shows a user's complete history with a Web service, e.g., http://photo.net/shared/community-member.tcl?user_id=23069 shows all of the postings by Philip Greenspun. If a comment on a static page is short, we want to show the entire message. If not, we want to show just the first 1000 characters.

    In http://software.arsdigita.com/www/shared/community-member.tcl, we find the following use of the string range command:

    if { [string length $message] > 1000 } {
        set complete_message "[string range $message 0 999]... "
    } else {
        set complete_message $message
    }
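Because string range includes both end indices, keeping the first N characters means asking for indices 0 through N-1. A minimal sketch that makes the limit a parameter; the proc name summarize_message is hypothetical.

```tcl
# Hypothetical proc: return the message unchanged if it is short enough,
# otherwise its first max_chars characters followed by "...".
# string range is inclusive, so the last index kept is max_chars - 1.
proc summarize_message {message {max_chars 1000}} {
    if { [string length $message] > $max_chars } {
        return "[string range $message 0 [expr {$max_chars - 1}]]..."
    }
    return $message
}

set long_message [string repeat "x" 1500]
# 1000 characters of text plus three dots
puts [string length [summarize_message $long_message]]
```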
    

Fortran-style formatting and reading of numbers

    The Tcl commands format and scan resemble C's printf and scanf commands. That's pretty much all that any Tcl manual will tell you about these commands, which means that you're kind of S.O.L. if you don't know C. The basic idea of these commands comes from Fortran, a computer language developed by John Backus at IBM in 1954. The FORMAT statement in Fortran let you control the printed display of a number, including such aspects as spaces of padding to the left and digits of precision after the decimal point.

    With Tcl format, the first argument is a pattern for how you'd like the final output to look. Inside the pattern are placeholders for values. The second through Nth arguments to format are the values themselves:

    format pattern value1 value2 value3 ... valueN
    
    We can never figure out how to use format without either copying an earlier fragment of pattern or referring to the man page (http://www.tcl.tk/man/tcl8.4/TclCmd/format.htm). However, here are some examples for you to copy:
    % # format prices with two digits after the point
    % format "Price:  %0.2f" 17
    Price:  17.00
    % # pad some stuff out to fill 20 spaces
    % format "%20s" "a long thing"
            a long thing
    % format "%20s" "23"
                      23
    % # notice that the 20 spaces is a MINIMUM; use string range
    % # if you might need to truncate
    % format "%20s" "something way longer than 20 spaces"
    something way longer than 20 spaces
    % # turn a number into an ASCII character
    % format "%c" 65
    A
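Combining these patterns gives aligned plain-text columns. In the sketch below (the proc name price_line is hypothetical), %-12s left-justifies a name in a 12-character field, where the minus sign means "pad on the right", and %8.2f right-justifies a price in 8 columns with two digits after the point. Since field widths are only minimums, the name is cut with string range first so that long names don't push the price column over.

```tcl
# Hypothetical helper: one line of a plain-text price list.
proc price_line {name price} {
    # truncate to 12 characters (indices 0..11) before padding
    return [format "%-12s %8.2f" [string range $name 0 11] $price]
}

foreach {name price} {widget 3.5 "extra-large gadget" 1249} {
    puts [price_line $name $price]
}
```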
    
    The Tcl command scan performs the reverse operation, i.e., parses an input string according to a pattern and stuffs values as it finds them into variables:
    % # turn an ASCII character into a number
    % scan "A" "%c" the_ascii_value
    1
    % set the_ascii_value
    65
    % 
    
    Notice that the number returned by scan is a count of how many conversions it was able to perform successfully. If you really want to use scan, you'll need to visit the man page: http://www.tcl.tk/man/tcl8.4/TclCmd/scan.htm. For an idea of how useful this is for Web development, consider that the entire 250,000-line ArsDigita Community System does not contain a single use of the scan command.
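The conversion count is what makes scan usable as a validator. A minimal sketch, assuming dates arrive as year-month-day strings; the proc name parse_ymd is hypothetical.

```tcl
# Hypothetical proc: pull the numeric parts out of a "YYYY-MM-DD" string.
# scan returns the number of successful conversions, so anything other
# than 3 means the input didn't have the expected shape.
proc parse_ymd {date_string} {
    if { [scan $date_string "%d-%d-%d" year month day] != 3 } {
        return ""
    }
    return [list $year $month $day]
}

puts [parse_ymd "2000-03-15"]
puts [parse_ymd "March 15"]
```

The first call yields the list 2000 3 15 (%d reads decimal, so "03" becomes 3); the second yields an empty string because the very first %d conversion fails.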


Reference: String operations

      A.) Commands that don't start with string

    • append variable_name value1 value2 value3 ... valueN
      sets the variable defined by variable_name to the concatenation of the old value and all the remaining arguments (http://www.tcl.tk/man/tcl8.4/TclCmd/append.htm)
    • regexp ?switches? expression string ?matchVar? ?subMatchVar subMatchVar ...?
      Returns 1 if expression matches string; 0 otherwise. If successful, regexp sets the match variables to the parts of string that match the corresponding parts of expression.
      % set fraction "5/6"
      5/6
      % regexp {(.*)/(.*)} $fraction match num denom
      1
      % set match
      5/6
      % set num
      5
      % set denom
      6
      
      (more: the pattern matching chapter and http://www.tcl.tk/man/tcl8.4/TclCmd/regexp.htm)
    • regsub ?switches? expression string substitution_spec result_variable_name
      Returns a count of the number of matching items that were found and replaced. Primarily called for its effect in setting result_variable_name.

      Here's an example where we ask a user to type in keywords, separated by commas. We then expect to feed this list to a full-text search indexer that will throw an error if handed two commas in a row. We use regsub to clean up what the user typed:

      # here we destructively modify the variable query_string,
      # replacing every run of one or more commas with a single
      # comma
      % set query_string "samoyed,, sledding, harness"
      samoyed,, sledding, harness
      % regsub -all {,+} $query_string "," query_string
      2
      % set query_string
      samoyed, sledding, harness
      
      (more: the pattern matching chapter and http://www.tcl.tk/man/tcl8.4/TclCmd/regsub.htm)

      Tcl's regular expression facilities were dramatically improved with the Tcl 8.1 release. For a Web developer, the most important feature is the inclusion of non-greedy regular expressions, which make it easy to match the contents of HTML tags. See http://www.scriptics.com/services/support/howto/regexp81.html for a full discussion of the differences.
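To see why non-greedy matching matters for HTML, compare the two quantifiers on a string with two bold tags. The sample string is made up; the -all and -inline switches used at the end require Tcl 8.3 or later.

```tcl
set html "<b>one</b> and <b>two</b>"

# Greedy: .* runs as far as it can, swallowing the first </b>.
regexp {<b>(.*)</b>} $html match greedy_inner

# Non-greedy (Tcl 8.1 and later): .*? stops at the first </b>.
regexp {<b>(.*?)</b>} $html match lazy_inner

# -all -inline returns every match, each followed by its submatch.
set all_bold [regexp -all -inline {<b>(.*?)</b>} $html]

puts "greedy:     $greedy_inner"
puts "non-greedy: $lazy_inner"
puts "all:        $all_bold"
```

The greedy version captures "one</b> and <b>two", while the non-greedy version captures just "one".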

      B.) Commands that start with string

      (all of which are documented at http://www.tcl.tk/man/tcl8.4/TclCmd/string.htm)


    • string compare string1 string2
      returns 0 if the two strings are equal, -1 if string1 sorts lexicographically before string2, 1 if string2 sorts lexicographically before string1:
      string compare apple applesauce  ==> -1
      string compare apple Apple ==> 1
      
    • string first string1 string2
      returns -1 if string1 is not within string2, else returns the index of the first occurrence. Indices start from zero, e.g.,
      string first tcl catclaw  ==> 2
      
    • string last string1 string2
      -1 if string1 is not within string2, else index of last occurrence.
      string last abra abracadabra ==> 7
      
    • string match pattern string
      1 if string matches pattern, 0 if not. See the chapter on pattern matching for an explanation of patterns.
    • string range string i j
      range of characters in string from index i to j, inclusive.
      string range catclaw 2 4 ==> tcl
      
    • string tolower string
      string in lower case.
      string compare weBmaster Webmaster ==> 1
      
      string compare [string tolower weBmaster] \
                     [string tolower Webmaster] ==> 0
      
    • string toupper string
      string in upper case.
      set password "ferrari"
      string compare "FERRARI" [string toupper $password] ==> 0
      
    • string trim string ?chars?
      trims chars from right and left side of string; defaults to whitespace.
      set password [string trim $form_password] ; # see above example
      
    • string trimleft string ?chars?
      trims chars from left of string; defaults to whitespace.
      set password [string trimleft $form_password] 
      
    • string trimright string ?chars?
      trims chars from right of string; defaults to whitespace.
      set password [string trimright $form_password] 
      
    • string wordend string index
      index of the first character after the last character of the word containing index.
      string wordend "tcl is the greatest" 0 ==> 3
      
    • string wordstart string index
      index of the first char of the word containing index.
      string wordstart "tcl is the greatest" 5 ==> 4
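Several of these commands combine naturally. The sketch below extracts the whole word around a character position (wordend points one past the word, hence the minus one for string range) and builds a case-insensitive comparison from string tolower. Both proc names are hypothetical; Tcl 8.1 and later also offer string equal -nocase for the second job.

```tcl
# Hypothetical proc: the whole word containing character position index.
proc word_at {s index} {
    return [string range $s [string wordstart $s $index] \
                [expr {[string wordend $s $index] - 1}]]
}

# Hypothetical proc: 1 if the strings are equal ignoring case.
proc same_ignoring_case {a b} {
    return [expr {[string compare [string tolower $a] [string tolower $b]] == 0}]
}

puts [word_at "tcl is the greatest" 5]
puts [same_ignoring_case weBmaster Webmaster]
# trim a custom set of characters instead of whitespace
puts [string trim "--==title==--" "-="]
```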
      



    Exercises (see section V.3, List Operations)

    ---

    based on Tcl for Web Nerds