Forum OpenACS Q&A: Non Greedy Regexp for Tag Substitution

To make it easy for our plain text users, we want to allow them to place standard links in there text using square brackets and a fixed word like item1 as in:

[item1]

AND have it end up displayed as

<a href="http://openacs.org">Great OpenSource Toolkit</a>.

So if there are multiple items and occasionally the same [item1] a few times what is a good way to do a substitution. I will get the substitutions from a table but I could use help with the actual regexp to do the substitution....

I was reading about pattern matching and greedy expresions in http://www.arsdigita.com/books/tcl/pattern-matching.adp but my substitutions need to be non greedy and the reference to scriptics where this is discussed is a broken link.

Please suggest some code to accomplish this.
Thank you!

-Bob

Collapse
Posted by Jerry Asher on
Common thoughts and Synchronicity.

The key to non-greedy expressions is the question-mark.

I just did something similar in my blogging module. Trying to rip off the best, I let blog users access the userland glossary by entering the name of a glossary entry, surrounded by double colons (something like ::daveWinerImage::, ::usFlag::, or ::bill gates::).

I did this with the following, that you might adapt for your own use. The proc, blog_glossary_merge, is what does the non-greedy merge. I added that (and some other mods) to a few choice locations within confirm.tcl.

###
# userland glossary
###
proc_doc blog_upload_glossary {} {
    uploads userland glossary into blog shortcut table
} {
    set shortcuts [ns_httpget http://static.userland.com/shortcuts/glossary.xml]
    
    set counter 0
    set start 0
    while {[regexp -indices -start $start {.*?<name>(.*?)</name>.*?<value>(.*?)</value>} $shortcuts match name value]} {
        set sname  [string range $shortcuts [lindex $name 0] [lindex $name 1]]
        set svalue [string range $shortcuts [lindex $value 0] [lindex $value 1]]
        # ns_log notice name $sname
        # ns_log notice value $svalue
        set start [lindex $value 1]
        incr counter
        nsv_set blog_glossary $sname [util_expand_entities $svalue]
        nsv_lappend blog_glossary words $sname
    }
    nsv_set blog_glossary init 1
    ns_log notice blog_upload_glossary: found $counter entries
}

if {![nsv_array exists blog_glossary]} {
    nsv_set blog_glossary words [list]
    nsv_set blog_glossary "::::" "::"
    nsv_set blog_glossary "::::::::" "::::"
    ns_schedule_proc -once -thread  5 blog_upload_glossary
    ns_schedule_daily -thread  2 0 blog_upload_glossary
}

proc blog_glossary_merge {message} {
    if {[nsv_exists blog_glossary init]} {
        set counter 0
        while {[regexp -indices {::(.*?)::} $message match iphrase]} {
            set phrase [string range $message [lindex $iphrase 0] [lindex $iphrase 1]]
            # ns_log notice found phrase $phrase
            if {[empty_string_p $phrase]} {
                set value "::"
            } else {
                if {[nsv_exists blog_glossary $phrase]} {
                    set value " [nsv_get blog_glossary $phrase] "
                } else {
                    set value "::${phrase}::"
                }
            }
            # ns_log notice value is $value
            set newmessage [string replace $message [expr [lindex $iphrase 0] - 2] [expr [lindex $iphrase 1] + 2] $value]
            # ns_log notice message is now $message
            # ns_log notice newmessage is now $newmessage
            set message $newmessage
            incr counter
        }
    } else {
        set fault "
        <center><table border=1 bgcolor=mistyrose cellpadding=10><tr><td>
        The global shortcuts glossary is down, sorry</td></tr></table></center>
        "
        error $fault
    }
    return $message
}
Collapse
Posted by Jerry Asher on
Uh, the final proc had some html encoded values that the roundtrip through bboard just decoded. Therefore it is thus this:
proc blog_glossary_merge {message} {
    if {[nsv_exists blog_glossary init]} {
        set counter 0
        while {[regexp -indices {::(.*?)::} $message match iphrase]} {
            set phrase [string range $message [lindex $iphrase 0] [lindex $iphrase 1]]
            # ns_log notice found phrase $phrase
            if {[empty_string_p $phrase]} {
                set value "&#58;&#58;"
            } else {
                if {[nsv_exists blog_glossary $phrase]} {
                    set value "&nbsp;[nsv_get blog_glossary $phrase]&nbsp;"
                } else {
                    set value "&#58;&#58;${phrase}&#58;&#58;"
                }
            }
            # ns_log notice value is $value
            set newmessage [string replace $message [expr [lindex $iphrase 0] - 2] [expr [lindex $iphrase 1] + 2] $value]
            # ns_log notice message is now $message
            # ns_log notice newmessage is now $newmessage
            set message $newmessage
            incr counter
        }
    } else {
        set fault "
        <center><table border=1 bgcolor=mistyrose cellpadding=10><tr><td>
        The global shortcuts glossary is down, sorry</td></tr></table></center>
        "
        error $fault
    }
    return $message
}
Collapse
Posted by Gilbert Wong on
to make it non-greedy, just put a ? after the pattern which would usually be greedy (*?, *+, etc). If you are searching for a specific pattern, you probably don't need a non-greedy quantifier. http://www.scriptics.com/man/tcl8.2.3/TclCmd/re_syntax.htm#M12