Forum OpenACS Q&A: Re: new link_urls?

3: Re: new link_urls? (response to 1)
Posted by David Kuczek on
Hello Jeff,

I have limited experience with regexps and couldn't get ad_text_to_html to display www.foo.com correctly without destroying http://www.foo.com items...

Any hints?

Here is the proc:

ad_proc -public ad_text_to_html {
    -no_links:boolean
    text 
} {
    Converts plaintext to html. Also translates any recognized 
    email addresses or URLs into a hyperlink.

    @param no_links will prevent it from turning URLs and email addresses into hyperlinks

    @author Branimir Dolicki (branimir@arsdigita.com)
    @author Lars Pind (lars@pinds.com)
    @creation-date 19 July 2000
} {

    if { !$no_links_p } {
	# We start by putting a space in front so our URL/email highlighting will work
	# for URLs/emails right in the beginning of the text.
	set text " $text"
	
	# if something is " http://" or " https://"
	# we assume it is a link to an outside source. 
	
	# (bd) The only purpose of these sTaRtUrL and
	# eNdUrL markers is to get rid of trailing dots,
	# commas and things like that.  Note that there
	# is a TAB before and after each marker.
	
	regsub -nocase -all {([^a-zA-Z0-9]+)(http://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
	regsub -nocase -all {([^a-zA-Z0-9]+)(https://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
	regsub -nocase -all {([^a-zA-Z0-9]+)(ftp://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
	
	# email links have the form xxx@xxx.xxx
        # JCD: don't treat things like =xxx@xxx.xxx as email since the most
        # common occurrence seems to be in urls (although VPATH bounce
        # emails like bounce-user=domain.com@sourcehost.com will then not
        # work correctly).  It's all quite ugly.
 
        regsub -nocase -all {([^a-zA-Z0-9=]+)(mailto:)?([^=\(\)\s:;,@<>]+@[^\(\)\s.:;,@<>]+[.][^\(\)\s:;,@<>]+)} $text \
                "\\1\tsTaRtEmAiL\\3eNdEmAiL\t" text


    }    

    # At this point, before inserting some of our own <, >, and "'s
    # we quote the ones entered by the user:
    set text [ad_quotehtml $text]

    # Convert _single_ CRLF's to <br>'s to preserve line breaks
    regsub -all {\r*\n} $text "<br>\n" text

    # Convert every two spaces to an nbsp
    regsub -all {  } $text "\\\&nbsp; " text

    # turn CRLFCRLF into <P>
    if { [regsub -all {\r\n\s*\r\n} $text "<p>" text] == 0 } {
	# try LFLF
	if { [regsub -all {\n\s*\n} $text "<p>" text] == 0 } {
		# try CRCR
	    regsub -all {\r\s*\r} $text "<p>" text
	}
    }
    
    if { !$no_links_p } {
	# Dress the links and emails with A HREF
	regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
	regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdEmAiL\t)} $text {\2\1} text
	regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
	regsub -all {\tsTaRtEmAiL([^\t]*)eNdEmAiL\t} $text {<a href="mailto:\1">\1</a>} text
	set text [string trimleft $text]
    }

    # Convert every tab to 4 nbsp's
    regsub -all {\t} $text {\&nbsp;\&nbsp;\&nbsp;\&nbsp;} text
    
    # JCD: Remove all the eNd sTaRt stuff and warn if we do it since it's bad
    # to have these left (means something is broken in our regexps above)
    if {[regsub -all {(sTaRtUrL|eNdUrL|sTaRtEmAiL|eNdEmAiL)} $text {} text]} {
        ns_log warning "Replaced sTaRt/eNd magic tags in ad_text_to_html"
    }
    return $text
}
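
To make the problem concrete, here is a minimal standalone sketch of the marker approach with one extra pass for bare www. hosts. It is only an illustration under assumptions, not the OpenACS implementation: the proc name sketch_link_urls and the sTaRtWwW marker are made up for this example, and the HTML quoting, line-break and email steps of the real proc are left out. The idea is that the www. pass only fires when "www." directly follows whitespace or an opening parenthesis, so a host that already follows "://" is not touched a second time.

```tcl
# Hypothetical helper (not part of OpenACS): links scheme URLs and bare
# www. hosts with the same tab-delimited marker trick as ad_text_to_html.
proc sketch_link_urls {text} {
    # Leading space so a URL at the very start of the text is preceded
    # by a non-alphanumeric character, as the patterns require.
    set text " $text"

    # Pass 1: URLs that already carry a scheme.
    regsub -nocase -all {([^a-zA-Z0-9]+)((?:https?|ftp)://[^\(\)"<>\s]+)} $text \
        "\\1\tsTaRtUrL\\2eNdUrL\t" text

    # Pass 2: bare www. hosts. Only whitespace or an opening parenthesis
    # may precede "www.", so hosts marked in pass 1 (where "www." follows
    # "://") are skipped. sTaRtWwW is an assumed marker name.
    regsub -nocase -all {([\s\(])(www\.[^\(\)"<>\s]+)} $text \
        "\\1\tsTaRtWwW\\2eNdUrL\t" text

    # Move trailing punctuation out of the marked span.
    regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text

    # Dress the marked spans; bare hosts get an http:// prefix in the href.
    regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
    regsub -all {\tsTaRtWwW([^\t]*)eNdUrL\t} $text {<a href="http://\1">\1</a>} text

    return [string trimleft $text]
}

puts [sketch_link_urls "See www.foo.com, or http://www.foo.com/bar."]
```

With that input the sketch should link www.foo.com with an href of http://www.foo.com and leave http://www.foo.com/bar as a single link, with the comma and the final dot outside the anchors. Whether requiring whitespace before "www." is strict enough for real forum text is an open question.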