Forum OpenACS Q&A: new link_urls?

Posted by David Kuczek on
I am using the link_urls proc from 3.2.5, which converts text that looks like a link into a real link (e.g. in chat).

The regexp doesn't work on all kinds of links. The following link gets cut after the first 0 (I love Vignette 😉):

http://www.spiegel.de/unispiegel/jobundberuf/0,1518,220964,00.html

This forum converts that link correctly, but it would not have if the link had been written without the scheme:

www.spiegel.de/unispiegel/jobundberuf/0,1518,220964,00.html

Does anyone have an improved version of link_urls?

Thanks

2: Re: new link_urls? (response to 1)
Posted by Jeff Davis on
Take a look at
packages/acs-tcl/tcl/text-html-procs.tcl
in the function ad_text_to_html for the regexps used
in 4.x to highlight links.  It should be pretty easy to fix.
3: Re: new link_urls? (response to 1)
Posted by David Kuczek on
Hello Jeff,

I have limited experience with regexps and just couldn't get ad_text_to_html to display www.foo.com correctly without breaking http://www.foo.com items...

Any hints?

Here is the proc:

ad_proc -public ad_text_to_html {
    -no_links:boolean
    text 
} {
    Converts plaintext to html. Also translates any recognized 
    email addresses or URLs into a hyperlink.

    @param no_links will prevent it from highlighting 

    @author Branimir Dolicki (branimir@arsdigita.com)
    @author Lars Pind (lars@pinds.com)
    @creation-date 19 July 2000
} {

    if { !$no_links_p } {
	# We start by putting a space in front so our URL/email highlighting will work
	# for URLs/emails right in the beginning of the text.
	set text " $text"
	
	# if something is " http://" or " https://"
	# we assume it is a link to an outside source. 
	
	# (bd) The only purpose of these sTaRtUrL and
	# eNdUrL markers is to get rid of trailing dots,
	# commas and things like that.  Note that there
	# is a TAB before and after each marker.
	
	regsub -nocase -all {([^a-zA-Z0-9]+)(http://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
	regsub -nocase -all {([^a-zA-Z0-9]+)(https://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
	regsub -nocase -all {([^a-zA-Z0-9]+)(ftp://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
	
	# email links have the form xxx@xxx.xxx
        # JCD: don't treat things =xxx@xxx.xxx as email since most
        # common occurrence seems to be in urls (although VPATH bounce
        # emails like bounce-user=domain.com@sourcehost.com will then not
        # work correctly).  It's all quite ugly.
 
        regsub -nocase -all {([^a-zA-Z0-9=]+)(mailto:)?([^=\(\)\s:;,@<>]+@[^\(\)\s.:;,@<>]+[.][^\(\)\s:;,@<>]+)} $text \
                "\\1\tsTaRtEmAiL\\3eNdEmAiL\t" text


    }    

    # At this point, before inserting some of our own <, >, and "'s
    # we quote the ones entered by the user:
    set text [ad_quotehtml $text]

    # Convert _single_ CRLF's to <br>'s to preserve line breaks
    regsub -all {\r*\n} $text "<br>\n" text

    # Convert every two spaces to an nbsp
    regsub -all {  } $text "\\\&nbsp; " text

    # turn CRLFCRLF into <P>
    if { [regsub -all {\r\n\s*\r\n} $text "<p>" text] == 0 } {
	# try LFLF
	if { [regsub -all {\n\s*\n} $text "<p>" text] == 0 } {
		# try CRCR
	    regsub -all {\r\s*\r} $text "<p>" text
	}
    }
    
    if { !$no_links_p } {
	# Dress the links and emails with A HREF
	regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
	regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdEmAiL\t)} $text {\2\1} text
	regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
	regsub -all {\tsTaRtEmAiL([^\t]*)eNdEmAiL\t} $text {<a href="mailto:\1">\1</a>} text
	set text [string trimleft $text]
    }

    # Convert every tab to 4 nbsp's
    regsub -all {\t} $text {\&nbsp;\&nbsp;\&nbsp;\&nbsp;} text
    
    # JCD: Remove all the eNd sTaRt stuff and warn if we do it since it's bad
    # to have these left (means something is broken in our regexps above)
    if {[regsub -all {(sTaRtUrL|eNdUrL|sTaRtEmAiL|eNdEmAiL)} $text {} text]} {
        ns_log warning "Replaced sTaRt/eNd magic tags in ad_text_to_html"
    }
    return $text
}
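
The marker trick in the proc above can be tried in isolation. What follows is only a minimal sketch for plain tclsh, covering just the http:// case; the sample sentence and the variable name result are made up for illustration and are not part of OpenACS.

```tcl
# Standalone sketch: runs in plain tclsh, no OpenACS procs needed.
# The sample sentence is an assumption chosen to show a trailing comma.
set text " See http://www.foo.com/page.html, then reply."

# Step 1: wrap anything that looks like an http URL in TAB-delimited markers.
# The trailing comma is still inside the markers after this step.
regsub -nocase -all {([^a-zA-Z0-9]+)(http://[^\(\)"<>\s]+)} $text \
    "\\1\tsTaRtUrL\\2eNdUrL\t" text

# Step 2: move trailing punctuation from before the end marker to after it.
regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text

# Step 3: turn each marked span into an anchor.
regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text

set result [string trimleft $text]
puts $result
# prints: See <a href="http://www.foo.com/page.html">http://www.foo.com/page.html</a>, then reply.
```

The comma ends up outside the anchor because step 2 swaps it with the end marker before step 3 builds the link.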
4: Re: new link_urls? (response to 1)
Posted by David Kuczek on
yuuuhu,

I've got it:


ad_proc -public ad_text_to_html {
    -no_links:boolean
    text
} {
    Converts plaintext to html. Also translates any recognized
    email addresses or URLs into a hyperlink.

    @param no_links will prevent it from highlighting

    @author Branimir Dolicki (branimir@arsdigita.com)
    @author Lars Pind (lars@pinds.com)
    @creation-date 19 July 2000
} {

    if { !$no_links_p } {
        # We start by putting a space in front so our URL/email highlighting will work
        # for URLs/emails right in the beginning of the text.
        set text " $text"

        # if something is " http://" or " https://"
        # we assume it is a link to an outside source.

        # (bd) The only purpose of these sTaRtUrL and
        # eNdUrL markers is to get rid of trailing dots,
        # commas and things like that.  Note that there
        # is a TAB before and after each marker.

        regsub -nocase -all {([^a-zA-Z0-9]+)((http|https|ftp)://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
        regsub -nocase -all {([^a-zA-Z0-9/]+)(www\.[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrLnOhTtP\\2eNdUrLnOhTtP\t" text

        # email links have the form xxx@xxx.xxx
        # JCD: don't treat things =xxx@xxx.xxx as email since most
        # common occurrence seems to be in urls (although VPATH bounce
        # emails like bounce-user=domain.com@sourcehost.com will then not
        # work correctly).  It's all quite ugly.

        regsub -nocase -all {([^a-zA-Z0-9=]+)(mailto:)?([^=\(\)\s:;,@<>]+@[^\(\)\s.:;,@<>]+[.][^\(\)\s:;,@<>]+)} $text \
                "\\1\tsTaRtEmAiL\\3eNdEmAiL\t" text


    }

    # At this point, before inserting some of our own <, >, and "'s
    # we quote the ones entered by the user:
    set text [ad_quotehtml $text]

    # Convert _single_ CRLF's to <br>'s to preserve line breaks
    regsub -all {\r*\n} $text "<br>\n" text

    # Convert every two spaces to an nbsp
    regsub -all {  } $text "\\\&nbsp; " text

    # turn CRLFCRLF into <P>
    if { [regsub -all {\r\n\s*\r\n} $text "<p>" text] == 0 } {
        # try LFLF
        if { [regsub -all {\n\s*\n} $text "<p>" text] == 0 } {
                # try CRCR
            regsub -all {\r\s*\r} $text "<p>" text
        }
    }

    if { !$no_links_p } {
        # Dress the links and emails with A HREF
        regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
        regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrLnOhTtP\t)} $text {\2\1} text
        regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdEmAiL\t)} $text {\2\1} text
        regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
        regsub -all {\tsTaRtUrLnOhTtP([^\t]*)eNdUrLnOhTtP\t} $text {<a href="http://\1">\1</a>} text
        regsub -all {\tsTaRtEmAiL([^\t]*)eNdEmAiL\t} $text {<a href="mailto:\1">\1</a>} text
        set text [string trimleft $text]
    }

    # Convert every tab to 4 nbsp's
    regsub -all {\t} $text {\&nbsp;\&nbsp;\&nbsp;\&nbsp;} text

    # JCD: Remove all the eNd sTaRt stuff and warn if we do it since it's bad
    # to have these left (means something is broken in our regexps above)
    if {[regsub -all {(sTaRtUrL|eNdUrL|sTaRtUrLnOhTtP|eNdUrLnOhTtP|sTaRtEmAiL|eNdEmAiL)} $text {} text]} {
        ns_log warning "Replaced sTaRt/eNd magic tags in ad_text_to_html"
    }
    return $text
}
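
To see why www.foo.com and http://www.foo.com no longer collide, the marking stage can be run on its own. This is a rough sketch for plain tclsh, not the OpenACS code path: mark_urls is a hypothetical helper name, and the TAB characters are shown as | purely for readability.

```tcl
# Hypothetical helper: applies only the two URL-marking regsubs from the
# proc above and returns the text with TABs replaced by "|".
proc mark_urls {text} {
    # Leading space so a URL at the very start of the text still matches.
    set text " $text"

    # URLs with an explicit scheme.
    regsub -nocase -all {([^a-zA-Z0-9]+)((http|https|ftp)://[^\(\)"<>\s]+)} \
        $text "\\1\tsTaRtUrL\\2eNdUrL\t" text

    # Bare www. hosts. The "/" in the first bracket expression keeps this
    # from matching the www that directly follows "http://".
    regsub -nocase -all {([^a-zA-Z0-9/]+)(www\.[^\(\)"<>\s]+)} \
        $text "\\1\tsTaRtUrLnOhTtP\\2eNdUrLnOhTtP\t" text

    # Make the TABs visible before trimming (string trim would strip them).
    return [string trim [string map [list \t |] $text]]
}

puts [mark_urls "http://www.foo.com"]
# prints: |sTaRtUrLhttp://www.foo.comeNdUrL|
puts [mark_urls "www.foo.com"]
# prints: |sTaRtUrLnOhTtPwww.foo.comeNdUrLnOhTtP|
```

One case this does not cover: a www. that follows a character other than "/" inside an already marked URL (for example after "=" in a query string) is still picked up by the second regsub and ends up with nested markers.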