Forum OpenACS Q&A: new link_urls?
I am using the link_urls proc from 3.2.5, which converts text that looks like a link into a real link (e.g. in chat)...
The regexp doesn't work on all kinds of links. The following link gets cut after the first 0 (I love Vignette 😉):
http://www.spiegel.de/unispiegel/jobundberuf/0,1518,220964,00.html
This forum converts that link correctly, but it would not have if the link had been written without the http:// prefix:
www.spiegel.de/unispiegel/jobundberuf/0,1518,220964,00.html
Does anyone have an improved version of link_urls?
Thanks
Take a look at
packages/acs-tcl/tcl/text-html-procs.tcl
in the function ad_text_to_html for the regexps used
in 4.x to highlight links. It should be pretty easy to fix.
Hello Jeff,
I have limited experience with regexps and just couldn't get ad_text_to_html to display www.foo.com correctly without destroying http://www.foo.com items...
Any hints?
Here is the proc:
ad_proc -public ad_text_to_html { -no_links:boolean text } {
    Converts plaintext to html. Also translates any recognized
    email addresses or URLs into a hyperlink.

    @param no_links will prevent it from highlighting

    @author Branimir Dolicki (branimir@arsdigita.com)
    @author Lars Pind (lars@pinds.com)
    @creation-date 19 July 2000
} {
    if { !$no_links_p } {
        # We start by putting a space in front so our URL/email highlighting will work
        # for URLs/emails right in the beginning of the text.
        set text " $text"

        # if something is " http://" or " https://"
        # we assume it is a link to an outside source.

        # (bd) The only purpose of these sTaRtUrL and
        # eNdUrL markers is to get rid of trailing dots,
        # commas and things like that. Note that there
        # is a TAB before and after each marker.

        regsub -nocase -all {([^a-zA-Z0-9]+)(http://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
        regsub -nocase -all {([^a-zA-Z0-9]+)(https://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
        regsub -nocase -all {([^a-zA-Z0-9]+)(ftp://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text

        # email links have the form xxx@xxx.xxx
        # JCD: don't treat things =xxx@xxx.xxx as email since most
        # common occurrence seems to be in urls (although VPATH bounce
        # emails like bounce-user=domain.com@sourcehost.com will then not
        # work correctly). It's all quite ugly.

        regsub -nocase -all {([^a-zA-Z0-9=]+)(mailto:)?([^=\(\)\s:;,@<>]+@[^\(\)\s.:;,@<>]+[.][^\(\)\s:;,@<>]+)} $text \
            "\\1\tsTaRtEmAiL\\3eNdEmAiL\t" text
    }

    # At this point, before inserting some of our own <, >, and "'s
    # we quote the ones entered by the user:
    set text [ad_quotehtml $text]

    # Convert _single_ CRLF's to <br>'s to preserve line breaks
    regsub -all {\r*\n} $text "<br>\n" text

    # Convert every two spaces to an nbsp
    regsub -all {  } $text "\\\&nbsp; " text

    # turn CRLFCRLF into <P>
    if { [regsub -all {\r\n\s*\r\n} $text "<p>" text] == 0 } {
        # try LFLF
        if { [regsub -all {\n\s*\n} $text "<p>" text] == 0 } {
            # try CRCR
            regsub -all {\r\s*\r} $text "<p>" text
        }
    }

    if { !$no_links_p } {
        # Dress the links and emails with A HREF
        regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
        regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdEmAiL\t)} $text {\2\1} text
        regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
        regsub -all {\tsTaRtEmAiL([^\t]*)eNdEmAiL\t} $text {<a href="mailto:\1">\1</a>} text
        set text [string trimleft $text]
    }

    # Convert every tab to 4 nbsp's
    regsub -all {\t} $text {\&nbsp;\&nbsp;\&nbsp;\&nbsp;} text

    # JCD: Remove all the eNd sTaRt stuff and warn if we do it since it's bad
    # to have these left (means something is broken in our regexps above)
    if {[regsub -all {(sTaRtUrL|eNdUrL|sTaRtEmAiL|eNdEmAiL)} $text {} text]} {
        ns_log warning "Replaced sTaRt/eNd magic tags in ad_text_to_html"
    }

    return $text
}
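For anyone else reading along, the marker trick in the proc above can be tried on its own in a plain tclsh. This is only a minimal sketch: demo_link_http is a made-up name, it handles http:// only, and it skips the quoting, line-break and email steps of the real proc.

```tcl
# Hypothetical helper for illustration only; not part of OpenACS.
proc demo_link_http {text} {
    # Leading space so a URL at the very start still has a non-alphanumeric prefix.
    set text " $text"
    # Pass 1: wrap anything that looks like an http URL in TAB-delimited markers.
    regsub -nocase -all {([^a-zA-Z0-9]+)(http://[^\(\)"<>\s]+)} $text \
        "\\1\tsTaRtUrL\\2eNdUrL\t" text
    # Pass 2: move trailing punctuation from inside the end marker to after it.
    regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
    # Pass 3: turn each marked span into an anchor.
    regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
    return [string trimleft $text]
}

puts [demo_link_http "See http://www.foo.com/a,b.html, then reply."]
```

The comma inside the path stays part of the link, while the comma right after .html ends up outside the anchor, because only punctuation directly in front of the end marker is moved.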
yuuuhu,
I've got it:
ad_proc -public ad_text_to_html { -no_links:boolean text } {
    Converts plaintext to html. Also translates any recognized
    email addresses or URLs into a hyperlink.

    @param no_links will prevent it from highlighting

    @author Branimir Dolicki (branimir@arsdigita.com)
    @author Lars Pind (lars@pinds.com)
    @creation-date 19 July 2000
} {
    if { !$no_links_p } {
        # We start by putting a space in front so our URL/email highlighting will work
        # for URLs/emails right in the beginning of the text.
        set text " $text"

        # if something is " http://" or " https://"
        # we assume it is a link to an outside source.

        # (bd) The only purpose of these sTaRtUrL and
        # eNdUrL markers is to get rid of trailing dots,
        # commas and things like that. Note that there
        # is a TAB before and after each marker.

        regsub -nocase -all {([^a-zA-Z0-9]+)((http|https|ftp)://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text

        # Bare www. links get their own markers so that http:// can be
        # prepended to the href later. The "/" in the prefix class keeps
        # the www in http://www.foo.com from being matched a second time.
        regsub -nocase -all {([^a-zA-Z0-9/]+)(www\.[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrLnOhTtP\\2eNdUrLnOhTtP\t" text

        # email links have the form xxx@xxx.xxx
        # JCD: don't treat things =xxx@xxx.xxx as email since most
        # common occurrence seems to be in urls (although VPATH bounce
        # emails like bounce-user=domain.com@sourcehost.com will then not
        # work correctly). It's all quite ugly.

        regsub -nocase -all {([^a-zA-Z0-9=]+)(mailto:)?([^=\(\)\s:;,@<>]+@[^\(\)\s.:;,@<>]+[.][^\(\)\s:;,@<>]+)} $text \
            "\\1\tsTaRtEmAiL\\3eNdEmAiL\t" text
    }

    # At this point, before inserting some of our own <, >, and "'s
    # we quote the ones entered by the user:
    set text [ad_quotehtml $text]

    # Convert _single_ CRLF's to <br>'s to preserve line breaks
    regsub -all {\r*\n} $text "<br>\n" text

    # Convert every two spaces to an nbsp
    regsub -all {  } $text "\\\&nbsp; " text

    # turn CRLFCRLF into <P>
    if { [regsub -all {\r\n\s*\r\n} $text "<p>" text] == 0 } {
        # try LFLF
        if { [regsub -all {\n\s*\n} $text "<p>" text] == 0 } {
            # try CRCR
            regsub -all {\r\s*\r} $text "<p>" text
        }
    }

    if { !$no_links_p } {
        # Dress the links and emails with A HREF
        regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
        regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrLnOhTtP\t)} $text {\2\1} text
        regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdEmAiL\t)} $text {\2\1} text
        regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
        regsub -all {\tsTaRtUrLnOhTtP([^\t]*)eNdUrLnOhTtP\t} $text {<a href="http://\1">\1</a>} text
        regsub -all {\tsTaRtEmAiL([^\t]*)eNdEmAiL\t} $text {<a href="mailto:\1">\1</a>} text
        set text [string trimleft $text]
    }

    # Convert every tab to 4 nbsp's
    regsub -all {\t} $text {\&nbsp;\&nbsp;\&nbsp;\&nbsp;} text

    # JCD: Remove all the eNd sTaRt stuff and warn if we do it since it's bad
    # to have these left (means something is broken in our regexps above)
    if {[regsub -all {(sTaRtUrL|eNdUrL|sTaRtUrLnOhTtP|eNdUrLnOhTtP|sTaRtEmAiL|eNdEmAiL)} $text {} text]} {
        ns_log warning "Replaced sTaRt/eNd magic tags in ad_text_to_html"
    }

    return $text
}
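To check the www handling without a running server, the URL part of the change can be pulled into a small standalone proc and run in tclsh. This is a rough sketch under assumptions: demo_link_www is a made-up name, and the quoting, line-break and email steps are left out, so it is not a drop-in replacement for ad_text_to_html.

```tcl
# Hypothetical helper for illustration only; not part of OpenACS.
proc demo_link_www {text} {
    set text " $text"
    # URLs that carry an explicit scheme.
    regsub -nocase -all {([^a-zA-Z0-9]+)((http|https|ftp)://[^\(\)"<>\s]+)} $text \
        "\\1\tsTaRtUrL\\2eNdUrL\t" text
    # Bare www. hosts. A "/" directly in front of www means the text was
    # already captured as part of a scheme URL, so it is excluded here.
    regsub -nocase -all {([^a-zA-Z0-9/]+)(www\.[^\(\)"<>\s]+)} $text \
        "\\1\tsTaRtUrLnOhTtP\\2eNdUrLnOhTtP\t" text
    # Move trailing punctuation outside both kinds of end marker.
    regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
    regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrLnOhTtP\t)} $text {\2\1} text
    # Scheme URLs are used as-is; bare www. links get http:// in the href only.
    regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
    regsub -all {\tsTaRtUrLnOhTtP([^\t]*)eNdUrLnOhTtP\t} $text \
        {<a href="http://\1">\1</a>} text
    return [string trimleft $text]
}

puts [demo_link_www "www.spiegel.de/unispiegel/jobundberuf/0,1518,220964,00.html"]
puts [demo_link_www "Compare http://www.foo.com and www.bar.com."]
```

The separate nOhTtP markers are what let the two cases be dressed differently: the visible link text stays exactly as typed, and only the href of the bare www. form gets the http:// prefix.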