Forum OpenACS Q&A: new link_urls?
I am using 3.2.5's proc link_urls, which converts text that looks like a link into a real link (e.g. in chat)...
The regexp doesn't work on all kinds of links. The following link gets cut after the first 0 (I love Vignette 😉):
http://www.spiegel.de/unispiegel/jobundberuf/0,1518,220964,00.html
This forum turns that link into a hyperlink correctly, but it would not have done so if the link had been:
www.spiegel.de/unispiegel/jobundberuf/0,1518,220964,00.html
Does anyone have an improved version of link_urls?
Thanks
Take a look at
packages/acs-tcl/tcl/text-html-procs.tcl
in the function ad_text_to_html for the regexps used
in 4.x to highlight links. It should be pretty easy to fix.
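For anyone who wants to check the 4.x pattern against the Vignette-style link before patching anything, here is a minimal sketch for tclsh. It assumes the http regexp is the one from ad_text_to_html as quoted further down in this thread; the variable names and the sample sentence are made up for illustration.

```tcl
# Illustrative only: variable names and sample text are hypothetical.
# The pattern is the http regexp quoted from ad_text_to_html in this thread.
set sample " see http://www.spiegel.de/unispiegel/jobundberuf/0,1518,220964,00.html for details"

# The URL part stops only at whitespace, parentheses, double quotes, < or >,
# so the commas inside the URL stay part of the match.
set found [regexp -nocase {([^a-zA-Z0-9]+)(http://[^\(\)"<>\s]+)} $sample whole prefix url]

puts $found
puts $url
```

If the pattern behaves as assumed, the whole URL including the commas ends up in url, which suggests the 4.x approach does not cut the link at the first 0.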
Hello Jeff,
I have limited experience with regexps and just couldn't get ad_text_to_html to display www.foo.com correctly without breaking http://www.foo.com items...
Any hints?
Here is the proc:
ad_proc -public ad_text_to_html {
-no_links:boolean
text
} {
Converts plaintext to html. Also translates any recognized
email addresses or URLs into a hyperlink.
@param no_links will prevent it from highlighting
@author Branimir Dolicki (branimir@arsdigita.com)
@author Lars Pind (lars@pinds.com)
@creation-date 19 July 2000
} {
if { !$no_links_p } {
# We start by putting a space in front so our URL/email highlighting will work
# for URLs/emails right in the beginning of the text.
set text " $text"
# if something is " http://" or " https://"
# we assume it is a link to an outside source.
# (bd) The only purpose of these sTaRtUrL and
# eNdUrL markers is to get rid of trailing dots,
# commas and things like that. Note that there
# is a TAB before and after each marker.
regsub -nocase -all {([^a-zA-Z0-9]+)(http://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
regsub -nocase -all {([^a-zA-Z0-9]+)(https://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
regsub -nocase -all {([^a-zA-Z0-9]+)(ftp://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
# email links have the form xxx@xxx.xxx
# JCD: don't treat things =xxx@xxx.xxx as email since most
# common occurrence seems to be in urls (although VPATH bounce
# emails like bounce-user=domain.com@sourcehost.com will then not
# work correctly). It's all quite ugly.
regsub -nocase -all {([^a-zA-Z0-9=]+)(mailto:)?([^=\(\)\s:;,@<>]+@[^\(\)\s.:;,@<>]+[.][^\(\)\s:;,@<>]+)} $text \
"\\1\tsTaRtEmAiL\\3eNdEmAiL\t" text
}
# At this point, before inserting some of our own <, >, and "'s
# we quote the ones entered by the user:
set text [ad_quotehtml $text]
# Convert _single_ CRLF's to <br>'s to preserve line breaks
regsub -all {\r*\n} $text "<br>\n" text
# Convert every two spaces to an nbsp
regsub -all {  } $text "\\\&nbsp; " text
# turn CRLFCRLF into <P>
if { [regsub -all {\r\n\s*\r\n} $text "<p>" text] == 0 } {
# try LFLF
if { [regsub -all {\n\s*\n} $text "<p>" text] == 0 } {
# try CRCR
regsub -all {\r\s*\r} $text "<p>" text
}
}
if { !$no_links_p } {
# Dress the links and emails with A HREF
regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdEmAiL\t)} $text {\2\1} text
regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
regsub -all {\tsTaRtEmAiL([^\t]*)eNdEmAiL\t} $text {<a href="mailto:\1">\1</a>} text
set text [string trimleft $text]
}
# Convert every tab to 4 nbsp's
regsub -all {\t} $text {\&nbsp;\&nbsp;\&nbsp;\&nbsp;} text
# JCD: Remove all the eNd sTaRt stuff and warn if we do it since it's bad
# to have these left (means something is broken in our regexps above)
if {[regsub -all {(sTaRtUrL|eNdUrL|sTaRtEmAiL|eNdEmAiL)} $text {} text]} {
ns_log warning "Replaced sTaRt/eNd magic tags in ad_text_to_html"
}
return $text
}
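The marker trick in the proc above is easier to follow when the three URL-related regsub calls are run on their own. The following is only a sketch under stated assumptions: the sample sentence is made up, and the ad_quotehtml call and the line-break handling that sit between the marking and dressing steps in the real proc are left out.

```tcl
# Illustrative sketch; the sample text is hypothetical. The three regsub
# calls are the URL-related ones from the proc above.
set text " Read http://www.foo.com/page.html, then reply."

# 1. Wrap the candidate URL in TAB-delimited markers. The trailing comma
#    is still inside the markers at this point.
regsub -nocase -all {([^a-zA-Z0-9]+)(http://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text

# 2. Move trailing punctuation from just before the end marker to just after it.
regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text

# 3. Turn the marked span into an anchor.
regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text

set text [string trimleft $text]
puts $text
```

Step 2 is what keeps the comma out of the href: the punctuation is swapped with the end marker before the anchor is built.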
yuuuhu,
I've got it:
ad_proc -public ad_text_to_html {
-no_links:boolean
text
} {
Converts plaintext to html. Also translates any recognized
email addresses or URLs into a hyperlink.
@param no_links will prevent it from highlighting
@author Branimir Dolicki (branimir@arsdigita.com)
@author Lars Pind (lars@pinds.com)
@creation-date 19 July 2000
} {
if { !$no_links_p } {
# We start by putting a space in front so our URL/email highlighting will work
# for URLs/emails right in the beginning of the text.
set text " $text"
# if something is " http://" or " https://"
# we assume it is a link to an outside source.
# (bd) The only purpose of these sTaRtUrL and
# eNdUrL markers is to get rid of trailing dots,
# commas and things like that. Note that there
# is a TAB before and after each marker.
regsub -nocase -all {([^a-zA-Z0-9]+)((http|https|ftp)://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text
regsub -nocase -all {([^a-zA-Z0-9/]+)(www\.[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrLnOhTtP\\2eNdUrLnOhTtP\t" text
# email links have the form xxx@xxx.xxx
# JCD: don't treat things =xxx@xxx.xxx as email since most
# common occurrence seems to be in urls (although VPATH bounce
# emails like bounce-user=domain.com@sourcehost.com will then not
# work correctly). It's all quite ugly.
regsub -nocase -all {([^a-zA-Z0-9=]+)(mailto:)?([^=\(\)\s:;,@<>]+@[^\(\)\s.:;,@<>]+[.][^\(\)\s:;,@<>]+)} $text \
"\\1\tsTaRtEmAiL\\3eNdEmAiL\t" text
}
# At this point, before inserting some of our own <, >, and "'s
# we quote the ones entered by the user:
set text [ad_quotehtml $text]
# Convert _single_ CRLF's to <br>'s to preserve line breaks
regsub -all {\r*\n} $text "<br>\n" text
# Convert every two spaces to an nbsp
regsub -all {  } $text "\\\&nbsp; " text
# turn CRLFCRLF into <P>
if { [regsub -all {\r\n\s*\r\n} $text "<p>" text] == 0 } {
# try LFLF
if { [regsub -all {\n\s*\n} $text "<p>" text] == 0 } {
# try CRCR
regsub -all {\r\s*\r} $text "<p>" text
}
}
if { !$no_links_p } {
# Dress the links and emails with A HREF
regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrLnOhTtP\t)} $text {\2\1} text
regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdEmAiL\t)} $text {\2\1} text
regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
regsub -all {\tsTaRtUrLnOhTtP([^\t]*)eNdUrLnOhTtP\t} $text {<a href="http://\1">\1</a>} text
regsub -all {\tsTaRtEmAiL([^\t]*)eNdEmAiL\t} $text {<a href="mailto:\1">\1</a>} text
set text [string trimleft $text]
}
# Convert every tab to 4 nbsp's
regsub -all {\t} $text {\&nbsp;\&nbsp;\&nbsp;\&nbsp;} text
# JCD: Remove all the eNd sTaRt stuff and warn if we do it since it's bad
# to have these left (means something is broken in our regexps above)
if {[regsub -all {(sTaRtUrL|eNdUrL|sTaRtUrLnOhTtP|eNdUrLnOhTtP|sTaRtEmAiL|eNdEmAiL)} $text {} text]} {
ns_log warning "Replaced sTaRt/eNd magic tags in ad_text_to_html"
}
return $text
}
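To try the two-pass idea outside OpenACS, the link-related regsub calls from the proc above can be pulled into a plain Tcl proc. This is a rough sketch, not the OpenACS implementation: sketch_linkify is a made-up name, and ad_quotehtml, the email handling, the whitespace conversion and the ns_log cleanup are all left out.

```tcl
# Hypothetical helper name; not part of OpenACS. It repeats only the
# link-related regsub calls from the proc above.
proc sketch_linkify {text} {
    # Leading space so a URL at the very start still has a prefix to match.
    set text " $text"

    # Pass 1: URLs that carry a scheme.
    regsub -nocase -all {([^a-zA-Z0-9]+)((http|https|ftp)://[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrL\\2eNdUrL\t" text

    # Pass 2: bare www. hosts. The slash in the excluded prefix class keeps
    # this pass from matching the www. that directly follows a scheme.
    regsub -nocase -all {([^a-zA-Z0-9/]+)(www\.[^\(\)"<>\s]+)} $text "\\1\tsTaRtUrLnOhTtP\\2eNdUrLnOhTtP\t" text

    # Move trailing punctuation out of both kinds of marker.
    regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrL\t)} $text {\2\1} text
    regsub -all {([]!?.:;,<>\(\)\}"'-]+)(eNdUrLnOhTtP\t)} $text {\2\1} text

    # Dress the marked spans; bare hosts get http:// prepended in the href only.
    regsub -all {\tsTaRtUrL([^\t]*)eNdUrL\t} $text {<a href="\1">\1</a>} text
    regsub -all {\tsTaRtUrLnOhTtP([^\t]*)eNdUrLnOhTtP\t} $text {<a href="http://\1">\1</a>} text

    return [string trimleft $text]
}

puts [sketch_linkify "www.foo.com and http://www.foo.com."]
```

The separate nOhTtP markers are what let the second kind of link get a different href from its visible text. One limit of this sketch: a www. that appears after some other punctuation inside an already marked URL (for example after an = in a query string) would still be picked up by pass 2.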