Package: ACS Tcl 5.5.1

text-html-procs.tcl

Contains procs used to manipulate chunks of text and html, most notably converting between them.
Location:
packages/acs-tcl/tcl/text-html-procs.tcl
Created:
19 July 2000
Author:
Lars Pind <lars@pinds.com>
CVS Identification:
$Id: text-html-procs.tcl,v 1.65.6.14 2014/09/09 06:30:21 gustafn Exp $

Procedures in this file

Detailed information

ad_convert_to_html (public)

 ad_convert_to_html [ -html_p html_p ] text
Convenient interface to convert text or html into html. Does the same as ad_html_text_convert -to html.

Switches:
-html_p (defaults to "f") (optional)
specify t if the value of text is formatted in HTML, or f if text is plaintext.
Parameters:
text
Author:
Lars Pind <lars@pinds.com>
Created:
19 July 2000
 

ad_convert_to_text (public)

 ad_convert_to_text [ -html_p html_p ] text
Convenient interface to convert text or html into plaintext. Does the same as ad_html_text_convert -to text.

Switches:
-html_p (defaults to "t") (optional)
specify t if the value of text is formatted in HTML, or f if text is plaintext.
Parameters:
text
Author:
Lars Pind <lars@pinds.com>
Created:
19 July 2000
 

ad_enhanced_text_to_html (public)

 ad_enhanced_text_to_html text
Converts enhanced text format to normal HTML.

Parameters:
text
Author:
Lars Pind <lars@pinds.com>
Created:
2003-01-27
 

ad_enhanced_text_to_plain_text (public)

 ad_enhanced_text_to_plain_text [ -maxlen maxlen ] text
Converts enhanced text format to normal plaintext format.

Switches:
-maxlen (defaults to "70") (optional)
Parameters:
text
Author:
Lars Pind <lars@pinds.com>
Created:
2003-01-27
 

ad_html_security_check (public)

 ad_html_security_check html
Returns a human-readable explanation if the user has used any HTML tag other than the ones marked as allowed in the antispam section of ad.ini. Otherwise returns an empty string.
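The kind of check described above can be illustrated with a standalone Tcl sketch (core Tcl 8.5 or later, no OpenACS). It is not the library code: the proc name html_security_check_sketch and the message wording are hypothetical, the allowed tags are passed in as a list instead of being read from ad.ini, and only tag names are examined.

```tcl
# Hypothetical sketch: report any HTML tag whose name is not in allowed_tags.
# Unlike ad_html_security_check, the allowed list is passed in explicitly
# rather than read from the antispam section of ad.ini.
proc html_security_check_sketch {html allowed_tags} {
    set bad {}
    # Each match yields two list elements: the whole match and the tag name.
    foreach {match name} [regexp -all -inline {</?\s*([A-Za-z][A-Za-z0-9]*)} $html] {
        set name [string tolower $name]
        if {$name ni $allowed_tags && $name ni $bad} {
            lappend bad $name
        }
    }
    if {[llength $bad] == 0} {
        return ""
    }
    return "The following HTML tags are not allowed: [join $bad {, }]"
}

puts [html_security_check_sketch {<b>hi</b> <script>alert(1)</script>} {b i p a}]
```

As with the real proc, an empty result means the input passed the check.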

Parameters:
html
Returns:
a human-readable, plaintext explanation of what's wrong with the user's input.
Author:
Lars Pind <lars@pinds.com>
Created:
20 July 2000
 

ad_html_text_convert (public)

 ad_html_text_convert [ -from from ] [ -to to ] [ -maxlen maxlen ] \
    [ -truncate_len truncate_len ] [ -ellipsis ellipsis ] \
    [ -more more ] text
Converts a chunk of text from a variety of formats to either text/html or text/plain.

Example: ad_html_text_convert -from "text/html" -to "text/plain" -- "text"

Putting in the -- prevents text that starts with a - from being treated as a switch.
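Why the -- matters can be seen with a toy switch parser. The proc convert_sketch below is hypothetical: it only mimics, in simplified form, how a procedure with ad_proc-style switches consumes leading arguments that begin with a dash, and it performs no conversion at all, just returning the switches and text it received.

```tcl
# Hypothetical toy: parse -from/-to switches the way an ad_proc-style
# procedure would, stopping at "--" or at the first non-switch argument.
proc convert_sketch {args} {
    array set opt {-from text/plain -to text/html}
    # Consume leading arguments that look like switches.
    while {[llength $args] > 0 && [string index [lindex $args 0] 0] eq "-"} {
        set flag [lindex $args 0]
        set args [lrange $args 1 end]
        if {$flag eq "--"} {
            break   ;# everything after -- is positional, even if it starts with -
        }
        if {![info exists opt($flag)] || [llength $args] == 0} {
            error "unknown switch or missing value: $flag"
        }
        set opt($flag) [lindex $args 0]
        set args [lrange $args 1 end]
    }
    if {[llength $args] != 1} {
        error "expected exactly one text argument"
    }
    return [list $opt(-from) $opt(-to) [lindex $args 0]]
}

# Without --, text that starts with a dash is mistaken for a switch:
catch {convert_sketch -from text/html "-5 degrees"} err
puts $err
# With --, it is treated as the text argument:
puts [convert_sketch -from text/html -- "-5 degrees"]
```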

HTML-to-HTML conversion closes any unclosed HTML tags (see util_close_html_tags).

Text-to-HTML conversion uses ad_text_to_html, and HTML-to-text conversion uses ad_html_to_text. See those procs for details.

When text is empty, an empty string is returned regardless of the formats given. This is especially useful when displaying content that was created with the richtext widget and might contain empty values for content and format.

Switches:
-from (defaults to "text/plain") (optional)
specify what type of text you're providing. Allowed values:
  • text/plain
  • text/enhanced
  • text/fixed-width
  • text/html
-to (defaults to "text/html") (optional)
specify what format you want this translated into. Allowed values:
  • text/plain
  • text/html
-maxlen (defaults to "70") (optional)
The maximum line width when generating text/plain
-truncate_len (defaults to "0") (optional)
The maximum total length of the output, including the ellipsis.
-ellipsis (defaults to "...") (optional)
This will get put at the end of the truncated string, if the string was truncated. However, it counts towards the total string length, so the returned string, including the ellipsis, is guaranteed to be no longer than the 'truncate_len' provided.
-more (optional)
This will get put at the end of the truncated string, if the string was truncated.
Parameters:
text
Author:
Lars Pind <lars@pinds.com>
Created:
19 July 2000
 

ad_html_text_convertable_p (public)

 ad_html_text_convertable_p [ -from from ] [ -to to ]
Returns true if ad_html_text_convert can handle the given from and to MIME types.
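A minimal sketch of such a check, assuming the supported pairs are exactly the combinations of the allowed -from and -to values listed for ad_html_text_convert. The proc name convertable_p_sketch is hypothetical, and the real proc may apply different rules.

```tcl
# Hypothetical sketch: check a from/to pair against the allowed values
# listed for ad_html_text_convert's -from and -to switches.
proc convertable_p_sketch {from to} {
    set from_types {text/plain text/enhanced text/fixed-width text/html}
    set to_types   {text/plain text/html}
    return [expr {$from in $from_types && $to in $to_types}]
}

puts [convertable_p_sketch text/enhanced text/html]   ;# 1
puts [convertable_p_sketch text/html text/enhanced]   ;# 0
```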

Switches:
-from (optional)
-to (optional)
 

ad_html_to_text (public)

 ad_html_to_text [ -maxlen maxlen ] [ -showtags ] [ -no_format ] html
Returns a best-guess plain text version of an HTML fragment. Parses the HTML and does some simple formatting. The parser and formatting are fairly simple-minded, but it's better than nothing.

Switches:
-maxlen (defaults to "70") (optional)
the line length you want your output wrapped to.
-showtags (boolean) (optional)
causes any unknown (and uninterpreted) tags to get shown in the output.
-no_format (boolean) (optional)
causes hyperlink tags not to get listed at the end of the output.
Parameters:
html
Authors:
Lars Pind <lars@pinds.com>
Aaron Swartz <aaron@swartzfam.com>
Created:
19 July 2000
 

ad_looks_like_html_p (public)

 ad_looks_like_html_p text
Tries to guess whether the text supplied is plain text or HTML.
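One possible heuristic is sketched below in plain Tcl: treat the text as HTML if it contains a closing tag, a <p> or <br> tag, or an <a href link. Both the rules and the proc name looks_like_html_sketch are assumptions for illustration; the real ad_looks_like_html_p may decide differently.

```tcl
# Hypothetical heuristic: a closing tag, a <p>/<br> tag, or an <a href ...>
# link is taken as evidence of HTML. Returns 1 or 0, like the real proc.
proc looks_like_html_sketch {text} {
    return [regexp -nocase {</[a-z][a-z0-9]*\s*>|<(p|br)\s*/?>|<a\s+href} $text]
}

puts [looks_like_html_sketch {Hello <b>world</b>}]   ;# 1
puts [looks_like_html_sketch {x < y and y > z}]      ;# 0
```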

Parameters:
text - the text you want tested.
Returns:
1 if it looks like html, 0 if not.
Author:
Lars Pind <lars@pinds.com>
Created:
19 July 2000
 

ad_parse_html_attributes (public)

 ad_parse_html_attributes [ -attribute_array attribute_array ] html \
    [ pos ]
This is a wrapper proc for ad_parse_html_attributes_upvar, so you can parse attributes from a string without upvar'ing. See the documentation for the other proc.

Switches:
-attribute_array (optional)
Parameters:
html
pos (defaults to "0")
Author:
Lars Pind <lars@pinds.com>
Created:
November 10, 2000
 

ad_parse_html_attributes_upvar (public)

 ad_parse_html_attributes_upvar [ -attribute_array attribute_array ] \
    html_varname pos_varname
Parse attributes in an HTML fragment and return them as a list of lists.

Each element of that list is either a single element, if the attribute had no value, or a two-tuple, with the first element being the name of the attribute and the second being the value. The attribute names are all converted to lowercase.

If you don't really care what happens when the same attribute is present twice, you can also use the attribute_array argument, and the attributes will be set there. For attributes without any value, we'll use the empty string.

Example:

set html {<tag foo = bar baz greble="&quot;hello you sucker&quot;" foo='blah' Heres = '  something for   you to = "consider" '>}
set pos 5 ; # the 'f' in the first 'foo'

set attribute_list [ad_parse_html_attributes_upvar -attribute_array attribute_array html pos]
attribute_list will contain the following:
{foo bar} baz {greble {"hello you sucker"}} {foo blah} {heres {  something for   you to = "consider" }}
attribute_array will contain:
attribute_array(foo)='blah'
attribute_array(greble)='"hello you sucker"'
attribute_array(baz)=''
attribute_array(heres)='  something for   you to = "consider" '

Won't alter the string passed in, promise! We will, however, modify pos_var, which should point to the first character inside the tag, after the tag name (it doesn't matter whether there is some whitespace before the first attribute).
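The parsing loop described above can be sketched in core Tcl (8.5 or later). This is a simplified, hypothetical parse_attributes_sketch, not the OpenACS code: it returns a dict with attribute_array-like semantics (the last duplicate wins, valueless attributes map to the empty string) together with the final position, and unlike the real proc it does not decode entities such as &quot;.

```tcl
# Hypothetical, simplified sketch of the parsing loop described above.
# Returns {attrs pos}: attrs is a dict (last duplicate wins, valueless
# attributes map to ""), pos points at the tag-closing >. No entity decoding.
proc parse_attributes_sketch {html pos} {
    set attrs [dict create]
    set n [string length $html]
    while {1} {
        # Skip whitespace before the next attribute name.
        regexp -start $pos -indices {\A\s*} $html ws
        set pos [expr {[lindex $ws 1] + 1}]
        if {$pos >= $n || [string index $html $pos] eq ">"} {
            break   ;# pos now points at the tag-closing > (or past the end)
        }
        # The attribute name runs up to whitespace, '=' or '>'.
        if {![regexp -start $pos -indices {\A[^\s=>]+} $html m]} {
            incr pos   ;# stray character such as a lone '='; skip it
            continue
        }
        set name [string tolower [string range $html {*}$m]]
        set pos [expr {[lindex $m 1] + 1}]
        regexp -start $pos -indices {\A\s*} $html ws
        set pos [expr {[lindex $ws 1] + 1}]
        set value ""
        if {[string index $html $pos] eq "="} {
            regexp -start [incr pos] -indices {\A\s*} $html ws
            set pos [expr {[lindex $ws 1] + 1}]
            set q [string index $html $pos]
            if {$q eq "\"" || $q eq "'"} {
                # Quoted value: everything up to the matching quote character.
                set close [string first $q $html [expr {$pos + 1}]]
                if {$close < 0} { set close $n }
                set value [string range $html [expr {$pos + 1}] [expr {$close - 1}]]
                set pos [expr {$close + 1}]
            } elseif {[regexp -start $pos -indices {\A[^\s>]+} $html m]} {
                # Unquoted value: runs up to whitespace or '>'.
                set value [string range $html {*}$m]
                set pos [expr {[lindex $m 1] + 1}]
            }
        }
        dict set attrs $name $value
    }
    return [list $attrs $pos]
}

set html {<tag foo = bar baz greble="&quot;hello you sucker&quot;" foo='blah' Heres = '  something for   you to = "consider" '>}
lassign [parse_attributes_sketch $html 5] attrs pos
puts $attrs
```

Run on the example from the documentation, the dict gets foo=blah, baz empty, and heres with its whitespace intact, while greble keeps its raw &quot; entities because this sketch skips entity decoding.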

Switches:
-attribute_array (optional)
This is an alternate way of returning the attributes, if you don't care about what happens when the same attribute name is defined twice.
Parameters:
html_varname - the name of the variable holding the HTML fragment. We promise that we won't change the contents of this variable.
pos_varname - the name of the variable holding the position within the html_varname string from which we should start. This should point to a character inside the tag, just after the tag name, and before the first attribute. Note that we will modify this variable. When this proc is done, this variable will point to the tag-closing >. Example: if the tag is <img src="foo">, pos_varname should point to either the space between img and src, or the s in src.
Returns:
A list of lists holding the attribute names and values. Each element of that list is either a single element, if the attribute had no value, or a two-tuple, with the first element being the name of the attribute and the second being the value. The attribute names are all converted to lowercase.
Author:
Lars Pind <lars@pinds.com>
Created:
November 10, 2000
 

ad_quotehtml (public)

 ad_quotehtml arg
Quotes ampersands, double-quotes, and angle brackets in $arg. Analogous to ns_quotehtml except that it quotes double-quotes (which ns_quotehtml does not).
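A minimal sketch of the quoting described above, in core Tcl with no OpenACS dependencies; quotehtml_sketch is a hypothetical name, not the library code. It relies on string map scanning the input only once, so the & introduced by one replacement is never quoted a second time.

```tcl
# Hypothetical sketch: quote &, ", < and > as HTML entities.
# string map makes a single left-to-right pass over the input.
proc quotehtml_sketch {arg} {
    return [string map {& &amp; \" &quot; < &lt; > &gt;} $arg]
}

puts [quotehtml_sketch {<a href="x">Tom & Jerry</a>}]
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```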

Parameters:
arg

 

ad_text_to_html (public)

 ad_text_to_html [ -no_links ] [ -no_lines ] [ -no_quote ] \
    [ -includes_html ] [ -encode ] text
Converts plaintext to html. Also translates any recognized email addresses or URLs into a hyperlink.
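The quote-then-link idea can be sketched in a few lines of Tcl. This hypothetical text_to_html_sketch only HTML-quotes the text and wraps http(s) URLs in links; it does not recognize email addresses, convert line breaks, or model any of the switches below, and edge cases such as URLs followed by punctuation are not handled.

```tcl
# Hypothetical sketch: HTML-quote plain text, then wrap http(s) URLs in links.
proc text_to_html_sketch {text} {
    set text [string map {& &amp; \" &quot; < &lt; > &gt;} $text]
    # In a regsub substitution spec, & stands for the whole matched URL.
    regsub -all {https?://[^\s<>"]+} $text {<a href="&">&</a>} text
    return $text
}

puts [text_to_html_sketch "See http://openacs.org for details"]
```

Quoting first keeps user-supplied angle brackets from becoming markup, while the tags added afterwards by regsub stay intact.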

Switches:
-no_links (boolean) (optional)
will prevent it from turning recognized email addresses and URLs into hyperlinks.
-no_lines (boolean) (optional)
-no_quote (boolean) (optional)
will prevent it from HTML-quoting output, so this can be run on semi-HTML input and preserve that formatting. This will also cause spaces/tabs to not be replaced with nbsp's, because this can too easily mess up HTML tags.
-includes_html (boolean) (optional)
Set this if the text parameter already contains some HTML which should be preserved.
-encode (boolean) (optional)
This will encode international characters into their HTML entity equivalents, e.g. "ü" into &uuml;
Parameters:
text
Authors:
Branimir Dolicki <branimir@arsdigita.com>
Lars Pind <lars@pinds.com>
Created:
19 July 2000
 

ad_unquotehtml (public)

 ad_unquotehtml arg
reverses ad_quotehtml
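A matching sketch of the reverse mapping, again in core Tcl with a hypothetical name. Because string map does not rescan replaced text, &amp;lt; correctly comes back as the literal &lt; rather than <.

```tcl
# Hypothetical sketch: undo the four entities produced by HTML quoting.
proc unquotehtml_sketch {arg} {
    return [string map {&gt; > &lt; < &quot; \" &amp; &} $arg]
}

puts [unquotehtml_sketch {&lt;b&gt; &amp;amp; &quot;x&quot;}]
# <b> &amp; "x"
```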

Parameters:
arg

 

philg_quote_double_quotes (public, deprecated)

 philg_quote_double_quotes arg
Deprecated. Invoking this procedure generates a warning.

This proc does exactly the same as ad_quotehtml. Use that instead. This one will be deleted eventually.

Parameters:
arg

 

string_truncate (public)

 string_truncate [ -len len ] [ -ellipsis ellipsis ] [ -more more ] \
    string
Truncates a string to len characters (defaults to the parameter TruncateDescriptionLength), adding the string provided in the ellipsis parameter if the string was truncated. If format is html (default), any open HTML tags are closed. Otherwise, it's converted to text using ad_html_to_text. The length of the resulting string, including the ellipsis, is guaranteed to be within the len specified. Should always be called as string_truncate [-flags ...] -- string, since otherwise strings that start with a - will be treated as switches and cause an error.
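The length rule above (the ellipsis counts towards len) can be sketched in plain Tcl. The proc truncate_sketch is hypothetical and deliberately narrow: it ignores HTML, word boundaries and the -more switch, and only shows how reserving room for the ellipsis keeps the result within len.

```tcl
# Hypothetical sketch of the length rule: the ellipsis counts towards len,
# so the result is never longer than len characters.
proc truncate_sketch {str len {ellipsis ...}} {
    # A len of zero (or less) means "do not truncate".
    if {$len <= 0 || [string length $str] <= $len} {
        return $str
    }
    # Reserve room for the ellipsis so that the total stays within len.
    set keep [expr {$len - [string length $ellipsis]}]
    if {$keep <= 0} {
        return [string range $ellipsis 0 [expr {$len - 1}]]
    }
    return "[string range $str 0 [expr {$keep - 1}]]$ellipsis"
}

puts [truncate_sketch "The quick brown fox jumps" 12]   ;# The quick...
puts [truncate_sketch "Short" 12]                        ;# Short
```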

Switches:
-len (defaults to "200") (optional)
The length to truncate to. If zero, no truncation will occur.
-ellipsis (defaults to "...") (optional)
This will get put at the end of the truncated string, if the string was truncated. However, it counts towards the total string length, so the returned string, including the ellipsis, is guaranteed to be no longer than the 'len' provided.
-more (optional)
This will get put at the end of the truncated string, if the string was truncated.
Parameters:
string - The string to truncate.
Returns:
The truncated string, with HTML tags closed or converted to text, depending on format.
Author:
Lars Pind <lars@pinds.com>
Created:
September 8, 2002
 

util_convert_line_breaks_to_html (public)

 util_convert_line_breaks_to_html [ -includes_html ] text
Convert line breaks to <p> and <br> tags, respectively.
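A minimal standalone sketch, assuming that a blank line marks a paragraph break (emitted as <p>) and a single newline a line break (emitted as <br>). The proc name line_breaks_to_html_sketch is hypothetical, the exact markup the real proc emits may differ, and the -includes_html switch is not modeled.

```tcl
# Hypothetical sketch: a blank line becomes a <p> paragraph break and any
# remaining single newline becomes <br>. Assumes the input contains no
# U+0001 characters, which are used here as a temporary paragraph marker.
proc line_breaks_to_html_sketch {text} {
    # Normalize Windows line endings and trim surrounding whitespace.
    set text [string trim [string map [list "\r\n" "\n"] $text]]
    # Collapse each run of blank lines into a single paragraph marker.
    regsub -all {\n[ \t]*\n\s*} $text "\u0001" text
    set paras {}
    foreach para [split $text "\u0001"] {
        lappend paras [string map [list "\n" "<br>\n"] $para]
    }
    return [join $paras "\n<p>\n"]
}

puts [line_breaks_to_html_sketch "first line\nsecond line\n\nnew paragraph"]
```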

Switches:
-includes_html (boolean) (optional)
Parameters:
text
 

util_convert_plaintext_to_html (public, deprecated)

 util_convert_plaintext_to_html raw_string
Deprecated. Invoking this procedure generates a warning.

Almost everything this proc does can be accomplished with the ad_text_to_html. Use that proc instead.

The only difference is that ad_text_to_html doesn't check whether the plaintext might in fact already be HTML by mistake. We usually don't want that check anyway, because the user may have wanted a literal <p> tag in his plaintext. We'd rather let the user change our opinion about the text, e.g. html_p = 't'.

Parameters:
raw_string

 

util_expand_entities (public)

 util_expand_entities html
Replaces all occurrences of common HTML entities with their plaintext equivalents in a way that's appropriate for pretty-printing.

Currently, the following entities are converted: &lt;, &gt;, &quot;, &amp;, &mdash; and &#151;.

This proc is more suitable for pretty-printing than its sister proc, util_expand_entities_ie_style. The two differences are that this one is stricter: it requires proper entities, i.e., both an opening ampersand and a closing semicolon, and it doesn't do numeric entities, because they're generally not safe to send to browsers. If we want to do numeric entities in general, we should also consider how they interact with character encodings.
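The strict behavior can be sketched with a single string map over the entities listed above. The proc name expand_entities_sketch is hypothetical, and mapping &mdash; and &#151; to "--" is an assumption made for plain-text output, not something stated here. Note how &lt without its semicolon is left alone.

```tcl
# Hypothetical sketch: expand only the listed entities, and only in their
# strict form (leading & and trailing ;). The "--" target for &mdash; and
# &#151; is an assumption for plain-text output.
proc expand_entities_sketch {html} {
    return [string map {&lt; < &gt; > &quot; \" &amp; & &mdash; -- &#151; --} $html]
}

puts [expand_entities_sketch {&lt;p&gt; &amp;lt; &lt &mdash;}]
# <p> &lt; &lt --
```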

Parameters:
html
 

util_expand_entities_ie_style (public)

 util_expand_entities_ie_style html
Replaces all occurrences of &#111; and &#x0f; type HTML character entities with their ASCII equivalents. It also handles lt, gt, quot, ob, cb and amp.

This proc does the expansion in the style of IE and Netscape, which is to say that it doesn't require the trailing semicolon on the entity to replace it with something else. The reason is that this proc was designed for checking HTML for security issues, and since entities can be used to hide malicious code, we'd better simulate the liberal interpretation that browsers use, even though it complicates matters.

Unlike its sister proc, util_expand_entities, it also expands numeric entities (#999 or #xff style).
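The browser-style treatment of numeric entities can be sketched in core Tcl. The hypothetical expand_numeric_entities_sketch accepts decimal and hexadecimal forms with an optional trailing semicolon, which is exactly what lets an obfuscated tag like the one below be recognized; it does not handle the named entities (lt, gt, quot, and so on).

```tcl
# Hypothetical sketch: expand &#NNN and &#xHH entities, with the trailing
# semicolon optional as browsers allow. Named entities are not handled.
proc expand_numeric_entities_sketch {html} {
    set out ""
    set pos 0
    while {[regexp -start $pos -indices {&#([xX][0-9a-fA-F]+|[0-9]+);?} $html match num]} {
        lassign $match start end
        set digits [string range $html {*}$num]
        if {[string match {[xX]*} $digits]} {
            scan [string range $digits 1 end] %x code
        } else {
            scan $digits %d code
        }
        # Copy the text before the entity, then the decoded character.
        append out [string range $html $pos [expr {$start - 1}]] [format %c $code]
        set pos [expr {$end + 1}]
    }
    append out [string range $html $pos end]
    return $out
}

puts [expand_numeric_entities_sketch {&#60script&#x3e;alert(1)&#60/script>}]
# <script>alert(1)</script>
```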

Parameters:
html
Author:
Lars Pind <lars@pinds.com>
Created:
October 17, 2000
 

util_maybe_convert_to_html (public, deprecated)

 util_maybe_convert_to_html raw_string html_p
Deprecated. Invoking this procedure generates a warning.

This proc is deprecated. Use ad_convert_to_html instead.

Parameters:
raw_string
html_p

 

util_quote_double_quotes (public, deprecated)

 util_quote_double_quotes arg
Deprecated. Invoking this procedure generates a warning.

This proc does exactly the same as ad_quotehtml. Use that instead. This one will be deleted eventually.

Parameters:
arg

 

util_quotehtml (public, deprecated)

 util_quotehtml arg
Deprecated. Invoking this procedure generates a warning.

This proc does exactly the same as ad_quotehtml. Use that instead. This one will be deleted eventually.

Parameters:
arg

 

util_remove_html_tags (public)

 util_remove_html_tags html
Removes everything between < and > from the string.
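The behavior described is essentially one regsub, sketched here with a hypothetical proc name. Taking the description literally also means that a stray < in ordinary text, together with everything up to the next >, is removed.

```tcl
# Hypothetical sketch: delete every <...> sequence from the string.
proc remove_html_tags_sketch {html} {
    regsub -all {<[^>]*>} $html "" html
    return $html
}

puts [remove_html_tags_sketch {<p>Hello <b>world</b>!</p>}]   ;# Hello world!
```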

Parameters:
html
 

util_striphtml (public, deprecated)

 util_striphtml html
Deprecated. Invoking this procedure generates a warning.

Deprecated. Use ad_html_to_text instead.

Parameters:
html

 

wrap_string (public)

 wrap_string input [ threshold ]
Wraps a string to be no wider than threshold columns (80 by default) by inserting line breaks.
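A greedy word-wrapping sketch in core Tcl, assuming lines are broken only at whitespace. The proc wrap_sketch is hypothetical: it collapses existing whitespace (including line breaks) into single spaces and leaves a word longer than threshold on its own line, choices the real wrap_string may not share.

```tcl
# Hypothetical greedy word wrapper: breaks at whitespace so that no line
# exceeds threshold characters (a single longer word stays intact).
proc wrap_sketch {input {threshold 80}} {
    set lines {}
    set line ""
    foreach word [regexp -all -inline {\S+} $input] {
        if {$line eq ""} {
            set line $word
        } elseif {[string length $line] + 1 + [string length $word] <= $threshold} {
            append line " " $word
        } else {
            lappend lines $line
            set line $word
        }
    }
    if {$line ne ""} { lappend lines $line }
    return [join $lines "\n"]
}

puts [wrap_sketch "the quick brown fox jumps over the lazy dog" 15]
```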

Parameters:
input
threshold (defaults to "80")
 
