text-html-procs.tcl
Contains procs used to manipulate chunks of text and html, most notably converting between them.
- Location:
- packages/acs-tcl/tcl/text-html-procs.tcl
- Created:
- 19 July 2000
- Author:
- Lars Pind <lars@pinds.com>
- CVS Identification:
$Id: text-html-procs.tcl,v 1.113 2024/10/27 16:51:11 gustafn Exp $
Procedures in this file
- ad_convert_to_html (public, deprecated)
- ad_convert_to_text (public, deprecated)
- ad_dom_sanitize_html (public)
- ad_enhanced_text_to_html (public)
- ad_enhanced_text_to_plain_text (public)
- ad_html_qualify_links (public)
- ad_html_security_check (public)
- ad_html_text_convert (public)
- ad_html_text_convertable_p (public, deprecated)
- ad_html_text_convertible_p (public)
- ad_html_to_text (public)
- ad_js_escape (public)
- ad_looks_like_html_p (public)
- ad_pad (public)
- ad_parse_html_attributes (public)
- ad_quotehtml (public, deprecated)
- ad_string_truncate (public)
- ad_string_truncate_middle (public)
- ad_text_to_html (public)
- ad_unquotehtml (public)
- string_truncate (public, deprecated)
- string_truncate_middle (public, deprecated)
- util_close_html_tags (public)
- util_convert_line_breaks_to_html (public)
- util_expand_entities (public, deprecated)
- util_expand_entities_ie_style (public, deprecated)
- util_remove_html_tags (public)
- wrap_string (public, deprecated)
Detailed information
ad_convert_to_html (public, deprecated)
ad_convert_to_html [ -html_p html_p ] text
Deprecated. Invoking this procedure generates a warning.
Convenient interface to convert text or HTML into HTML. Does the same as ad_html_text_convert -to html.
- Switches:
- -html_p (optional, defaults to "f")
- specify t if the value of text is formatted in HTML, or f if text is plaintext.
DEPRECATED: this proc is a trivial wrapper for ad_html_text_convert
- Parameters:
- text (required)
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- 19 July 2000
- Testcases:
- No testcase defined.
ad_convert_to_text (public, deprecated)
ad_convert_to_text [ -html_p html_p ] text
Deprecated. Invoking this procedure generates a warning.
Convenient interface to convert text or HTML into plaintext. Does the same as ad_html_text_convert -to text.
- Switches:
- -html_p (optional, defaults to "t")
- specify t if the value of text is formatted in HTML, or f if text is plaintext.
DEPRECATED: this proc is a trivial wrapper for ad_html_text_convert
- Parameters:
- text (required)
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- 19 July 2000
- Testcases:
- No testcase defined.
ad_dom_sanitize_html (public)
ad_dom_sanitize_html -html html [ -allowed_tags allowed_tags ] \ [ -allowed_attributes allowed_attributes ] \ [ -allowed_protocols allowed_protocols ] \ [ -unallowed_tags unallowed_tags ] \ [ -unallowed_attributes unallowed_attributes ] \ [ -unallowed_protocols unallowed_protocols ] [ -no_js ] \ [ -no_outer_urls ] [ -validate ] [ -fix ]
Sanitizes HTML by the specified criteria, removing unallowed tags and attributes, JavaScript, or references to external URLs. When desired, this proc can also act as a pure validator in order to enforce markup policies on user-submitted content.
- Switches:
- -html (required)
- the markup to be checked.
- -allowed_tags (optional)
- list of tags we allow in the markup.
- -allowed_attributes (optional)
- list of attributes we allow in the markup.
- -allowed_protocols (optional)
- list of protocols we allow in links.
- -unallowed_tags (optional)
- list of tags we don't allow in the markup.
- -unallowed_attributes (optional)
- list of attributes we don't allow in the markup.
- -unallowed_protocols (optional)
- list of protocols we don't allow in the markup. Protocol-relative URLs are allowed, but only if the proc is called from a connection thread, as we need to determine the current connection protocol.
- -no_js (optional, boolean)
- this flag decides whether every script tag, inline event handler and the javascript: pseudo-protocol should be stripped from the markup.
- -no_outer_urls (optional, boolean)
- this flag tells the proc to remove every reference to external addresses. The proc will try to distinguish between external URLs and acceptable fully specified internal ones. Acceptable URLs will be transformed into absolute local references; others will be stripped together with the attribute. Absolute URLs referring to our host are allowed, but require the proc to be called from a connection thread in order to determine the current URL.
- -validate (optional, boolean)
- this flag skips the creation of the stripped markup and just reports whether the original markup respects all the specified requirements.
- -fix (optional, boolean)
- when parsing fails on the markup as it is, try to fix it by, for example, closing unclosed tags or normalizing attribute specifications. This operation will remove most plain whitespace from the text content of the original HTML, together with every comment and any DOCTYPE declaration that may be present.
- Returns:
- sanitized markup or a (0/1) truth value when the -validate flag is specified
- Author:
- Antonio Pisano
- Testcases:
- ad_dom_sanitize_html
ad_enhanced_text_to_html (public)
ad_enhanced_text_to_html text
Converts enhanced text format to normal HTML.
- Parameters:
- text (required)
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- 2003-01-27
- Testcases:
- ad_enhanced_text_to_html, ad_html_text_convert, acs_tcl__process_enhanced_correctly
ad_enhanced_text_to_plain_text (public)
ad_enhanced_text_to_plain_text [ -maxlen maxlen ] text
Converts enhanced text format to normal plaintext format.
- Switches:
- -maxlen (optional, defaults to "70")
- Parameters:
- text (required)
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- 2003-01-27
- Testcases:
- ad_html_text_convert
ad_html_qualify_links (public)
ad_html_qualify_links [ -location location ] [ -path path ] html
Converts relative URLs in the HTML text into fully qualified URLs including the hostname. It performs the following operations:
1. prepend the location (protocol and host) to paths starting with a "/".
2. prepend the path, in case it was passed in, to paths not starting with a "/".
Links that are already fully qualified are not modified.
- Switches:
- -location (optional)
- protocol and host (defaults to [ad_url])
- -path (optional)
- optional path to be prepended to paths not starting with a "/"
- Parameters:
- html (required)
- HTML text, in which substitutions should be performed.
- Testcases:
- ad_html_qualify_links
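The two rewriting rules described for ad_html_qualify_links can be illustrated in plain Tcl. This is only a minimal sketch, not the OpenACS implementation: the proc name qualify_links_sketch is hypothetical, only double-quoted href and src attributes are handled, the location is passed explicitly instead of defaulting to [ad_url], and it assumes that relative links receive the location followed by the path.

```tcl
# Hypothetical sketch of the documented rules; not the OpenACS code.
proc qualify_links_sketch {html location {path ""}} {
    # Assumption: only double-quoted href/src attribute values are considered.
    set re {(href|src)="([^"]+)"}
    set result ""
    set start 0
    while {[regexp -nocase -indices -start $start -- $re $html whole attr url]} {
        lassign $url first last
        set value [string range $html $first $last]
        if {[regexp {^([a-zA-Z][a-zA-Z0-9+.-]*:|//|#)} $value]} {
            # Has a scheme, is protocol-relative, or is a fragment: leave as is.
        } elseif {[string index $value 0] eq "/"} {
            # Rule 1: absolute path, prepend protocol and host.
            set value $location$value
        } elseif {$path ne ""} {
            # Rule 2: relative path, prepend location and the given path.
            set value $location/[string trim $path /]/$value
        }
        # Copy the text before the URL, then the (possibly rewritten) URL.
        append result [string range $html $start [expr {$first - 1}]] $value
        set start [expr {$last + 1}]
    }
    append result [string range $html $start end]
    return $result
}

set html {<a href="/forums/">F</a> <img src="logo.png"> <a href="https://openacs.org/">O</a>}
puts [qualify_links_sketch $html "https://example.com" "/pkg/www"]
```

In this sketch the first link gains the host, the image gains host and path, and the already qualified link is left untouched.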
ad_html_security_check (public)
ad_html_security_check [ -allowed_tags allowed_tags ] \ [ -allowed_attributes allowed_attributes ] \ [ -allowed_protocols allowed_protocols ] html
Returns a human-readable explanation if the user has used any HTML tag other than the allowed ones. The check uses the provided values. If these values are not provided, the function takes the union of the per-package instance value and the values from the "antispam" section of the kernel parameters.
- Switches:
- -allowed_tags (optional)
- -allowed_attributes (optional)
- -allowed_protocols (optional)
- Parameters:
- html (required)
- The HTML text being validated.
- Returns:
- a human-readable, plaintext explanation of what's wrong with the user's input. If everything is ok, return an empty string.
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- 20 July 2000
- Testcases:
- ad_html_security_check_href_allowed, ad_html_security_check_forbidden_protolcols, ad_html_security_check_forbidden_tags
ad_html_text_convert (public)
ad_html_text_convert [ -from from ] [ -to to ] [ -maxlen maxlen ] \ [ -truncate_len truncate_len ] [ -ellipsis ellipsis ] \ [ -more more ] text
Converts a chunk of text from a variety of formats to either text/html or text/plain.
Example: ad_html_text_convert -from "text/html" -to "text/plain" -- "text"
Putting in the -- prevents Tcl from treating a leading - in the text portion as a switch.
HTML to HTML closes any unclosed HTML tags (see util_close_html_tags).
Text to HTML does ad_text_to_html, and HTML to text does an ad_html_to_text. See those procs for details.
When text is empty, then an empty string will be returned regardless of any format. This is especially useful when displaying content that was created with the richtext widget and might contain empty values for content and format.
- Switches:
- -from (optional, defaults to "text/plain")
- specify what type of text you're providing. Allowed values:
- text/plain
- text/enhanced
- text/markdown
- text/fixed-width
- text/html
- -to (optional, defaults to "text/html")
- specify what format you want this translated into. Allowed values:
- text/plain
- text/html
- -maxlen (optional, defaults to "70")
- The maximum line width when generating text/plain
- -truncate_len (optional, defaults to "0")
- The maximum total length of the output, including the ellipsis.
- -ellipsis (optional, defaults to "...")
- This will get put at the end of the truncated string, if the string was truncated. However, this counts towards the total string length, so that the returned string including ellipsis is guaranteed to be shorter than the 'truncate_len' provided.
- -more (optional)
- This will get put at the end of the truncated string, if the string was truncated.
- Parameters:
- text (required)
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- 19 July 2000
- Testcases:
- ad_html_text_convert, ad_text_html_convert_outlook_word_comments, ad_text_html_convert_to_plain, general_comments_create_link
ad_html_text_convertable_p (public, deprecated)
ad_html_text_convertable_p [ -from from ] [ -to to ]
Deprecated. Invoking this procedure generates a warning.
The name of this proc has a spelling error. Use ad_html_text_convertible_p instead.
- Switches:
- -from (optional)
- -to (optional)
- Testcases:
- No testcase defined.
ad_html_text_convertible_p (public)
ad_html_text_convertible_p [ -from from ] [ -to to ]
Returns true if ad_html_text_convert can handle the given from and to MIME types.
- Switches:
- -from (optional)
- -to (optional)
- Testcases:
- ad_html_text_convert
ad_html_to_text (public)
ad_html_to_text [ -maxlen maxlen ] [ -showtags ] [ -no_format ] html
Returns a best-guess plain text version of an HTML fragment. Parses the HTML and does some simple formatting. The parser and the formatting are pretty simplistic, but the result is better than nothing.
- Switches:
- -maxlen (optional, defaults to "70")
- the line length you want your output wrapped to.
- -showtags (optional, boolean)
- causes any unknown (and uninterpreted) tags to get shown in the output.
- -no_format (optional, boolean)
- causes hyperlink tags not to get listed at the end of the output.
- Parameters:
- html (required)
- Authors:
- Lars Pind <lars@pinds.com>
- Aaron Swartz <aaron@swartzfam.com>
- Created:
- 19 July 2000
- Testcases:
- html_to_text, ad_html_to_text_bold, ad_html_to_text_anchor, ad_html_to_text_image, ad_html_to_text_clipped_link, text_to_html
ad_js_escape (public)
ad_js_escape string
Returns the supplied string with invalid JavaScript characters properly escaped. This makes it possible to use the string safely inside JavaScript code.
- Parameters:
- string (required)
- Author:
- Antonio Pisano
- Testcases:
- ad_js_escape
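As an illustration of the kind of escaping ad_js_escape is described as performing, the following plain-Tcl sketch escapes a handful of characters that would otherwise break a JavaScript string literal. The proc name js_escape_sketch and the exact set of escaped characters are assumptions made for this example; the real proc may cover more or different cases.

```tcl
# Hypothetical sketch; the character set handled by ad_js_escape may differ.
proc js_escape_sketch {string} {
    # string map scans the input once, so inserted backslashes are
    # never escaped a second time.
    set mapping [list \
        "\\" "\\\\" \
        "\"" "\\\"" \
        "'"  "\\'" \
        "\n" "\\n" \
        "\r" "\\r" \
        "\t" "\\t"]
    return [string map $mapping $string]
}

set message "It's a \"test\"\nwith two lines"
puts "var msg = '[js_escape_sketch $message]';"
```

The escaped value can then be embedded between quotes in generated script code without terminating the literal early.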
ad_looks_like_html_p (public)
ad_looks_like_html_p text
Tries to guess whether the text supplied is text or html.
- Parameters:
- text (required)
- the text you want tested.
- Returns:
- 1 if it looks like html, 0 if not.
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- 19 July 2000
- Testcases:
- acs_api_browser_api_describe_function, acs_api_browser_api_proc_documentation, acs_api_browser_api_script_documentation, acs_api_browser_apidoc_format_see, acs_api_browser_apidoc_tclcode_to_html, ad_looks_like_html_p, ad_dimensional
ad_pad (public)
ad_pad [ -left ] [ -right ] string length padstring
Tcl implementation of the pad string function found in many DBMSs. One of the directional flags -left or -right must be specified and will dictate whether this will be an lpad or an rpad.
- Switches:
- -left (optional, boolean)
- text will be added to the left of the original string.
- -right (optional, boolean)
- text will be added to the right of the original string.
- Parameters:
- string (required)
- length (required)
- padstring (required)
- Returns:
- padded string
- Testcases:
- ad_pad
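The lpad/rpad behaviour that ad_pad is modelled on can be sketched in plain Tcl. The proc name pad_sketch and its side argument are hypothetical stand-ins for the -left and -right flags, and cutting over-long input to the requested length is an assumption borrowed from SQL LPAD/RPAD rather than something this page states.

```tcl
# Hypothetical sketch of lpad/rpad semantics; not the OpenACS code.
proc pad_sketch {side string length padstring} {
    if {$side ni {left right}} {
        error "side must be 'left' or 'right'"
    }
    if {$padstring eq ""} {
        error "padstring must not be empty"
    }
    set missing [expr {$length - [string length $string]}]
    if {$missing <= 0} {
        # Assumption: over-long input is cut to $length, as SQL LPAD/RPAD do.
        return [string range $string 0 [expr {$length - 1}]]
    }
    # Repeat the pad string often enough, then trim it to the exact size.
    set repeats [expr {$missing / [string length $padstring] + 1}]
    set fill [string range [string repeat $padstring $repeats] 0 [expr {$missing - 1}]]
    if {$side eq "left"} {
        return $fill$string
    }
    return $string$fill
}

puts [pad_sketch left  "42" 5 "0"]   ;# 00042
puts [pad_sketch right "ab" 7 "xy"]  ;# abxyxyx
```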
ad_parse_html_attributes (public)
ad_parse_html_attributes [ -attribute_array attribute_array ] html \ [ pos ]
This is a wrapper proc for ad_parse_html_attributes_upvar, so you can parse attributes from a string without upvar'ing. See the documentation for the other proc.
- Switches:
- -attribute_array (optional)
- Parameters:
- html (required)
- pos (optional, defaults to "0")
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- November 10, 2000
- Testcases:
- ad_parse_html_attributes
ad_quotehtml (public, deprecated)
ad_quotehtml arg
Deprecated. Invoking this procedure generates a warning.
Quotes ampersands, double-quotes, and angle brackets in $arg. Analogous to ns_quotehtml except that it quotes double-quotes (which ns_quotehtml does not).
- Parameters:
- arg (required)
- Testcases:
- No testcase defined.
ad_string_truncate (public)
ad_string_truncate [ -len len ] [ -ellipsis ellipsis ] [ -more more ] \ [ -equal ] string
Truncates a string to len characters, adding the string provided in the ellipsis parameter if the string was truncated. The length of the resulting string, including the ellipsis, is guaranteed to be shorter than or equal to the len specified. Should always be called as ad_string_truncate [-flags ...] -- string since otherwise strings which start with a - will be treated as switches, and will cause an error.
- Switches:
- -len (optional, defaults to "200")
- The length to truncate to. If zero, no truncation will occur.
- -ellipsis (optional, defaults to "...")
- This will get put at the end of the truncated string, if the string was truncated. However, this counts towards the total string length, so that the returned string including ellipsis is guaranteed to be shorter than or equal to the 'len' provided.
- -more (optional)
- This will get put at the end of the truncated string, if the string was truncated.
- -equal (optional, boolean)
- Parameters:
- string (required)
- The string to truncate.
- Returns:
- The truncated string
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- September 8, 2002
- Testcases:
- ad_string_truncate
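The length guarantee described for ad_string_truncate (the ellipsis counts towards the total) can be illustrated with a small plain-Tcl sketch. The proc name truncate_sketch is hypothetical, the -more and -equal switches are left out, and the sketch cuts at a fixed character position, which is an assumption; the real proc may choose its cut point differently.

```tcl
# Hypothetical sketch of "ellipsis counts towards len"; not the OpenACS code.
proc truncate_sketch {string {len 200} {ellipsis "..."}} {
    # A length of zero disables truncation, as documented for -len.
    if {$len == 0 || [string length $string] <= $len} {
        return $string
    }
    # Reserve room for the ellipsis so the total never exceeds $len.
    set keep [expr {$len - [string length $ellipsis]}]
    if {$keep < 0} {
        # The ellipsis alone would exceed the budget, so leave it out.
        return [string range $string 0 [expr {$len - 1}]]
    }
    return [string range $string 0 [expr {$keep - 1}]]$ellipsis
}

puts [truncate_sketch "The quick brown fox" 10]  ;# The qui...
```

With a budget of 10 characters and a three-character ellipsis, seven characters of the input survive.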
ad_string_truncate_middle (public)
ad_string_truncate_middle [ -ellipsis ellipsis ] [ -len len ] string
Cut middle part of a string in case it is too long.
- Switches:
- -ellipsis (optional, defaults to "...")
- placeholder for the portion of text being left out
- -len (optional, defaults to "100")
- length after which we start cutting text
- Parameters:
- string (required)
- Returns:
- truncated string
- Testcases:
- ad_string_truncate_middle
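Cutting the middle out of a long string can be sketched as follows in plain Tcl. The proc name truncate_middle_sketch is hypothetical, and treating len as a threshold (rather than as a guaranteed maximum for the result) is an assumption based on the wording of the -len switch; how ad_string_truncate_middle splits the remaining characters may differ.

```tcl
# Hypothetical sketch of middle truncation; not the OpenACS code.
proc truncate_middle_sketch {string {len 100} {ellipsis "..."}} {
    set n [string length $string]
    # Strings up to the threshold are returned unchanged.
    if {$n <= $len} {
        return $string
    }
    # Keep half of the budget from each end and drop what lies between.
    set half [expr {$len / 2}]
    set head [string range $string 0 [expr {$half - 1}]]
    set tail [string range $string [expr {$n - $half}] end]
    return $head$ellipsis$tail
}

puts [truncate_middle_sketch "abcdefghijklmnopqrstuvwxyz" 10]  ;# abcde...vwxyz
```

Note that in this sketch the placeholder is added on top of the kept characters, so the result can be slightly longer than len.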
ad_text_to_html (public)
ad_text_to_html [ -no_links ] [ -no_lines ] [ -no_quote ] \ [ -includes_html ] [ -encode ] text
Converts plaintext to html. Also translates any recognized email addresses or URLs into a hyperlink.
- Switches:
- -no_links (optional, boolean)
- will prevent it from turning URLs and email addresses into hyperlinks.
- -no_lines (optional, boolean)
- -no_quote (optional, boolean)
- will prevent it from HTML-quoting output, so this can be run on semi-HTML input and preserve that formatting. This will also cause spaces/tabs to not be replaced with nbsp's, because this can too easily mess up HTML tags.
- -includes_html (optional, boolean)
- Set this if the text parameter already contains some HTML which should be preserved.
- -encode (optional, boolean)
- This will encode international characters into their HTML equivalents, like "ü" into &uuml;
- Parameters:
- text (required)
- Authors:
- Branimir Dolicki <branimir@arsdigita.com>
- Lars Pind <lars@pinds.com>
- Created:
- 19 July 2000
- Testcases:
- ad_text_to_html, xowiki_test_cases, create_form_with_form_instance
ad_unquotehtml (public)
ad_unquotehtml arg
reverses ns_quotehtml
- Parameters:
- arg (required)
- See Also:
- ns_quotehtml
- Testcases:
- quote_unquote_html
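Reversing HTML quoting amounts to mapping entities back to their characters in a single pass. The plain-Tcl sketch below shows the idea; the proc name unquote_sketch is hypothetical and the list of four entities is an assumption, so ad_unquotehtml may handle additional ones.

```tcl
# Hypothetical sketch of entity unquoting; not the OpenACS code.
proc unquote_sketch {text} {
    # string map never rescans its own output, so "&amp;lt;" correctly
    # becomes the literal text "&lt;" instead of "<".
    set mapping [list "&lt;" "<" "&gt;" ">" "&quot;" "\"" "&amp;" "&"]
    return [string map $mapping $text]
}

puts [unquote_sketch "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"]  ;# <b>Tom & Jerry</b>
```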
string_truncate (public, deprecated)
string_truncate [ args... ]
Deprecated. Invoking this procedure generates a warning.
Truncates a string to len characters, adding the string provided in the ellipsis parameter if the string was truncated. The length of the resulting string, including the ellipsis, is guaranteed to be shorter than or equal to the len specified. Should always be called as ad_string_truncate [-flags ...] -- string since otherwise strings which start with a - will be treated as switches, and will cause an error.
- Returns:
- The truncated string
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- September 8, 2002
DEPRECATED: does not comply with OpenACS naming convention
- Testcases:
- No testcase defined.
string_truncate_middle (public, deprecated)
string_truncate_middle [ args... ]
Deprecated. Invoking this procedure generates a warning.
Cut middle part of a string in case it is too long.
DEPRECATED: does not comply with OpenACS naming convention
- Testcases:
- No testcase defined.
util_close_html_tags (public)
util_close_html_tags html_fragment [ break_soft ] [ break_hard ] \ [ ellipsis ] [ more ]
Given an HTML fragment, this procedure will close any tags that have been left open. The optional arguments let you specify that the fragment is to be truncated to a certain number of displayable characters. After break_soft displayable characters, it truncates and closes open tags unless you're within non-breaking tags (e.g., A). After break_hard displayable characters, the procedure simply truncates and closes any open HTML tags that might have resulted from the truncation.
Note that the internal syntax table dictates which tags are non-breaking. The syntax table has codes:
- nobr -- treat tag as nonbreaking.
- discard -- throws away everything until the corresponding close tag.
- remove -- nuke this tag and its closing tag but leave contents.
- close -- close this tag if left open.
- Parameters:
- html_fragment (required)
- break_soft (optional, defaults to "0")
- the number of characters you want the HTML fragment truncated to. Will allow certain tags (A, ADDRESS, NOBR) to close first.
- break_hard (optional, defaults to "0")
- the number of characters you want the HTML fragment truncated to. Will truncate, regardless of what tag is currently in action.
- ellipsis (optional)
- This will get put at the end of the truncated string, if the string was truncated. However, this counts towards the total string length, so that the returned string including ellipsis is guaranteed to be shorter than the 'len' provided.
- more (optional)
- This will get put at the end of the truncated string, if the string was truncated.
- Author:
- Jeff Davis <davis@xarg.net>
- Testcases:
- util_close_html_tags
util_convert_line_breaks_to_html (public)
util_convert_line_breaks_to_html [ -includes_html ] [ -contains_pre ] \ text
Convert line breaks to <p> and <br> tags, respectively.
- Switches:
- -includes_html (optional, boolean)
- -contains_pre (optional, boolean)
- Parameters:
- text (required)
- Testcases:
- util_convert_line_breaks_to_html, ad_text_to_html
util_expand_entities (public, deprecated)
util_expand_entities html
Deprecated. Invoking this procedure generates a warning.
Replaces all occurrences of common HTML entities with their plaintext equivalents in a way that's appropriate for pretty-printing.
Currently, the following entities are converted: &lt;, &gt;, &quot;, &amp;, &mdash; and &#151;.
This proc is more suitable for pretty-printing than its sister-proc, util_expand_entities_ie_style. The two differences are that this one is more strict: it requires proper entities, i.e., both opening ampersand and closing semicolon, and it doesn't do numeric entities, because they're generally not safe to send to browsers. If we want to do numeric entities in general, we should also consider how they interact with character encodings.
- Parameters:
- html (required)
- See Also:
- ns_unquotehtml
- Testcases:
- No testcase defined.
util_expand_entities_ie_style (public, deprecated)
util_expand_entities_ie_style html
Deprecated. Invoking this procedure generates a warning.
Replaces all occurrences of &#111; and &#x0f; type HTML character entities with their ASCII equivalents. It also handles lt, gt, quot, ob, cb and amp.
This proc does the expansion in the style of IE and Netscape, which is to say that it doesn't require the trailing semicolon on the entity to replace it with something else. The reason we do that is that this proc was designed for checking HTML for security issues, and since entities can be used for hiding malicious code, we'd better simulate the liberal interpretation that browsers do, even though it complicates matters.
Unlike its sister proc, util_expand_entities, it also expands numeric entities (#999 or #xff style).
- Parameters:
- html (required)
- Author:
- Lars Pind <lars@pinds.com>
- Created:
- October 17, 2000
- See Also:
- ns_unquotehtml
- Testcases:
- No testcase defined.
util_remove_html_tags (public)
util_remove_html_tags html
Removes everything between < and > from the string.
- Parameters:
- html (required)
- Testcases:
- util_remove_html_tags
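Removing everything between < and > maps naturally onto a single regular-expression substitution. The following plain-Tcl sketch shows that technique; the proc name remove_tags_sketch is hypothetical and the pattern is an assumption about how such stripping can be done, not a copy of the OpenACS code.

```tcl
# Hypothetical sketch of tag stripping; not the OpenACS code.
proc remove_tags_sketch {html} {
    # Delete every run that starts with "<" and ends at the next ">".
    regsub -all -- {<[^>]*>} $html {} result
    return $result
}

puts [remove_tags_sketch "<p>Hello <b>world</b>!</p>"]  ;# Hello world!
```

A pattern like this is purely textual: unescaped comparison operators such as "a < b and c > d" would also lose the text between the two signs.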
wrap_string (public, deprecated)
wrap_string input [ width ]
Deprecated. Invoking this procedure generates a warning.
Wraps a string to be no wider than width columns (80 by default) by inserting line breaks.
- Parameters:
- input (required)
- width (optional, defaults to "80")
- See Also:
- ns_reflow_text
- Testcases:
- No testcase defined.