Forum OpenACS Development: Re: Re: Re: Re: Re: Saving Javascript and CSS in the HTML area of Widget

This being so hard to find reminds me that we should probably update the default lists of allowed tags and attributes (they are pretty crusty).

Here is the list from the 5.2 branch for review:

TAGS:
A ADDRESS B BLOCKQUOTE BR CODE DIV DD DL DT EM FONT HR I LI OL P PRE SPAN STRIKE STRONG SUB SUP TABLE TBODY TD TR TT U UL EMAIL FIRST_NAMES LAST_NAME GROUP_NAME H1 H2 H3 H4 H5 H6

ATTRIBUTES:
align alt border cellpadding cellspacing color face height href hspace id name size src style target title valign vspace width

Here is an updated list from one of the more conservative OpenACS servers I run:

TAGS:
a abbr acronym address b big blockquote br caption cite code col colgroup del dfn div dd dl dt em font h1 h2 h3 h4 h5 h6 hr i li ins ol p pre q s samp span strike strong sub sup table tbody td tfoot th thead tr tt u ul var EMAIL FIRST_NAME LAST_NAME GROUP_NAME

ATTRIBUTES:
abbr align alt axis bgcolor border cellpadding cellspacing char charoff charset cite class classid clear color colspan datetime dir face frame headers height href hreflang hspace id longdesc name rel rev rowspan rules scope size src style target title type valign vspace width

Anyone see any problems with these two lists? Any additions? We should update these before the next release imho.

Carl, here is some feedback, mostly about the tags; I have not looked at the attributes in detail.

We should allow the tags from HTML 4.01, excluding these:
- programmatic elements (forms, form elements, applet, object, map, script/noscript),
- head elements (link, title, meta, ...),
- frames,
- style.

Tags and attributes that accept URLs are potentially dangerous: the URLs have to be checked so that javascript: is disallowed and so that they do not point to untrusted sites. OpenACS already checks for untrusted protocols, but it is very hard to do this everywhere (e.g. when parsing inline styles). So the potentially dangerous ones are
- A and
- IMG,
- but also the STYLE attribute.
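
To make the protocol check concrete, here is a rough Python sketch (not the OpenACS code, which is Tcl). The function name is_safe_url and the contents of ALLOWED_SCHEMES are my assumptions; it only looks at the scheme and does not decide whether a host is trusted.

```python
import html
import re

# Assumed allowlist; the thread does not say which protocols OpenACS accepts.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

_SCHEME_RE = re.compile(r"^([a-z][a-z0-9+.\-]*):", re.IGNORECASE)


def is_safe_url(url: str) -> bool:
    """Return True for scheme-less (relative) URLs and allowlisted schemes."""
    # Decode character references first, so "&#106;avascript:" is seen
    # as "javascript:".
    decoded = html.unescape(url)
    # Browsers tolerate whitespace and control characters inside the scheme,
    # so remove them before matching.
    cleaned = re.sub(r"[\x00-\x20\x7f]", "", decoded)
    match = _SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme at all: treated as a relative URL
    return match.group(1).lower() in ALLOWED_SCHEMES
```

A check like this would have to run on every URL-carrying attribute (href, src, cite, longdesc, ...), which is part of why it is hard to do everywhere.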

The XSS page lists e.g.

<DIV STYLE="background-image: url(javascript:alert('XSS'))">

or

<DIV STYLE="width: expression(alert('XSS'));">

which are dangerous for some browsers.

So, the STYLE attribute is dangerous and should be handled with care (e.g. not in the default configuration).
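
For sites that allow STYLE anyway, a crude screen for the two examples above might look like the following Python sketch. The name style_is_risky and the pattern list are my assumptions, and a deny-list like this is certainly incomplete, which is the argument for leaving STYLE out of the default configuration.

```python
import re

# Hypothetical deny patterns; not an exhaustive list of CSS attack vectors.
_RISKY_CSS = re.compile(
    r"expression|url|javascript|@import|behavior|-moz-binding",
    re.IGNORECASE,
)


def style_is_risky(style: str) -> bool:
    """Return True if an inline style value should be rejected."""
    # CSS backslash escapes and comments can hide keywords, so reject
    # any value that contains them instead of trying to decode it.
    if "\\" in style or "/*" in style:
        return True
    # Whitespace may be used to split keywords from their parentheses.
    compact = re.sub(r"\s+", "", style)
    return _RISKY_CSS.search(compact) is not None
```

Note that this rejects every url(...) value, including harmless ones; telling them apart would need a real CSS parser.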

Other attributes such as CLASS can be used to confuse the user (e.g. by reusing styles from the navigation), and ID might break code (when the JavaScript in OpenACS searches for IDs and finds unexpected occurrences).

Here is a slightly more complete, sorted list of HTML 4.01 elements:

abbr acronym address b big blockquote br caption cite code col
colgroup dd del dfn div dl dt em fieldset font h1 h2 h3 h4 h5 h6 hr i
ins kbd legend li ol p pre q s samp small span strike strong sub sup
table tbody td tfoot th thead tr tt u ul var
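
As an illustration of how such a list gets applied, here is a small allowlist filter in Python using the standard html.parser module. It is only a sketch: ALLOWED_TAGS and ALLOWED_ATTRS are short example subsets I picked, the class and function names are made up, and it does not check URLs or repair unbalanced markup.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative subsets only; a real configuration would use the full lists.
ALLOWED_TAGS = {"b", "blockquote", "br", "code", "div", "em", "i", "li",
                "ol", "p", "pre", "strong", "ul"}
ALLOWED_ATTRS = {"align", "title"}
VOID_TAGS = {"br", "hr"}
# Elements whose text content is dropped together with the tag.
DROP_CONTENT = {"script", "style"}


class AllowlistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS:
                kept.append(' %s="%s"' % (name, escape(value or "", quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives decoded, so escape it again on output.
            self.out.append(escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = AllowlistFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

With these example sets, the DIV from the XSS example keeps its tag but loses the STYLE attribute, and a script element disappears together with its content.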

I am not sure whether we should allow the MS Office tags in web pages, since they will cause errors in HTML conformance tests.

In general, we should distinguish between public content (so special rights are required to provide HTML content), as in a public forum, where a conservative policy is required, and content from somewhat trusted and known content developers, where a more liberal policy can be used.
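
The two-policy idea could be expressed as configuration roughly like this (Python sketch; the profile names, the attribute sets and the policy_for helper are all my assumptions, only the public/trusted split comes from the paragraph above):

```python
# Assumed attribute sets, chosen for illustration only.
BASE_ATTRIBUTES = frozenset({"align", "alt", "href", "title"})

POLICIES = {
    # Public content: conservative, no CLASS, ID or STYLE.
    "public": {"attributes": BASE_ATTRIBUTES},
    # Known content developers: the same base plus the riskier attributes.
    "trusted": {"attributes": BASE_ATTRIBUTES | {"class", "id", "style"}},
}


def policy_for(is_trusted_author: bool) -> dict:
    """Pick the filter profile for the author of a piece of content."""
    return POLICIES["trusted" if is_trusted_author else "public"]
```

The default would be the "public" profile, so a site has to opt in explicitly to the liberal one.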

For the general user, I would ask: why do we want to allow e.g. CLASS, STYLE or ID, and what do we gain by doing so? The general policy should stay conservative.