Carl, here is some feedback, mostly about the tags;
I have not looked at the attributes in detail.
We should allow the tags from HTML 4.01, excluding these:
- programmatic elements (forms, form elements, applet, object, map, script/noscript),
- head elements (link, title, meta, ...),
- A and
- the STYLE attribute as well.
The XSS page lists e.g.
<DIV STYLE="width: expression(alert('XSS'));">
Constructs like this are dangerous in some browsers.
So the STYLE attribute is dangerous and should be handled with care (e.g. not enabled in the default configuration).
Here is a slightly more complete, sorted list of HTML 4.01 elements:
abbr acronym address b big blockquote br caption cite code col
colgroup dd del dfn div dl dt em fieldset font h1 h2 h3 h4 h5 h6 hr i
ins kbd legend li ol p pre q s samp small span strike strong sub sup
table tbody td tfoot th thead tr tt u ul var
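To make the proposal concrete, here is a minimal sketch of a filter built on the list above. It is only an illustration, not the project's implementation: it uses Python's standard html.parser, and the names ALLOWED_TAGS, VOID_TAGS, AllowListFilter and filter_html are made up for this example. It assumes the most conservative behaviour, i.e. every attribute is dropped and tags not on the list are removed while their text is kept.

```python
from html import escape
from html.parser import HTMLParser

# Element allow-list copied from the list above (hypothetical constant name).
ALLOWED_TAGS = frozenset("""
abbr acronym address b big blockquote br caption cite code col
colgroup dd del dfn div dl dt em fieldset font h1 h2 h3 h4 h5 h6 hr i
ins kbd legend li ol p pre q s samp small span strike strong sub sup
table tbody td tfoot th thead tr tt u ul var
""".split())

# Allowed elements that have no end tag in HTML 4.01.
VOID_TAGS = frozenset({"br", "col", "hr"})


class AllowListFilter(HTMLParser):
    """Re-emit only allow-listed tags, without any attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names; attrs are ignored on purpose.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept, but re-escaped so it cannot introduce markup.
        self.out.append(escape(data))


def filter_html(markup):
    parser = AllowListFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

With this sketch, the DIV example from the XSS page comes out as a plain div with the STYLE attribute removed, and an A element is reduced to its text. Note that the text inside a removed SCRIPT element would still be emitted (escaped), so a real filter would probably want to drop that content entirely.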
I am not sure whether we should allow the MS Office tags in web pages, since they will cause errors in HTML conformance tests.
In general, we should distinguish between public content, where no special rights are required to provide HTML (as in a public forum) and a conservative policy is required, and content from somewhat trusted and known developers, where a more liberal policy can be used.
For the general user, I would ask: why do we want to allow e.g. CLASS, STYLE or ID, and what do we gain by doing so? The general policy should stay conservative.
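As a rough illustration of the two policy levels, the attribute handling could be expressed as data rather than code. This is a sketch under my own assumptions: the Policy class, the PUBLIC and TRUSTED profiles and keep_attributes are hypothetical names, and the choice of which attributes the trusted profile allows is only an example. STYLE is left out of both profiles, so it would have to be switched on explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A named set of attributes that survive filtering (hypothetical)."""
    name: str
    allowed_attributes: frozenset


# Conservative default for public content: no attributes at all.
PUBLIC = Policy("public", frozenset())

# More liberal profile for known content developers (example choice only).
TRUSTED = Policy("trusted", frozenset({"class", "id"}))


def keep_attributes(policy, attrs):
    """Return only the (name, value) pairs the policy allows."""
    return [(name, value) for name, value in attrs
            if name.lower() in policy.allowed_attributes]
```

For example, keep_attributes(PUBLIC, ...) always returns an empty list, while the trusted profile keeps CLASS and ID but still drops STYLE.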