Forum OpenACS Development: Squeaky Clean Markup: The OpenACS Decrufter Proposal

The experience with the new documentation project on (as outlined here by Gustaf: ) seems to be a faint reflection of what I am seeing on our internal installation of xowiki (it is actually much worse on our internal install b/c of the widespread use of Microsoft Word).

I had a similar experience a few years ago with WYSIWYG editors in HTML forms (on a large Zope-based CMS project I started, which was eventually replaced with Typo) and went through a phase of wysiwygaphobia as a result (which xowiki has helped me overcome).

Summary: Xinha is much better than what was available back then, but it still produces suboptimal HTML that no one has time to clean up (the computer should be doing this).

The main issues I see right now (please add any that come to mind):

1. Lots of messy contributions with non-semantic markup
2. Very very ugly html
3. Superfluous tags (including all the moronic markup that is a result of cut and paste of portions of MS Word documents).

I just discovered this Xinha plugin that adds a button for cleaning up content (it uses a simple PHP-based web interface to HTMLTidy and a few other tools).

In addition to Gustaf's suggestion for style selection on first-time input, we need something that catches the pasty gunk from Microsoft et al.
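To make "catching the gunk" a bit more concrete, here is a minimal sketch of a detector for typical MS Word paste residue. It is written in Python purely for illustration (an OpenACS version would be Tcl), and the marker list and the name `find_word_cruft` are my own assumptions, a heuristic rather than an exhaustive rule set.

```python
import re

# Hypothetical, non-exhaustive markers of pasted MS Word content.
WORD_MARKERS = {
    "mso-style": re.compile(r"mso-[a-z\-]+\s*:", re.IGNORECASE),
    "Mso class": re.compile(r"class\s*=\s*[\"']?Mso\w+", re.IGNORECASE),
    "office namespace tag": re.compile(r"</?[ovw]:\w+", re.IGNORECASE),
    "conditional comment": re.compile(r"<!--\[if\s+[^\]]*\]>", re.IGNORECASE),
}


def find_word_cruft(html):
    """Return the sorted names of all Word-style markers found in html."""
    return sorted(
        name for name, pattern in WORD_MARKERS.items() if pattern.search(html)
    )


if __name__ == "__main__":
    sample = '<p class="MsoNormal" style="mso-margin-top-alt:auto">Hi<o:p></o:p></p>'
    print(find_word_cruft(sample))
```

A check like this only flags suspicious input; the actual cleanup would still be left to a tool such as HTMLTidy.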

I think we can improve on SuperClean and reap its benefits in other areas.

I propose a "decrufter" for OpenACS. It should allow for plugins, so we could use it as a front end to a local copy of the W3C validator, HTMLTidy, etc. It should also integrate with form builder (cleaning up contributions as they are added, i.e. validating the HTML before it gets to the database) and be applicable to an entire request (e.g. letting an admin activate eavesdropping on the request processor, with reports of bad HTML sent by a scheduled proc).
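A rough sketch of the plugin idea, again in Python for illustration only. Everything here (`Decrufter`, `register`, `clean`, and the two sample plugins) is hypothetical naming on my part; a real plugin could just as well hand the HTML to HTMLTidy or a local W3C validator instead of using regular expressions.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A plugin takes HTML and returns (cleaned HTML, list of report messages).
Plugin = Callable[[str], Tuple[str, List[str]]]


@dataclass
class Decrufter:
    plugins: List[Tuple[str, Plugin]] = field(default_factory=list)

    def register(self, name, plugin):
        """Add a plugin; plugins run in registration order."""
        self.plugins.append((name, plugin))

    def clean(self, html):
        """Run every plugin in turn and collect their reports."""
        report = []
        for name, plugin in self.plugins:
            html, messages = plugin(html)
            report.extend(f"{name}: {m}" for m in messages)
        return html, report


def strip_office_tags(html):
    """Sample plugin: drop <o:p>, <v:shape>, <w:...> style tags."""
    cleaned, n = re.subn(r"</?[ovw]:\w+[^>]*>", "", html, flags=re.IGNORECASE)
    return cleaned, ([f"removed {n} office tag(s)"] if n else [])


def strip_empty_inline(html):
    """Sample plugin: repeatedly drop empty <span>/<font> elements."""
    total = 0
    while True:
        html, n = re.subn(
            r"<(span|font)\b[^>]*>\s*</\1>", "", html, flags=re.IGNORECASE
        )
        if n == 0:
            break
        total += n
    return html, ([f"removed {total} empty inline element(s)"] if total else [])


if __name__ == "__main__":
    d = Decrufter()
    d.register("office", strip_office_tags)
    d.register("empty", strip_empty_inline)
    print(d.clean('<p>Hi<o:p></o:p><span style="x"><span></span></span></p>'))
```

In the form builder case, the cleaned HTML would be what gets stored; in the request processor case, the HTML could be passed through untouched and only the report collected for the scheduled mail.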

This could be very helpful for accessibility tests related to clean markup for example.

Some related links:

One of the blogs I follow covered something similar a few years ago:

Good place to click around. It would be great if we could reach a point where Xinha-produced content is valid XHTML.

Right now I am looking for comments and suggestions so we can start to move in that direction.