Forum OpenACS Development: Re: Checks for HTML standard compliance

Posted by Antonio Pisano on
Browser plugins that validate pages while browsing exist, but I still need to find one that fits my needs:
reliable, effective and, if possible, FOSS. Still looking.

Including such a feature in the server could be neat, though! It could go into Developer Support and be turned on when needed.

Ideally, once turned on, the server should feed all page responses to a configured validator and report issues in a log file or a special DB table that could then be inspected.

In my opinion, the report should include: page URL and parameters, the full HTML of the response, the line/column of the issue, and a description of the issue.

I don't know of a similar feature in other platforms, but my knowledge may just be limited. Anyway, this could turn out to be really beneficial!

Posted by Benjamin Brink on
Antonio,

The WDG HTML Validator meets the FOSS requirement and might meet the other requirements, too.

The previous link explains the difference between the W3C HTML validator and the WDG one.

Here is a link to the license and the source code:

http://htmlhelp.org/tools/validator/source.html.en

Since it is written in Perl, perhaps it will work with nscgi.

cheers,

Posted by Benjamin Brink on
I haven't learned xotcl2/nsf2 yet, but...

I bet there's a practical way to read a DTD into it and build an object that can validate a page against it.

This would keep the code entirely in OpenACS and be a useful, working example of some of its strengths.

Posted by Antonio Pisano on
A brand-new Tcl/XOTcl validator is something I'd rather not put into the mix for now, as it would require a lot of effort and testing before we could rely on it. Unless someone claims, and proves, that it is very easy...

I think I can come up with something that does what I proposed, but I need a few pointers.

To summarize my proposal:
I would add a new feature to Developer Support that could be turned on and off at will. When it is on, for every request whose MIME type is 'text/html' (or another markup type subject to validation; we could start with just HTML/XHTML for now), the generated output would be sent to a validation command. That command could be a callback or be specified by a parameter; what matters is that it can tell whether the page is invalid, and where and why.

The result of the validation would then be logged to a table. There would be a UI in DS to inspect and manage this log and to configure what gets logged. We could, for example, log only requests issued by a certain user, or within a certain package, or for a single page, so one could test a page without bringing the server to its knees.
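
A minimal sketch of the filtering step described above, written in Python purely for illustration (a real implementation would live in Tcl inside Developer Support). The scope fields, the `maybe_validate` helper and the validator signature, a callable returning `(line, column, message)` tuples, are all hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical validator interface: takes the page HTML and returns a list of
# (line, column, message) tuples; an empty list means the page is valid.
Issue = Tuple[int, int, str]
Validator = Callable[[str], List[Issue]]

# Start with HTML/XHTML only, as proposed.
VALIDATED_TYPES = {"text/html", "application/xhtml+xml"}


@dataclass
class ValidationScope:
    """Hypothetical DS settings; None means 'no restriction' for a criterion."""
    enabled: bool = False
    user_id: Optional[int] = None
    package_key: Optional[str] = None
    url: Optional[str] = None


def media_type(content_type: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalize case."""
    return content_type.split(";", 1)[0].strip().lower()


def should_validate(scope: ValidationScope, content_type: str,
                    user_id: int, package_key: str, url: str) -> bool:
    if not scope.enabled or media_type(content_type) not in VALIDATED_TYPES:
        return False
    # Every configured criterion must match; unset ones are ignored.
    checks = [(scope.user_id, user_id),
              (scope.package_key, package_key),
              (scope.url, url)]
    return all(wanted is None or wanted == actual for wanted, actual in checks)


def maybe_validate(scope: ValidationScope, validator: Validator, body: str,
                   content_type: str, user_id: int, package_key: str,
                   url: str) -> Optional[List[Issue]]:
    """Return the validator's issues, or None if the request is out of scope."""
    if not should_validate(scope, content_type, user_id, package_key, url):
        return None
    return validator(body)
```

Restricting the scope to one package or one URL keeps the cost of running an external validator on every response under control.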

I would log:
- user issuing request
- timestamp
- package_key of the request
- package version
- url
- page parameters (GET/POST)
- actual script being executed
- page content
- line/column of the validation error
- description of the error
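
The fields listed above map naturally onto one record per validation error. A sketch, again in Python with hypothetical names; how these values are obtained from the request processor is left open:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class ValidationLogEntry:
    """One row per validation error; all field names are hypothetical."""
    user_id: Optional[int]   # user issuing the request
    timestamp: datetime
    package_key: str
    package_version: str
    url: str
    query: Dict[str, str]    # GET/POST parameters
    script: str              # the page script actually executed
    content: str             # full HTML of the response
    line: int                # position of the error reported by the validator
    column: int
    message: str             # description of the error


def log_entries(request_info: dict, content: str,
                issues: Iterable[Tuple[int, int, str]]) -> List[ValidationLogEntry]:
    """Turn validator issues into entries sharing one request timestamp."""
    now = datetime.now(timezone.utc)
    return [ValidationLogEntry(timestamp=now, content=content, line=line,
                               column=column, message=message, **request_info)
            for line, column, message in issues]
```

Storing the full page content in every row duplicates it once per error; a real table might store the content once and reference it from each error row.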

I know some of this information is already logged/collected by the request processor, but I still don't understand where exactly.

What I need to know from someone with more insight is:
- where can I intercept page content just before it is served?
- are there already tables collecting some of the stats I described?
- is this something worth trying in the first place?

Posted by Michael Aram on
Hi Antonio,

I have had almost exactly the same feature in mind for quite some time now (extending DS with a toggle that [dis|en]ables (X)HTML validation of the generated output). So I naturally second your proposal in general, although I think that overall this is not a very high priority for the OpenACS project (especially if in-browser plugins exist that basically do the same job).

All the best,
Michael

Posted by Gustaf Neumann on
Building our own validator, no matter in what language, is a larger project in its own right and not feasible.

One of the best HTML validators is the one from the W3C, available as a service [1] and as a standalone program [2]. The service is very nice for many applications, but falls short when authentication is required (such as on admin pages). However, the resulting HTML file can be saved from the browser, and the standalone version can validate the saved file.

Note that this only validates the HTML code, flagging invalid and deprecated markup; it does not flag, e.g., table-based layouts. On the acs-admin/apm page, the tag "center" is deprecated, but "table" is not. Change [3] makes this page validate as HTML 4.01 Strict. For HTML5, the validator flags 12 errors on this page (e.g., don't use "align" in tables, use CSS instead). As a further step, it would be nice to pull out some of the style attributes and move them into a common CSS file.

The examples of acs-admin/apm and acs-admin/developer show that it is very hard to come up with an automated script to fix table-based designs, since centering, right-aligning, etc. require different techniques depending on the context.
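
As a rough illustration of what flagging deprecated markup means, the following sketch uses Python's standard `html.parser` to report a few presentational tags and attributes that HTML5 treats as obsolete, such as `center` and `align`. It is a quick pre-check built on a hand-picked list, not a substitute for the W3C validator:

```python
from html.parser import HTMLParser

# Hand-picked subset of presentational markup that HTML5 treats as obsolete.
# This is NOT a full validator, just a quick pre-check.
DEPRECATED_TAGS = {"center", "font"}
DEPRECATED_ATTRS = {"align", "bgcolor"}


class DeprecatedMarkupFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []  # (line, column, description); column is 0-based

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()  # position of the start tag
        if tag in DEPRECATED_TAGS:
            self.findings.append((line, col, f"deprecated tag <{tag}>"))
        for name, _value in attrs:
            if name in DEPRECATED_ATTRS:
                self.findings.append((line, col, f'attribute "{name}" on <{tag}>'))


def find_deprecated(html: str):
    finder = DeprecatedMarkupFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```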

-g

[1] https://validator.w3.org/
[2] https://validator.w3.org/source/
[3] http://cvs.openacs.org/changelog/OpenACS?cs=MAIN%3Agustafn%3A20150529090902

Posted by Antonio Pisano on
OK, so you're saying we can just save the HTML from the browser. That's fine with me; I tend to overdesign sometimes.

I will install a new instance in the next few days and start digging around the admin UI.

I'll keep you updated!

Posted by Antonio Pisano on
I've started banging my head against the task, beginning with the /admin UI. Right now the following pages look OK:

- /admin/applications/
- /admin/configure
- /admin/permissions
- /members/
- /shared/parameters
- /admin/subsite-add

The headaches start at /admin/site-map/, though: there, forms are nested inside a table in a way that is invalid from an HTML 4.01 Strict perspective.

It is really not obvious how to get out of this without redesigning the page. We could, for example, use a <div> layout instead of a <table> layout for the list.

Another strategy, which COULD be easier, is to remove all nested <form> tags and wrap the entire list in a single form. That is not easy either, because there are 3 different actions that can be taken depending on which button was pressed. Buttons that trigger different actions for a single form require either a little javascript or HTML5.

Of course, this is just for the sake of keeping the current UI unchanged for the user. We could also simply have each form handled on its own page.

Which is best for you?

Posted by Benjamin Brink on
Antonio, you write:

"Buttons that trigger different actions for a single form require either a little javascript or HTML5."

Is this because of how ad_form works?

Using plain HTML (or the q-forms package), multiple submit buttons can be embedded in the same form.

The submitted form can then be processed according to the name and value of the submit button that was pressed; the name and value attributes of the other submit buttons are not submitted.
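
A minimal sketch of the server side of that approach, in Python with hypothetical button and field names (the three site-map actions are not named in this thread). Only the submit button that was actually pressed is sent with the form data; e.g. `<input type="submit" name="mount" value="Mount">` contributes `mount=Mount`, so the handler can dispatch on which button name is present:

```python
from urllib.parse import parse_qs

# Hypothetical names for the three submit buttons of one shared form.
# Only the pressed button is sent with the form data, so its name
# identifies the requested action.
ACTIONS = ("mount", "new_folder", "new_application")


def chosen_action(body: str) -> str:
    """Return the name of the submit button present in a urlencoded body."""
    form = parse_qs(body)
    pressed = [name for name in ACTIONS if name in form]
    if len(pressed) != 1:
        raise ValueError(f"expected exactly one action button, got {pressed}")
    return pressed[0]


def handle(body: str) -> str:
    form = parse_qs(body)
    node_id = form.get("node_id", [""])[0]  # hypothetical hidden field
    return f"{chosen_action(body)} on node {node_id}"
```

This keeps a single form, and therefore valid nesting, without JavaScript.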

Posted by Benjamin Brink on
To be clear, I'm referring to the INPUT tag whose type is "submit", not the BUTTON tag.

Posted by Gustaf Neumann on
Antonio,

The page was easy to fix by wrapping the input fields in a div (see [1]). Validation-wise, there is no need to revamp this page.

-g

[1] http://cvs.openacs.org/changelog/OpenACS?cs=MAIN%3Agustafn%3A20150609071032
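
For context on why the wrapper helps: in the HTML 4.01 Strict DTD, a `form` element may only contain block-level elements (and `script`) as direct children, so `input` fields sitting directly inside the form are invalid until wrapped in a block such as `div`. A rough Python sketch that spots such cases with `html.parser`, using a hand-picked, incomplete list of inline elements:

```python
from html.parser import HTMLParser

# Incomplete, hand-picked subset of HTML 4.01 inline/form-control elements.
# In the Strict DTD these may not be direct children of <form>.
INLINE = {"input", "select", "textarea", "button", "label", "a", "span", "img"}
VOID = {"input", "img", "br", "hr", "meta", "link"}  # never get an end tag


class FormChildChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements
        self.problems = []   # (line, tag) of inline children of <form>

    def handle_starttag(self, tag, attrs):
        if self.stack and self.stack[-1] == "form" and tag in INLINE:
            self.problems.append((self.getpos()[0], tag))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def inline_children_of_form(html: str):
    checker = FormChildChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```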

Posted by Antonio Pisano on
Much simpler! Thanks, I didn't think of that. I will keep it in mind for similar cases!

Posted by Gustaf Neumann on
Actually, HTML attribute quoting rules seem to have changed over the years. As I remember, during the accessibility reform of dotlrn, quotes around HTML attributes were required. Nowadays, HTML5 does not require quoting attributes, HTML4 recommends quotes, and only XHTML requires them.

Anyhow, I've added quotes to the HTML attributes of the acs-subsite admin pages where they were missing ([1], [2], [3]), since omitting quotes around variable attribute values is always dangerous.

-g

[1] http://cvs.openacs.org/changelog/OpenACS?cs=MAIN%3Agustafn%3A20150609075752
[2] http://cvs.openacs.org/changelog/OpenACS?cs=MAIN%3Agustafn%3A20150609072638
[3] http://cvs.openacs.org/changelog/OpenACS?cs=MAIN%3Agustafn%3A20150609084041
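
To illustrate why unquoted variable values are risky, the sketch below (Python, with a made-up value standing in for user-controlled data) builds the same `input` tag with and without quotes and parses both with the standard `html.parser`. Without quotes, a space in the value lets the rest of it become a separate attribute; with quotes and `html.escape(..., quote=True)`, it stays one value:

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the (name, value) attribute pairs of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def parsed_attrs(markup: str):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs


# Made-up variable value containing a space, as user input might.
value = "foo onmouseover=alert(1)"

unquoted = f"<input type=text value={value}>"
quoted = f'<input type="text" value="{escape(value, quote=True)}">'
```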

Posted by Gustaf Neumann on
Actually, saving page content via the browser apparently has some disadvantages: at least Firefox rewrites some parts and fixes some invalid markup.

Therefore, I've implemented optional support for capturing the server output [1] and used it to fix a few more validation problems in the admin pages. The problem with the site-map page is actually a nasty one, since the old code implemented a form starting in one table cell and ending in another one, which is invalid HTML. The fixed version [2] does not look exactly as before, but I think it is still sufficiently easy to understand. Another option would be to start the form before a <tr>, but I have no idea how to achieve this with the list template.

[1] http://cvs.openacs.org/changelog/OpenACS?cs=MAIN%3Agustafn%3A20150613202408
[2] http://cvs.openacs.org/changelog/OpenACS?cs=MAIN%3Agustafn%3A20150613200556
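
An analogous idea in a Python WSGI setting, shown as an illustration only; it is not how the OpenACS capture support is implemented. When a hypothetical toggle is on, the middleware hands the exact bytes of every `text/html` response to a sink callback (for example, one that writes files for the standalone validator) before any browser can rewrite them:

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical callbacks: sink(path, body) stores a captured page (e.g. writes
# a file for the standalone validator); enabled() reads an on/off toggle.
Sink = Callable[[str, bytes], None]


def capture_html(app: Callable, sink: Sink, enabled: Callable[[], bool]) -> Callable:
    """WSGI middleware passing the exact bytes of text/html responses to sink.

    Note: output sent through the legacy write() callable is not captured.
    """
    def wrapper(environ: dict, start_response: Callable) -> List[bytes]:
        captured: dict = {}

        def recording_start_response(status: str, headers: List[Tuple[str, str]],
                                     exc_info=None):
            captured["headers"] = headers
            return start_response(status, headers, exc_info)

        result: Iterable[bytes] = app(environ, recording_start_response)
        try:
            body = b"".join(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        media_type = ""
        for name, value in captured.get("headers", []):
            if name.lower() == "content-type":
                media_type = value.split(";", 1)[0].strip().lower()
        if enabled() and media_type == "text/html":
            sink(environ.get("PATH_INFO", "/"), body)
        return [body]

    return wrapper
```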

Posted by Antonio Pisano on
This is going to be hard work, and my time has currently shrunk...

Anyway, I have looked at your technique for logging ns_return output, and maybe I will come up with something to cut down some of the steps.

Posted by Gustaf Neumann on
Yes, it is some work.

I've set up a wiki page [1] listing the pages I've fixed (and checked) so far, to avoid duplicated work. Please update this page as well (everybody else is invited to help, too). I also found several public pages that needed some fixing.

-g

[1] https://openacs.org/xowiki/oacs-5-9-html-validity