Forum OpenACS Development: Moving to XHTML

1: Moving to XHTML

Posted by Tom Jackson on 09/13/07 03:52 AM

I was browsing the chatroom logs today. There was some discussion about moving to XHTML from whatever is now delivered. I don't really have a stake in this transition (nothing I do would be affected), but I think it is important to point out the extreme difficulty of moving to XHTML strict.

There are a number of things to think about, but the basic issues are discussed here:

http://www.hixie.ch/advocacy/xhtml

It is possible that everything discussed in the above link can be handled in the OpenACS codebase. However, _everything_ has to be perfect, and that includes content added by your users. You can't just insert HTML into XHTML strict and expect it to work.
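A minimal illustration of the problem, written in Python purely as a sketch (nothing in OpenACS works this way): markup that browsers happily accept as text/html is rejected by an XML parser. The helper `is_well_formed_fragment` is hypothetical, and it only checks XML well-formedness, not validity against the XHTML 1.0 Strict DTD. HTML named entities such as `&nbsp;` would also fail here, because a bare XML parser only knows the five predefined entities.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment: str) -> bool:
    """Hypothetical helper: True if an HTML fragment is also well-formed XML.

    The fragment is wrapped in a <div> so that input with several
    top-level nodes still has the single root element XML requires.
    """
    try:
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return False
    return True


# User input that browsers render without complaint as text/html:
samples = [
    "Line one<br>Line two",      # unclosed empty element
    "<p>First<p>Second",         # implied </p> end tags
    '<img src="a.png" alt=x>',   # unquoted attribute value
    "Fish & chips",              # bare ampersand
    "Fish &amp; chips",          # escaped ampersand: fine
    "Line one<br />Line two",    # XHTML-style empty element: fine
]

for sample in samples:
    print(is_well_formed_fragment(sample), sample)
```

Only the last two samples pass; each of the others is ordinary HTML that a browser would display.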

2: Re: Moving to XHTML (response to 1)

Posted by Dave Bauer on 09/13/07 04:06 AM

Thanks Tom, very interesting reading. According to that, IE6 can't handle XHTML properly, which seems like a big problem.

I haven't done any tests, but you can be sure the tests will be done before that happens.

3: Re: Moving to XHTML (response to 2)

Posted by Tom Jackson on 09/13/07 05:37 AM

I admit in advance that I am not an expert in this, but for those responsible, I would suggest a very detailed comparison between HTML and XHTML. Most important is the purpose of moving to XHTML. Moving to this strict format must provide benefits that offset the pain involved in conversion. Essentially any proc/page which returns HTML must be reviewed ... in the context of any valid input. Stylesheets and JavaScript also behave differently under XHTML.

The easiest suggestion is to look for a dynamic site which actually does what you think you want to do. There may be simple solutions, but it's best to build upon the work of others. I hope such example sites exist, but don't be the first unless you have an expert.

4: Re: Moving to XHTML (response to 1)

Posted by Gustaf Neumann on 09/13/07 09:12 AM

I do agree that moving towards XHTML will not be an easy path, esp. when IE6 compatibility is desired. Note that the (great) analysis by Ian Hickson (of Opera, now Google) mostly concerns delivering XHTML with the media type text/html, but there are alternatives (http://www.w3.org/TR/xhtml-media-types/), which do not always work (see e.g. the W3C recommendation for IE http://www.w3.org/MarkUp/2004/xhtml-faq.html#ie and the arguments in http://lists.w3.org/Archives/Public/www-html-editor/2004JulSep/0027.html).

However, in the longer run, XHTML is the way to go; newer HTML editors provide support for XHTML (e.g. TinyMCE, WYMeditor), and widely used systems such as WordPress support XHTML today (via a plugin, essentially checking what the user agent accepts and setting the MIME type accordingly). We should be able to do at least as well.

As a side note: by using tDOM as an HTML generator (as in the XOTcl templating), it is actually possible to optionally output HTML or XML. However, this would require a complete rewrite of acs-templating (also, not all of xowiki uses the tDOM generator). As WordPress shows, dynamic switching does not seem necessary.
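tDOM's own API is not reproduced here. As a language-neutral sketch of the idea, the Python below builds one element tree and serializes it either as HTML or as XHTML 1.0; the `Element` class, the `serialize` function and the short void-element list are hypothetical simplifications. The point is that the HTML/XHTML differences live in one serializer instead of being spread over every page and proc.

```python
from dataclasses import dataclass, field
from html import escape

# Elements with no end tag in HTML 4.01 (a subset, for illustration only).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class Element:
    """Hypothetical minimal DOM node; children are Elements or strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def serialize(node, xhtml: bool) -> str:
    """Render a node as HTML (xhtml=False) or XHTML 1.0 (xhtml=True)."""
    if isinstance(node, str):
        return escape(node, quote=False)  # text: escape &, < and >
    attrs = "".join(
        f' {name}="{escape(str(value))}"' for name, value in node.attrs.items()
    )
    if node.tag in VOID_ELEMENTS:
        # XHTML needs the empty-element form; the space before "/>"
        # follows the XHTML 1.0 Appendix C compatibility guidelines.
        return f"<{node.tag}{attrs} />" if xhtml else f"<{node.tag}{attrs}>"
    inner = "".join(serialize(child, xhtml) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


page = Element("p", children=[
    "Fish & chips",
    Element("br"),
    Element("img", {"src": "logo.png", "alt": "Logo"}),
])

print(serialize(page, xhtml=False))  # <p>Fish &amp; chips<br><img ...></p>
print(serialize(page, xhtml=True))   # <p>Fish &amp; chips<br /><img ... /></p>
```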

5: Re: Moving to XHTML (response to 4)

Posted by Tom Jackson on 09/13/07 04:45 PM

I'm not sure what the templating system has to do with XHTML. Template engines should output whatever is asked of them. One failure of acs-templating is adding whitespace in place of removed tags, but otherwise can't it output XHTML right now? Maybe you mean that templates need to look like valid XHTML?

How do html editors fit into OpenACS development? Will future work require their use?

The thing I don't understand about XHTML is how it fits into the needs of a site which allows users to provide dynamic content. Users would never be able to put in anything which wasn't perfect. That means a very smart input filter, and I can't see how the filter would be able to distinguish errors from intentional user input.

But maybe there is a good reason for using XHTML. What is it?

Are there any pages on OpenACS which could be delivered as application/xhtml+xml? If not, are there any pages which are close? Has anyone tried to convert them to see what they look like?

My guess is that you would have to rewrite every page and proc which doesn't know about XHTML, and you would never be able to have a template that had any HTML tags in it, or else every page would need to pass through a smart output filter with the same problems as the input filter. Then you have to deal with JavaScript and CSS differences: not just quoting the code, but the scripts themselves would need to change.

I still think the fastest way to explore this is to find another site/toolkit which takes dynamic input and see how they handle the issues.

6: Re: Moving to XHTML (response to 5)

Posted by Don Baccus on 09/13/07 08:15 PM

Some systems use a different tagging scheme for user input, like [quote] rather than blockquote, and then transform them to the proper HTML or XHTML.
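A minimal Python sketch of that approach, assuming a made-up tag set ([b], [i], [quote]) and a hypothetical `render_user_markup` function: everything the user types is escaped first, and only properly paired bracket tags become elements, so the result is always well-formed. It is not necessarily valid XHTML Strict, which for instance requires block-level content such as <p> inside <blockquote>.

```python
import re
from html import escape

# Hypothetical bracket tags and the elements they become.
TAGS = {"b": "strong", "i": "em", "quote": "blockquote"}

# An innermost pair such as [b]text[/b]: the body may not contain "[",
# so nested pairs are converted inside-out and crossed tags never match.
PAIR = re.compile(r"\[(" + "|".join(TAGS) + r")\]([^\[]*)\[/\1\]")


def _to_element(match: re.Match) -> str:
    element = TAGS[match.group(1)]
    return f"<{element}>{match.group(2)}</{element}>"


def render_user_markup(text: str) -> str:
    """Hypothetical filter: escape raw input, then convert paired tags."""
    out = escape(text, quote=False)  # user-typed <, > and & stay plain text
    while True:
        converted = PAIR.sub(_to_element, out)
        if converted == out:
            return out
        out = converted


print(render_user_markup("[quote]Tom said: [b]XHTML is hard[/b] & <br>[/quote]"))
```

Crossed tags such as `[b][i]x[/b][/i]` are left as literal text rather than producing mismatched markup; the price of this design is that a literal `[` inside a tag pair also blocks conversion.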

There's no plan to require HTML WYSIWYG editors to be used, but we already support Xinha out of the box. Sites that want to make sure pages are correct might want to enforce the use of a WYSIWYG editor.

Browsers are still going to render malformed XHTML as well as they can, just as they do with malformed HTML. Smaller devices like phones might not, but then again by the time we get done with our transition, phones won't be "small devices" in the capacity sense.

As to why to go this direction ... the whole world's going this direction. HTML 4.01 is the last HTML standard W3C will put out. Everything else will be XHTML.

Here's a page written in XHTML strict:

http://w3c.org

Does IE6 render it incorrectly?

7: Re: Moving to XHTML (response to 6)

Posted by Tom Jackson on 09/13/07 11:36 PM

Okay, so you are not going to be serving it as application/xhtml+xml, but as plain ol' text/html.

But I did notice that the static home page at w3.org is treated by Firefox as application/xhtml+xml even though a meta tag and the server headers indicate text/html.

I guess there isn't going to be an option of which to use in OpenACS; the home page here is already not HTML 4.01 Transitional, since it uses /> to close empty tags.

9: Re: Moving to XHTML (response to 7)

Posted by Gustaf Neumann on 09/14/07 01:54 PM

It looks to me as if, for http://www.w3.org/, the Apache server at W3C evaluates the "Accept" request header field from the user agent; if it contains application/xhtml+xml, it serves the page (XHTML 1.0) with this type (no meta http-equiv tag). If I open this page with Safari, I get the meta http-equiv tag for the content type with text/html. We should be able to do something similar, depending on the capabilities of the browser.

If we simply want to stick with rendering traditional web pages, there is no big need to move towards XHTML (although it will make styling simpler). But look at developments like microformats (http://microformats.org/) or GRDDL (http://www.w3.org/TR/2007/REC-grddl-20070911/), and check out the use cases (http://www.w3.org/TR/grddl-scenarios/) that show how to extract semantic information (RDF) from XHTML web pages. This opens many new perspectives, especially for systems built around a rich data model such as OpenACS.

-gustaf
PS: by "templating" in my earlier posting I was referring to the automatically generated HTML.
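As a small illustration of the microformats point: once a page is well-formed XHTML, a plain XML parser can extract structured data from it. The Python sketch below reads hCard class names (`vcard`, `fn`, `org`) from a made-up fragment using ElementTree. `extract_hcards` is hypothetical and handles only two properties, and it is not GRDDL, which relies on transformations (typically XSLT) declared by the document itself.

```python
import xml.etree.ElementTree as ET

# Made-up XHTML fragment marked up with hCard class names.
PAGE = """<div xmlns="http://www.w3.org/1999/xhtml">
  <div class="vcard">
    <span class="fn">Jane Doe</span>,
    <span class="org">Example Inc.</span>
  </div>
</div>"""


def has_class(elem: ET.Element, name: str) -> bool:
    # The class attribute holds space-separated tokens.
    return name in elem.get("class", "").split()


def extract_hcards(xhtml_text: str) -> list:
    """Hypothetical extractor: one dict per vcard with its fn and org."""
    cards = []
    for card in ET.fromstring(xhtml_text).iter():
        if not has_class(card, "vcard"):
            continue
        fields = {}
        for node in card.iter():
            for prop in ("fn", "org"):
                if has_class(node, prop):
                    fields[prop] = "".join(node.itertext()).strip()
        cards.append(fields)
    return cards


print(extract_hcards(PAGE))
```

The same code fails on tag soup, which is why well-formed output is the precondition for this kind of reuse.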

10: Re: Moving to XHTML (response to 9)

Posted by Tom Jackson on 09/14/07 11:37 PM

My version of Mozilla Firefox gets the w3 home page with a meta tag 'http-equiv="Content-Type" content="text/html; charset=utf-8"'. It is hard to see, smashed up against the head tag. However, I think you must be correct: the page is being sent as application/xhtml+xml, whereas with wget it is text/html. When I move the identical file to my server and send it, Firefox detects it as text/html, so the only difference must be the server headers.

So how do we solve the problem of being able to serve both? Simply making an additional template for each page wouldn't work. Some markup is produced in the Tcl pages, some in procs, and some is in the data. Somehow all these sources that make up the page sent to the user have to be in sync.

My own method is to try to separate data, code and templating, roughly model-view-controller, but markup is difficult to handle generally. One idea is to have a data model which resembles RDF and to be able to apply templating to tiny chunks of code to create markup. As long as the two remain separate and allow the template to be switched out, many different opportunities for reuse will open up. The RDF-type model would make it possible to browse the elements that were used to create a web page. (A simple HTML browser or something more complex could operate on such a page just by using a different template. That is, the editor/browser would use the same technology on an expanded scale.) The RDF-type linking between objects provides the semantic hints needed to pull this off.

13: Re: Moving to XHTML (response to 6)

Posted by Tom Jackson on 09/16/07 04:20 AM

As to why to go this direction ... the whole world's going this direction. HTML 4.01 is the last HTML standard W3C will put out. Everything else will be XHTML.

Until yesterday I had never heard of it, but there is an upcoming HTML 5.0: http://www.w3.org/html/wg/html5/

Relationship to HTML 4.01, XHTML 1.1, DOM2 HTML
This specification represents a new version of HTML4 and XHTML1, along with a new version of the associated DOM2 HTML API. Migration from HTML4 or XHTML1 to the format and APIs described in this specification should in most cases be straightforward, as care has been taken to ensure that backwards-compatibility is retained.
This specification will eventually supplant Web Forms 2.0 as well.
Relationship to XHTML2
XHTML2 defines a new HTML vocabulary with better features for hyperlinks, multimedia content, annotating document edits, rich metadata, declarative interactive forms, and describing the semantics of human literary works such as poems and scientific papers.
However, it lacks elements to express the semantics of many of the non-document types of content often seen on the Web. For instance, forum sites, auction sites, search engines, online shops, and the like, do not fit the document metaphor well, and are not covered by XHTML2.
This specification aims to extend HTML so that it is also suitable in these contexts.
XHTML2 and this specification use different namespaces and therefore can both be implemented in the same XML processor.

8: Re: Moving to XHTML (response to 1)

Posted by Torben Brosten on 09/14/07 01:14 AM

I find these wikipedia references useful summaries on the topic:

http://en.wikipedia.org/wiki/XHTML

http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(XHTML)

Since OpenACS already has a few template options (that are works in progress), why not add a default-master-html and default-master-xml etc., plus an acs-subsite parameter that flags XHTML versus HTML, which developers could reference from procedures that generate code/content? If that's not available as a parameter, then maybe as an acs-templating variable via property tags?

I'm sure that a few filters could help validate user input. I'm working on one that translates HTML tables into lists, and it should work even when some of the table tags are missing (similar to how rendering engines can "forgive" some omitted tags).
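Torben's filter itself is not shown in the thread; the following is an independent Python sketch of the same idea using the standard library's `html.parser`. A new `<tr>`, `<td>` or `<th>` implicitly closes the previous row or cell, which is what lets it cope with missing end tags. The class name, the ` / ` cell separator and the output shape are assumptions.

```python
from html import escape
from html.parser import HTMLParser


class TableToList(HTMLParser):
    """Hypothetical filter: collect table rows as lists of cell texts.

    A new <tr>, <td> or <th> closes the open row or cell, so missing
    </td>, </th> and </tr> end tags are tolerated.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.rows = []      # finished rows (lists of cell strings)
        self.row = None     # cells of the row being read, or None
        self.cell = None    # text pieces of the cell being read, or None

    def _close_cell(self):
        if self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None

    def _close_row(self):
        self._close_cell()
        if self.row:        # drop empty rows
            self.rows.append(self.row)
        self.row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self.row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self.row is None:  # cell outside any <tr>
                self.row = []
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def close(self):
        super().close()
        self._close_row()  # flush a row left open at end of input


def table_to_list(html_text: str) -> str:
    """Turn (possibly sloppy) table markup into a well-formed <ul>."""
    parser = TableToList()
    parser.feed(html_text)
    parser.close()
    items = "".join(
        "<li>" + escape(" / ".join(cells), quote=False) + "</li>"
        for cells in parser.rows
    )
    return "<ul>" + items + "</ul>"


print(table_to_list("<table><tr><td>Name<td>Price<tr><td>Tea<td>2 & more</table>"))
```

Text is re-escaped on output, so a bare `&` typed by the user comes out as `&amp;` and the list stays well-formed.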

Anyway, with the parallel templates, the system could handle both conditions.

11: Re: Moving to XHTML (response to 1)

Posted by Torben Brosten on 09/15/07 10:05 AM

Are there other (practical) ways to optionally output html or xml (including mime-types) besides completely rewriting acs-templating (or depending on xotcl-core)?

Briefly looking at the source code of these procs:

template::get_mime_type

template::register_mime_type

adp_parse_ad_conn_file

and these pages:

openacs-4/www/blank-master.adp
openacs-4/www/blank-master.tcl

It seems possible with a few line changes and having a blank-master for xml and another for html. If some procedure gets complicated dealing with xml versus html conditions, just create a proc for each of the 2 cases.

What part of the templating system would not work with separate xml and html blank-master templates (and modifications to above procs)?

12: Re: Moving to XHTML (response to 11)

Posted by Gustaf Neumann on 09/15/07 12:53 PM

There are two different issues: (a) encoding the HTML document such that the result is valid XHTML (1.0, 1.1, ...) and (b) telling the user agent (browser) that the media type (content-type) is XHTML.

The WordPress plugin just alters the media type according to the "Accept" request header field (see RFC 2616, section 14.1) and sends the identical file in both cases. The same seems to happen with www.w3.org. Note that this is only possible for XHTML 1.0, not for XHTML 1.1 or newer. For just altering the media type I don't see the need for different master templates ("ns_set update [ns_conn outputheaders] Content-Type" should be sufficient). For purpose (a) there is no other option than working through the code. XHTML 1.1 is a modularized version of XHTML 1.0 with only small differences, but XHTML 2.0 won't support e.g. the <IMG> tag, so this will be an even bigger step (XHTML 2.0 is not released).

For now XHTML 1.0 Transitional is already an ambitious target.
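The media-type switch described above can be sketched as follows, in Python rather than Tcl; on AOLserver the chosen value would be set with the `ns_set update [ns_conn outputheaders] Content-Type ...` call Gustaf mentions. The function names are hypothetical. The sketch reads q-values as in RFC 2616 section 14.1 but, as a simplification, does not match wildcards such as `*/*` against application/xhtml+xml, so a browser must list the XHTML type explicitly to get it.

```python
def prefers_xhtml(accept_header: str) -> bool:
    """Hypothetical helper: should application/xhtml+xml be sent?

    Reads q-values per RFC 2616 section 14.1. Simplification: wildcards
    such as */* are not matched, so only an explicit entry counts.
    """
    quality = {}
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        quality[media_type] = q
    xhtml_q = quality.get("application/xhtml+xml", 0.0)
    html_q = quality.get("text/html", 0.0)
    return xhtml_q > 0 and xhtml_q >= html_q


def content_type_for(accept_header: str) -> str:
    """Content-Type to send; the same XHTML 1.0 body is served either way."""
    if prefers_xhtml(accept_header):
        return "application/xhtml+xml; charset=utf-8"
    return "text/html; charset=utf-8"


print(content_type_for("application/xhtml+xml,text/html;q=0.9,*/*;q=0.8"))
print(content_type_for("*/*"))
```

Ties go to XHTML here; a site worried about clients that claim XHTML support but handle it badly could just as well break ties toward text/html.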