Forum OpenACS Development: Re: XML to XoTCL Object

9: Re: XML to XoTCL Object (response to 8)

Posted by Gustaf Neumann on 12/31/07 08:21 PM

Tom

Then the document can be validated against an XML Schema and/or data extracted without using xpath/xquery

Are you saying that you have implemented an XML Schema validator? That sounds like a useful contribution! Can you explain more?

.... And serialization to XML is not handled by tDOM ...

What is wrong with using appendFromList?

With tDOM, you can't introspect the document

What can be done with the namespace tree you are producing that can't be done via DOM (with a well known interface)?

For getting the child elements, [$root childNodes] seems simpler to me.

10: Re: XML to XoTCL Object (response to 9)

Posted by Tom Jackson on 12/31/07 10:00 PM

Are you saying that you have implemented an XML Schema validator? That sounds like a useful contribution! Can you explain more?

I'm not sure of the terminology. This code is all part of a type system which is relatively independent from the tWSDL/TWiST code. The type code can create types using the methods of restriction outlined in XML-Schema datatypes:

http://www.w3.org/TR/xmlschema-2/

and these can be combined into complexTypes, mostly limited to sequences:

http://www.w3.org/TR/xmlschema-1/

Once the types are defined, then instances can be created with the ::new proc (which supports full nesting to any depth).

The defined schema is part of the WSDL file, but could probably be pulled out as an independent document. Documents/instances can be validated using the ::validate proc. Validation checks structure first, then types. It stops on the first error and backs out, marking the path to the failure (using internal variables). Right now the only use for this validation information is to create a SOAP fault indicating the reason for failure, but in principle you could go in, correct something, and revalidate.

The best way to understand all of this is to just look at the auto-generated code for handling all of this:

http://junom.com/ws/mywebservice/

The entire auto-generated code is based upon the config:

http://junom.com/ws/mywebservice/index.txt

The auto-generated WSDL/schema:

http://junom.com/ws/mywebservice/?WSDL

The derivation of decimal types is complex to code, but very useful. The example operation is at:

http://junom.com/ws/mywebservice/?op=TestDecimalValueOperation&mode=display

The generated type validation procedure is here:

http://junom.com/ws/mywebservice/?ns=::wsdb::types::mywebservice::TestDecimal

In general, you can browse under either simpleTypes or complexTypes to see the various code for either type creation or type validation.

What is wrong with using appendFromList?

There is nothing wrong with it, but it would bring several inconveniences. First, it would require tDOM. Second, while the document is being built, additional information is added which aids validation (see the examples above). Also, invalid documents are marked with additional metadata which isn't part of the XML document. Another problem is that you can't easily introspect the document to figure out why something doesn't work. I use a namespace browser to inspect documents and all code generated by the service description. Tracking down data errors in tDOM would be difficult (for me at least). Finally, some documents are constructed in parts, so some elements need to be created before their parents, even before the document itself. This is easy, since the creating proc only needs to return the location (namespace) of the new element, and that name can then be used as a reference to add the element.
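
To make the "create first, attach by reference later" idea concrete, here is a minimal sketch. It is not the tWSDL ::xml API: the ::demo::element procs and the ::demo::instance namespace are made-up names, the sketch assumes Tcl 8.5 or later, and it ignores attributes, prefixes and escaping.

```tcl
# Hypothetical helpers for illustration only; not the tWSDL ::xml API.
namespace eval ::demo::element {
    variable counter 0
}

# Create a free-standing element and return the namespace that holds it.
proc ::demo::element::new {name {text ""}} {
    variable counter
    set ns ::demo::instance::e[incr counter]
    namespace eval $ns {
        variable name ""
        variable text ""
        variable children {}
    }
    set ${ns}::name $name
    set ${ns}::text $text
    return $ns
}

# Attach an existing element to a parent by reference (its namespace name).
proc ::demo::element::appendRef {parentNs childNs} {
    lappend ${parentNs}::children $childNs
    return $childNs
}

# Serialize recursively; no escaping, so this is for illustration only.
proc ::demo::element::toXML {ns} {
    set name [set ${ns}::name]
    set out "<$name>[set ${ns}::text]"
    foreach child [set ${ns}::children] {
        append out [toXML $child]
    }
    append out "</$name>"
    return $out
}

# The child is created before its parent exists, then added by reference.
set price [::demo::element::new price 9.99]
set item  [::demo::element::new item]
::demo::element::appendRef $item $price
puts [::demo::element::toXML $item]   ;# <item><price>9.99</price></item>
```

Because every element lives in its own namespace, something like [namespace children ::demo::instance] is enough to list what has been built so far, which is the kind of browsing described above.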

So documents/elements are sometimes visited more than once. One additional example is the problem of serialization. I'm using only document/literal, so there is minimal need to worry about it, but rpc/(encoded/literal) are only slightly different from document/literal. In many cases, only a type attribute needs to be added to elements. Since the SOAP layer is above the document creation layer, this decision cannot be made until after the document is filled with data. Some future RPC layer could use the schema and go back and add the type information, possibly adding prefixes, etc.

There are example Tcl reps of XML documents under the ::xml::instance namespace:

http://junom.com/ws/mywebservice/?ns=::xml::instance

Finally, there is nothing wrong with tDOM, I would never suggest that, but you can't get any better introspection than simply browsing the document. This type of introspection is aimed at finding errors and understanding what is going on. Point and click is easier on my brain.

12: Re: XML to XoTCL Object (response to 10)

Posted by Gustaf Neumann on 01/01/08 11:53 PM

me:
Are you saying that you have implemented an XML Schema validator?
Tom:
I'm not sure of the terminology.

An XML Schema validator is a program that accepts an XML Schema (as defined by the W3C) and a schema instance (an XML document), analyzes both, and determines whether the XML document is a valid instance of the schema (well-formed by XML rules and obeying the structural and datatype constraints of the XML Schema).

If I understand correctly, your implementation parses the XML Schema (from the WSDL document) into the namespace structure and auto-generates the type checkers for the primitive and derived XML datatypes (terminology of http://www.w3.org/TR/xmlschema-2/).

Does your implementation cover the full set of built-in datatypes of XML Schema, the full set of built-in derived datatypes, and the full set of simple type definitions (section 4)? I understand that you have not implemented complex type checking (yet).

This is certainly an interesting addition to a Tcl-based XML processing environment, complementing tDOM.

Concerning introspection: In computer science, the term introspection refers to the ability of a program to query its own structure and behavior at runtime (and optionally to modify it; the terms "read introspection" and "write introspection" are used for the two cases). Another term for this is "reflection". This ability is particularly important for dynamic languages, where the structures (e.g. object-class relationships, class-class relationships, dynamically added methods or variables, ...) might change during runtime. By this meaning of the term, tDOM has read and write introspection; I would rather call your usage of the term XML structure browsing.
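
In plain Tcl, the distinction between read and write introspection can be sketched as follows. The ::demo::greet proc is hypothetical and exists only so that there is something to inspect; nothing here is specific to tDOM or XOTcl.

```tcl
# Hypothetical proc, defined only so there is something to inspect.
namespace eval ::demo {}
proc ::demo::greet {who} { return "hello $who" }

# Read introspection: query the running program about its own structure.
puts [info procs ::demo::*]      ;# procs defined in ::demo
puts [info args ::demo::greet]   ;# who
puts [info body ::demo::greet]   ;# the source text of the proc body

# Write introspection: change behavior at runtime by renaming the
# original and installing a wrapper under the old name.
rename ::demo::greet ::demo::greet_orig
proc ::demo::greet {who} { string toupper [::demo::greet_orig $who] }
puts [::demo::greet world]       ;# HELLO WORLD
```

Browsing a tree of namespaces that represents a document uses the same read-side commands, but what is being inspected is data about the document, not the structure of the program.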

13: Re: XML to XoTCL Object (response to 12)

Posted by Tom Jackson on 01/02/08 08:41 AM

Gustaf,

From the user's perspective, the xml component of tWSDL does document validation based upon a defined XML Schema. However, there is more going on, and no single step quite qualifies under this definition.

First, types are defined via a Tcl API and the XML Schema is generated from those definitions. If this were not the case, then either all the xsd types would have to be hand coded, or they would not exist until you parsed an xsd which defined them. The link below shows how the xsd types are derived:

http://junom.com/gitweb/gitweb.perl?p=twsdl.git;a=blob;f=packages/wsdl/ns/ns-xsd.tcl

The page needs some cleanup, but basically it outlines how the internal API generates the type system. I'll include a few lines here:

# Create xsd schema
::wsdl::schema::new xsd "http://www.w3.org/2001/XMLSchema"

# anySimpleType
::wsdl::types::primitiveType::new xsd anySimpleType {return 1} 

# string
::wsdl::types::primitiveType::new xsd string {return 1} 

# dateTime
::wsdl::types::primitiveType::new xsd dateTime "return \[::wsdb::types::tcl::dateTime::toArray \$value\]" 

# duration
::wsdl::types::primitiveType::new xsd duration "return \[::wsdb::types::tcl::dateTime::durationToArray \$value\]" 

# boolean
::wsdl::types::simpleType::restrictByEnumeration xsd boolean xsd::string {0 1 true false}

# Decimal Type
::wsdl::types::simpleType::restrictDecimal xsd decimal xsd::string {pattern {\A(?:([\-+]?)([0-9]*)(?:([\.]?)|([\.])([0-9]+))){1}\Z}}

::wsdl::types::simpleType::restrictDecimal xsd integer tcl::integer {fractionDigits 0}
::wsdl::types::simpleType::restrictDecimal xsd int tcl::integer {fractionDigits 0} 
::wsdl::types::simpleType::restrictDecimal xsd nonPositiveInteger xsd::integer {maxInclusive 0}
::wsdl::types::simpleType::restrictDecimal xsd negativeInteger  xsd::integer {maxInclusive -1}
::wsdl::types::simpleType::restrictDecimal xsd short xsd::integer {minInclusive -32768 maxInclusive 32767}
::wsdl::types::simpleType::restrictDecimal xsd byte xsd::integer {minInclusive -128 maxInclusive 127}
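
For readers who want to see how restriction by facets can turn into a generated validation proc, here is a minimal sketch. It is not the tWSDL implementation: ::demo::types::restrictInteger and the percent type are made-up names, only the minInclusive and maxInclusive facets are handled, and Tcl 8.5 or later is assumed.

```tcl
namespace eval ::demo::types {}

# Hypothetical helper: define an integer type restricted by min/max facets
# by generating a proc named ::demo::types::<name>::validate.
proc ::demo::types::restrictInteger {name facets} {
    # Every generated checker starts with a base integer test.
    set checks [list {[string is integer -strict $value]}]
    if {[dict exists $facets minInclusive]} {
        lappend checks "\$value >= [dict get $facets minInclusive]"
    }
    if {[dict exists $facets maxInclusive]} {
        lappend checks "\$value <= [dict get $facets maxInclusive]"
    }
    set condition [join $checks " && "]
    namespace eval ::demo::types::$name {}
    # The facet values are baked into the body of the generated proc.
    proc ::demo::types::${name}::validate {value} "return \[expr {$condition}\]"
}

# Example type invented for this sketch: an integer from 0 to 100.
::demo::types::restrictInteger percent {minInclusive 0 maxInclusive 100}
puts [::demo::types::percent::validate 42]    ;# 1
puts [::demo::types::percent::validate 101]   ;# 0
puts [::demo::types::percent::validate abc]   ;# 0
```

The generated body can be read back with [info body ::demo::types::percent::validate], which is roughly what browsing the generated type code amounts to.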

Structural types are supported as sequences. You can specify minOccurs, maxOccurs, type, default value, nillable, and implicitly, the order of child elements. Children can be either simpleType content or another complexType. You can also have a child element with a local name which refers to a global type. The code for creating and validating the type is in the global type, but the local element provides the name and reference to the global type.

The structural details are checked first during validation. If all children are present in the correct number, validation steps through the child elements until eventually children with only simpleType content are validated.

If there is a validation error, the validation checker marks the 'nodes' on the way back out so a client can get a complete path to the error and a pretty good error message indicating what failed. In tWSDL, this is used to return a SOAP fault message (client fault). However, any application could easily access the same information. The error information is stored with the document rep, but it isn't part of any serialized version of the document (at least with XML).
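
The two passes described above can be sketched roughly as follows. This is not the tWSDL code: the dict-based document and schema representation, the ::demo::validate namespace, and the type names code, int and Item are all assumptions made for illustration. The sketch returns the error path instead of marking nodes, and it does not check the order of child elements. Tcl 8.5 or later is assumed.

```tcl
namespace eval ::demo::validate {
    # Hypothetical schema: type name -> definition.
    variable types {
        code {kind simple check {string is alnum -strict}}
        int  {kind simple check {string is integer -strict}}
        Item {kind complex sequence {
            {name sku   type code minOccurs 1 maxOccurs 1}
            {name price type int  minOccurs 1 maxOccurs 1}
            {name tag   type code minOccurs 0 maxOccurs 3}
        }}
    }
}

# Returns "" when valid, otherwise a two-element list: {path message}.
proc ::demo::validate::element {elem typeName {path ""}} {
    variable types
    set path $path/[dict get $elem name]
    set def [dict get $types $typeName]

    if {[dict get $def kind] eq "simple"} {
        set text [dict get $elem text]
        if {![{*}[dict get $def check] $text]} {
            return [list $path "'$text' is not a valid $typeName"]
        }
        return ""
    }

    # Structural pass: count children by name and check occurrence limits.
    set children [dict get $elem children]
    set count [dict create]
    foreach child $children { dict incr count [dict get $child name] }
    set childType [dict create]
    foreach spec [dict get $def sequence] {
        set n [dict get $spec name]
        dict set childType $n [dict get $spec type]
        set seen [expr {[dict exists $count $n] ? [dict get $count $n] : 0}]
        set min [dict get $spec minOccurs]
        set max [dict get $spec maxOccurs]
        if {$seen < $min || $seen > $max} {
            return [list $path "expected $min..$max '$n' children, found $seen"]
        }
    }
    foreach child $children {
        if {![dict exists $childType [dict get $child name]]} {
            return [list $path "unexpected child '[dict get $child name]'"]
        }
    }

    # Type pass: descend only after the structure checks out. The first
    # failure propagates back up unchanged, carrying its full path.
    foreach child $children {
        set n [dict get $child name]
        set err [element $child [dict get $childType $n] $path]
        if {$err ne ""} { return $err }
    }
    return ""
}

set doc {name Item children {
    {name sku   text A100}
    {name price text cheap}
}}
lassign [::demo::validate::element $doc Item] where why
puts "$where: $why"
```

Here the structure of Item is fine, so the failure is only found in the type pass and reported at the path /Item/price. A SOAP layer, or any other caller, could turn that pair into a fault message.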

11: Re: XML to XoTCL Object (response to 9)

Posted by Tom Jackson on 01/01/08 12:02 AM

The original documentation, written before the code, has a good overview of the motivations and design decisions.

http://junom.com/document/tWSDL/api-types.html

Also, the XML API (now in ::xml, not ::wsdl) is explained here along with discussion of tDOM usage. The examples are no longer completely accurate, but they are very close. Considering these docs were written almost two years ago, it is surprising how close the final code came to the plans.

http://junom.com/document/tWSDL/api-xml.html

The main difference is the addition of prefix information to the metadata, along with the ability to add child elements by reference (instead of as actual namespace children). Also, all of the XML elements are created through a single API. This frees future developers from needing to know the bookkeeping details for element metadata, and any changes to the internal details will not affect any developer code.

The next development push will probably be to add support for attributes. That means the ability to define them either globally or locally, associate them with any global simpleType, and include them in document validation. Attribute validation and defaulting should be much easier than what is required for element validation. Internally I intend to use attributes to indicate information about the element content, for instance if/how the content is encoded.