RE: [xsl] Ordered union of sequences

Subject: RE: [xsl] Ordered union of sequences
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 8 Apr 2010 17:01:28 +0100
> > BTW, the idea behind this is to create part of an XML Schema from
> > evaluating document instances.

In my DTD generator (untouched for many years, but still used, if only by
me)

http://saxon.sourceforge.net/dtdgen.html

this is what I do:

"If an element contains child elements but no significant character data,
then it is declared as having element content. If the same child elements
occur in every instance of the parent and in a consistent sequence, then
this sequence is reflected in the element declaration: where child elements
are repeated or trailing children (only) are omitted in some instances of
the parent element, this will result in a declaration that shows the child
element as being repeatable or optional or both. If no such consistency of
sequence can be detected, then a more general form of element declaration is
used in which all child elements may appear any number of times in any
order. 
If neither character data nor subordinate elements are found in an element,
it is assumed the element must always be empty."

My memory of the details is hazy, but the general approach is that it
doesn't try too hard to find the "perfect" answer (a grammar that matches
all the instances and only these instances) because that isn't actually what
the user wants: given your four examples of content, the chances are that
the "true" intended schema is "anyType".

I suspect, similarly, that given the instances

a b

a c

a b c

a c b

the content model it comes up with will be (a|b|c)*
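
To make the quoted rules concrete, here is a minimal sketch in Python. It is
not the DTDGenerator code, only one reading of the rules quoted above: the
function name infer_content_model and the list-of-child-names input are
assumptions made for illustration, and character data (mixed content) is
ignored entirely.

```python
from itertools import groupby


def infer_content_model(instances):
    """Guess a DTD-style content model for one parent element.

    `instances` holds one list of child element names per occurrence of
    the parent. Hypothetical helper, not part of any real tool.
    """
    # No children anywhere: assume the element is always empty.
    if all(len(children) == 0 for children in instances):
        return "EMPTY"

    # Collapse adjacent repeats: [a, b, b] -> [(a, 1), (b, 2)].
    collapsed = [
        [(name, len(list(run))) for name, run in groupby(children)]
        for children in instances
    ]

    # The longest collapsed sequence is the candidate ordering.
    candidate = [name for name, _ in max(collapsed, key=len)]

    # Consistent means: no name recurs after a different child, and every
    # instance is a prefix of the candidate (only trailing children omitted).
    consistent = len(set(candidate)) == len(candidate) and all(
        [name for name, _ in seq] == candidate[: len(seq)] for seq in collapsed
    )

    if not consistent:
        # Fall back to "any child, any number of times, any order",
        # listing names in order of first appearance.
        names = list(dict.fromkeys(n for children in instances for n in children))
        return "(" + "|".join(names) + ")*"

    suffixes = {
        (False, False): "",
        (True, False): "?",
        (False, True): "+",
        (True, True): "*",
    }
    parts = []
    for i, name in enumerate(candidate):
        optional = any(len(seq) <= i for seq in collapsed)
        repeatable = any(seq[i][1] > 1 for seq in collapsed if i < len(seq))
        parts.append(name + suffixes[(optional, repeatable)])
    return "(" + ", ".join(parts) + ")"
```

Under this reading, the four instances above (a b, a c, a b c, a c b) fail the
prefix test and fall through to the general form, whereas instances such as
"title para para" and "title" would give (title, para*).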

In practical use, I think the tool is more likely to come up with a schema
that is too tight rather than one that is too loose; but there is always a
need for manual adjustment.

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay 
