Re: [xsl] XInclude as an XSLT transformation?

Subject: Re: [xsl] XInclude as an XSLT transformation?
From: "W. Eliot Kimber" <ekimber@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 03 Jan 2005 10:28:32 -0600
Elliotte Harold wrote:

W. Eliot Kimber wrote:

The issue is that in the transcluded result the IDs must be unique (this is a basic requirement of XML).


This is not a basic requirement of XML. IDs in XML documents may in fact be non-unique, and even non-name tokens (as recently came up in a different context). A document containing such non-unique IDs would be invalid but well-formed, and might still be usefully processed.

I did misspeak: if an attribute has a declared type of "ID" *and* the document is intended to be DTD valid then the IDs must be unique. This validation requirement is a non-optional feature of XML (in that if you want DTD validity then ID uniqueness is not optional).
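
To illustrate the well-formed versus valid distinction, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree (a non-validating parser). The attribute name "id" and the helper duplicate_ids are hypothetical; with no DTD in play, "id" is an ID by convention only. The parser accepts the document even though an ID value repeats, and the duplicates have to be found by a separate check.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(xml_text, id_attr="id"):
    """Return the sorted list of id_attr values that occur more than once."""
    # ET.fromstring checks well-formedness only; it does not validate.
    root = ET.fromstring(xml_text)
    counts = Counter(
        elem.get(id_attr) for elem in root.iter() if elem.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

# Well-formed, but "x" is used twice (hypothetical sample data).
SAMPLE = '<doc><p id="x"/><p id="x"/><p id="y"/></doc>'
dupes = duplicate_ids(SAMPLE)
```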


With XSD schema you can, of course, define key attributes that have different scopes of uniqueness, although, unless I've missed something in the XSD spec, the largest possible scope for key uniqueness is still the physical XML document.

That is, my main point, and the crux of the issue as far as element addressing goes, is that in XML, regardless of which constraint-specification standard you use, each XML document establishes one or more identifier name spaces. Addressing elements through any standard-defined (or standard-governed) mechanism therefore always involves first addressing the XML document and then the things in it.

[Note that using indirect addresses, such as those defined in the W3C XIndirect note submitted by Innodata Isogen (http://www.w3.org/TR/2003/NOTE-XIndirect-20030612/), you can impose a global namespace over a group of documents at will. One distinguishing feature of this approach is that the scope of the imposed namespace is flexible--it's not an all-or-nothing approach. And by having multiple stages of indirection you can, of course, combine distinct namespaces into new, larger namespaces, if needed.]

This fact is reflected in the XInclude href/xpointer attribute pair, which I think is the best design choice given the overall constraints imposed by XML syntax and practice. In particular, it clearly distinguishes the storage object part of the address (a URI with no fragment identifier) from the semantic object part of the address (e.g., an XPointer that addresses an element, attribute, or sequence of data characters) and avoids tricky syntactic interference between URI syntax and the syntaxes of semantic addresses.
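
The two-part address can be read off each xi:include element directly. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree; the file names, the shorthand pointer "intro", and the helper include_addresses are hypothetical and serve only to show that the storage object (href) and the semantic object (xpointer) arrive as separate values.

```python
import xml.etree.ElementTree as ET

# Namespace of the XInclude Recommendation.
XI_NS = "http://www.w3.org/2001/XInclude"

def include_addresses(xml_text):
    """Return (href, xpointer) pairs for each xi:include, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for inc in root.iter("{%s}include" % XI_NS):
        # href names the storage object; xpointer (optional) names
        # the thing inside it. A missing attribute comes back as None.
        pairs.append((inc.get("href"), inc.get("xpointer")))
    return pairs

# Hypothetical sample document.
SAMPLE = """<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter1.xml" xpointer="intro"/>
  <xi:include href="chapter2.xml"/>
</doc>"""

addresses = include_addresses(SAMPLE)
```

Because the two parts never share one string, no parsing of fragment-identifier syntax is needed to tell them apart.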

It's not clear, at least to my reading, whether or not the XInclude spec allows or requires ID values to be rewritten such that all IDs in the result are unique even if two input elements (from two different source documents) have the same ID value.


The XInclude spec is clear. This is *not* allowed. ID values may not be rewritten by a conforming XInclude processor, even if they conflict.

I think I was insufficiently precise in making my statement. The challenge comes from the disconnect between how the XInclude specification is defined and what one has to do in practice in an XSLT (or DOM or SAX) processing environment.


That is, the XInclude specification is defined only in terms of infoset modification and at that level you are correct--IDs are not rewritten in the sense that the infoset information reflecting the original ID values is not modified in the transcluded result. What *can be* modified are the final reference pointer properties, which must reflect the original reference target in its final, transcluded location.

Thus, while the syntactic IDs themselves are not changed, the final effect of the references may be.

When discussing an XSLT implementation of XInclude processing the problem is that XSLT is not operating on the infoset but on XSLT-specific node trees in memory. In this tree there is no abstract pointer property, only the original syntactic values. Therefore there is no way to directly implement the XInclude reference fixup process except by modifying the data values that are then interpreted as references.

Thus, in the process of performing transclusion one has no choice but to change the ID and reference values in the transcluded result (which is a new document tree) such that the reference correctness constraint is preserved. This necessarily means that you either rewrite the values of the original ID and reference attributes as they are copied from the source to the transcluded result or you create new attributes that hold the pointers and IDs as they need to be constructed in the transcluded result.

The second approach would be closer in spirit to the XInclude spec (because the original ID and reference values would be unchanged) but would make it impossible to then process the transcluded result using generic templates that were written against the original attribute names. That is, if I have a pre-XInclude template that expects XRef elements to use "refid" to point to "id" attributes, that template will not work post-XInclude if I have used different attribute names for the transcluded result references. This means that XInclude processing cannot be an essentially transparent process for the core business logic in this case.

What I do (or have done to date) is to rewrite the ID and reference attributes but copy the original ID and reference values to new, transclusion-specific attributes, so that the original values remain accessible to subsequent XSLT processing. This allows me to issue messages that reflect the original storage locations of references and targets while leaving the generic transformation business logic unchanged. It also lets you retrofit XInclude processing into any existing XSLT process with a minimal amount of effort (all you have to do is modify the root template and implement any required ID and reference rewriting needed for the document types involved).
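
The general shape of this rewriting can be sketched outside XSLT. The following is a minimal illustration in Python, not the stylesheet described above: the attribute names "id" and "refid" follow the XRef example earlier in this message, while the "orig-" attribute names, the per-document prefixes, and the helper rewrite_ids are hypothetical. It also assumes every reference points to a target within the same included document.

```python
import copy
import xml.etree.ElementTree as ET

def rewrite_ids(included_root, prefix, id_attr="id", ref_attr="refid"):
    """Return a copy of included_root with prefixed IDs and references.

    The original values are preserved in hypothetical 'orig-id' and
    'orig-refid' attributes so later processing can still report them.
    """
    result = copy.deepcopy(included_root)  # leave the source tree untouched
    for elem in result.iter():
        for attr in (id_attr, ref_attr):
            value = elem.get(attr)
            if value is not None:
                elem.set("orig-" + attr, value)   # keep the original value
                elem.set(attr, prefix + value)    # make it unique in the result
    return result

# Two source documents that happen to use the same ID value.
doc_a = ET.fromstring('<sec id="s1"><xref refid="s1"/></sec>')
doc_b = ET.fromstring('<sec id="s1"><xref refid="s1"/></sec>')

# The transcluded result is a new in-memory tree.
merged = ET.Element("book")
merged.append(rewrite_ids(doc_a, "a."))
merged.append(rewrite_ids(doc_b, "b."))

ids = [e.get("id") for e in merged.iter() if e.get("id") is not None]
```

Code written against "id" and "refid" keeps working on the merged tree, which is the point of rewriting in place rather than introducing new reference attributes.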

Note too that in my processing the transcluded result is purely an in-memory construct.

So I contend that at best my XSLT process is conforming, to the degree that the XInclude spec can have an opinion about the conformance of processing that does not involve direct (or effective) infoset processing; or at worst a necessary concession to practicality in order to have a working system that does not require globally-unique element IDs but that is still consistent with the spirit and intent of the XInclude recommendation.

Note that as a potential non-conformance there is very little risk because the processing effect is, as far as I can tell, consistent with the intent of the XInclude specification, which is what really counts with a processing standard like XInclude.

With data standards, like XML, correctness is a binary condition. With processing standards, correctness is necessarily more fuzzy.

Note that my personal intent is to conform as closely as I can to the XInclude specification. I think it's a very good specification and very badly needed. But it is, by itself, insufficient to meet the requirements of the types of business processes I support, so I have no choice but to either diverge from it in some places or extend it unilaterally. But I try to do so in the most controlled and principled way that I can.

If time and energy allow, it would be nice to extend the XInclude spec to include support for my requirements (for example, to formalize the ability to specialize from xi:include). But there are lots of things that need doing and only so many people to do them, and it's much easier to just do what needs to be done for now and let standardization come when it's really needed.

Cheers,

Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

ekimber@xxxxxxxxxxxxxxxxxxx
www.innodata-isogen.com
