Elliotte Harold wrote:
> W. Eliot Kimber wrote:
>> The issue is that in the transcluded result the IDs must be unique
>> (this is a basic requirement of XML).
> This is not a basic requirement of XML. IDs in XML documents may in fact
> be non-unique, and even non-name tokens (as recently came up in a
> different context). A document containing such non-unique IDs would be
> invalid but well-formed, and might still be usefully processed.
I did misspeak: if an attribute has a declared type of "ID" *and* the
document is intended to be DTD valid then the IDs must be unique. This
validation requirement is a non-optional feature of XML (in that if you
want DTD validity then ID uniqueness is not optional).
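To make the well-formed versus valid distinction concrete, here is a minimal Python sketch. It assumes a non-validating parser (the standard library's ElementTree) and, because that parser does not expose declared attribute types, it assumes the ID attribute is simply the one named "id". The sample document and the duplicate_ids helper are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two elements share the ID value "a1". The DTD declares "id" as type ID,
# so the document is not DTD valid, but it is still well-formed.
DOC = """<!DOCTYPE doc [
  <!ELEMENT doc (p+)>
  <!ELEMENT p (#PCDATA)>
  <!ATTLIST p id ID #REQUIRED>
]>
<doc><p id="a1">first</p><p id="a1">second</p></doc>"""


def duplicate_ids(xml_text, id_attr="id"):
    """Return the values of id_attr that occur more than once, sorted.

    Assumption: ID-ness is decided by attribute name, not by the DTD,
    because ElementTree does not report declared attribute types.
    """
    root = ET.fromstring(xml_text)  # non-validating parse: duplicates are accepted
    counts = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicate_ids(DOC))
```

The parse succeeds, which is the well-formedness half of the point; a validating parser would be needed to get the uniqueness check enforced for you.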
With XSD schema you can, of course, define key attributes that have
different scopes of uniqueness, although, unless I've missed something
in the XSD spec, the largest possible scope for key uniqueness is still
the physical XML document.
That is, my main point, and the crux of the issue as far as element
addressing goes, is that in XML, regardless of which
constraint-specification standard you use, each XML document establishes
one or more identifier name spaces, which means that addressing elements
using some form of standard-defined (or standard-governed) mechanism
always involves first addressing the XML document and then the things in it.
[Note that using indirect addresses, such as those defined in the W3C
XIndirect note submitted by Innodata Isogen
(http://www.w3.org/TR/2003/NOTE-XIndirect-20030612/) you can impose a
global namespace over a group of documents at will. One distinguishing
feature of this approach is that the scope of the imposed namespace is
flexible--it's not an all-or-nothing approach. And by having multiple
stages of indirection you can, of course, combine distinct namespaces
into new, larger namespaces, if needed.]
This fact is reflected in the XInclude href/xpointer attribute pair,
which I think is the best design choice given the overall constraints
imposed by XML syntax and practice. In particular, it clearly
distinguishes the storage object part of the address (a URI with no
fragment identifier) from the semantic object part of the address (e.g.,
an XPointer that addresses an element, attribute, or sequence of data
characters) and avoids tricky syntactic interference between URI syntax
and the syntaxes of semantic addresses.
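The two-part address can be sketched in Python as follows. This is only an illustration under stated assumptions: the STORAGE dictionary stands in for real URI retrieval, the file names and the make_include and resolve helpers are hypothetical, only shorthand (bare-name) pointers are handled, and the ID attribute is assumed to be the one named "id".

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical in-memory storage objects, keyed by URI. Both documents use
# the same ID value, which is harmless because each document is its own
# identifier name space.
STORAGE = {
    "chapter1.xml": '<chapter><p id="intro">Chapter one</p></chapter>',
    "chapter2.xml": '<chapter><p id="intro">Chapter two</p></chapter>',
}


def make_include(href, xpointer):
    """Build an xi:include element that keeps the two address parts apart."""
    el = ET.Element(f"{{{XI_NS}}}include")
    el.set("href", href)          # storage object: a URI with no fragment
    el.set("xpointer", xpointer)  # semantic object: here, a bare ID
    return el


def resolve(include, storage=STORAGE, id_attr="id"):
    """Address the document first, then the element inside it."""
    root = ET.fromstring(storage[include.get("href")])  # step 1: document
    target = include.get("xpointer")
    for el in root.iter():                              # step 2: element
        if el.get(id_attr) == target:
            return el
    return None


print(resolve(make_include("chapter2.xml", "intro")).text)
```

Because the pointer never travels inside the URI, no escaping or fragment-identifier parsing is needed to separate the two parts.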
>> It's not clear, at least to my reading, whether or not the XInclude
>> allows or requires ID values to be rewritten such that all IDs in the
>> result are unique even if two input elements (from two different
>> source documents) have the same ID value.
> The XInclude spec is clear. This is *not* allowed. ID values may not be
> rewritten by a conforming XInclude processor, even if they conflict.
I think I was insufficiently precise in making my statement. The
challenge comes from the disconnect between how the XInclude specification
is defined and what one has to do in practice in an XSLT (or DOM or SAX)
processing environment.
That is, the XInclude specification is defined only in terms of infoset
modification and at that level you are correct--IDs are not rewritten in
the sense that the infoset information reflecting the original ID values
is not modified in the transcluded result. What *can be* modified are
the final reference pointer properties, which must reflect the original
reference target in its final, transcluded location.
Thus, while the syntactic IDs themselves are not changed, the final
effect of the references may be.
When discussing an XSLT implementation of XInclude processing the
problem is that XSLT is not operating on the infoset but on
XSLT-specific node trees in memory. In this tree there is no abstract
pointer property, only the original syntactic values. Therefore there is
no way to directly implement the XInclude reference fixup process except
by modifying the data values that are then interpreted as references.
Thus, in the process of performing transclusion one has no choice but to
change the ID and reference values in the transcluded result (which is a
new document tree) such that the reference correctness constraint is
preserved. This necessarily means that you either rewrite the values of
the original ID and reference attributes as they are copied from the
source to the transcluded result or you create new attributes that hold
the pointers and IDs as they need to be constructed in the transcluded
result.
The second approach would be closer in spirit to the XInclude spec
(because the original ID and reference values would be unchanged) but
would make it impossible to then process the transcluded result using
generic templates that were written against the original attribute
names. That is, if I have a pre-XInclude template that expects XRef
elements to use "refid" to point to "id" attributes, that template will
not work post-XInclude if I have used different attribute names for the
transcluded result references. This means that XInclude processing
cannot be an essentially transparent process for the core business logic
in this case.
What I do (or have done to date) is to rewrite the ID and reference
attributes and also copy the original ID and reference values to new,
transclusion-specific attributes, so that the original values remain
accessible to subsequent XSLT processing. This lets me issue messages
that reflect the original storage locations of references and targets
while leaving the generic transformation business logic unchanged. It
also means you can retrofit XInclude processing into any existing XSLT
process with a minimal amount of effort (all you have to do is modify
the root template and implement any required ID and reference rewriting
for the document types involved).
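The rewriting step can be sketched outside XSLT as well. The Python below is a rough illustration, not the actual stylesheet code: the transclude_copy function, the per-document prefixes, and the "orig-id" and "orig-refid" attribute names are all hypothetical, and it assumes every reference in the copied subtree points at a target within that same subtree.

```python
import copy
import xml.etree.ElementTree as ET


def transclude_copy(subtree, prefix, id_attr="id", ref_attr="refid"):
    """Return a deep copy of subtree with IDs and references prefixed.

    The original values are kept in hypothetical "orig-id" and
    "orig-refid" attributes so later stages can report source locations.
    Assumption: references resolve inside the copied subtree, so the
    same prefix keeps each reference pointing at its original target.
    """
    result = copy.deepcopy(subtree)  # the source tree is left untouched
    for el in result.iter():
        for attr in (id_attr, ref_attr):
            value = el.get(attr)
            if value is not None:
                el.set("orig-" + attr, value)  # preserve the original value
                el.set(attr, prefix + value)   # make it unique in the result
    return result


# Two source documents that use the same ID value.
doc_a = ET.fromstring('<sec><p id="x"/><xref refid="x"/></sec>')
doc_b = ET.fromstring('<sec><p id="x"/><xref refid="x"/></sec>')

merged = ET.Element("doc")
merged.append(transclude_copy(doc_a, "d1-"))
merged.append(transclude_copy(doc_b, "d2-"))

print(ET.tostring(merged, encoding="unicode"))
```

Since the "id" and "refid" attribute names are unchanged in the result, a template written against those names before transclusion keeps working afterwards.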
Note too that in my processing the transcluded result is purely an
in-memory construct.
So I contend that at best my XSLT process is conforming, to the degree
that the XInclude spec can have an opinion about the conformance of
processing that does not involve direct (or effective) infoset
processing; or at worst a necessary concession to practicality in order
to have a working system that does not require globally-unique element
IDs but that is still consistent with the spirit and intent of the
XInclude recommendation.
Note that even if this is a non-conformance, it carries very little risk,
because the processing effect is, as far as I can tell, consistent with
the intent of the XInclude specification, which is what really counts
with a processing standard like XInclude.
With data standards, like XML, correctness is a binary condition. With
processing standards, correctness is necessarily more fuzzy.
Note that my personal intent is to conform as closely as I can to the
XInclude specification. I think it's a very good specification and very
badly needed. But it is, by itself, insufficient to meet the
requirements of the types of business processes I support, so I have no
choice but to either diverge from it in some places or extend it
unilaterally. But I try to do so in the most controlled and principled
way that I can.
If time and energy allow, it would be nice to extend the XInclude
spec to include support for my requirements (for example, to formalize
the ability to specialize from xi:include). But there are lots of things
that need doing and only so many people to do them, and it's much easier
to just do what needs to be done for now and let standardization come
when it's really needed.
Cheers,
Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122
ekimber@xxxxxxxxxxxxxxxxxxx
www.innodata-isogen.com