RE: [xsl] copy-of "canonicalization" behavior in Xalan (Java)

Subject: RE: [xsl] copy-of "canonicalization" behavior in Xalan (Java)
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Fri, 23 Jul 2004 08:45:06 +0100
> The copy-of element when processed by Xalan (Java) appears to 
> canonicalize the output, rather than output the source tree exactly.
> 
> For specific nodes in the source tree I would like to create 
> an identical copy in the result tree, including redundant namespace
> declarations.

A tree in the XPath data model does not contain namespace declarations; it
contains namespace nodes. When you parse XML, every element node in the
resulting tree will have one namespace node for each namespace that is in
scope. When a tree is serialized, namespace declarations are added to the
output where needed: serializers will generally avoid outputting redundant
declarations.
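
To illustrate the point, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree parser rather than Xalan, and ElementTree is not an implementation of the XPath data model, so treat it as an analogy only. It parses the poster's document twice, once with the redundant declarations and once without, and compares what the parser actually hands back. The names REDUNDANT, MINIMAL and parsed_view are made up for this example.

```python
import xml.etree.ElementTree as ET

# The poster's document, redundant namespace declarations included.
REDUNDANT = """\
<foo:root xmlns:foo="http://abc.org/foo#" xmlns:xyz="http://xyzinc.com/xyz#">
  <foo:parent xmlns:foo="http://abc.org/foo#">
    <foo:child xmlns:foo="http://abc.org/foo#">more text</foo:child>
    <xyz:child xmlns:xyz="http://xyzinc.com/xyz#">yet more text</xyz:child>
  </foo:parent>
</foo:root>"""

# The same document with each prefix declared only once, on the root.
MINIMAL = """\
<foo:root xmlns:foo="http://abc.org/foo#" xmlns:xyz="http://xyzinc.com/xyz#">
  <foo:parent>
    <foo:child>more text</foo:child>
    <xyz:child>yet more text</xyz:child>
  </foo:parent>
</foo:root>"""


def parsed_view(xml_text):
    """Return (expanded name, attributes, text) for each element in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib), (el.text or "").strip()) for el in root.iter()]


# Element names come back as {namespace-uri}local-name; the declarations
# themselves are consumed by the parser and are not part of the tree.
same_tree = parsed_view(REDUNDANT) == parsed_view(MINIMAL)
print(same_tree)
```

Once both documents are parsed, nothing in the trees records where a declaration was written, which is why a faithful copy of the tree cannot bring the redundant declarations back.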

The system is indeed creating an identical copy of the tree. The information
that's being lost is being lost at the time the original tree is
constructed.

The XML Spy behavior is conformant too, because there's nothing in the spec
that prevents a processor retaining this extra information (it could also
remember whether the namespace URI was written in single or double quotes if
it chose to).

If you want to extract fragments of the result tree for subsequent
processing, this will work fine if you do the extraction using tools that
respect the XPath data model. If you try to do it using textual
cut-and-paste, it will fail. One of the drawbacks of XML Namespaces has
always been that textual cut-and-paste is no longer a viable approach.
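
The difference between the two approaches can be sketched as follows, again using Python's xml.etree.ElementTree as a stand-in for whatever namespace-aware tool is used downstream (an assumption for illustration, not something specific to Xalan). The variable names are hypothetical. The first half pastes the text of one element out of the serialized result; the second half extracts the same element through the parser and re-serializes it.

```python
import xml.etree.ElementTree as ET

# Ask the serializer to reuse the original prefixes instead of ns0, ns1, ...
ET.register_namespace("foo", "http://abc.org/foo#")
ET.register_namespace("xyz", "http://xyzinc.com/xyz#")

# Serialized result with no redundant declarations, as in the poster's Xalan output.
SERIALIZED = """\
<foo:root xmlns:foo="http://abc.org/foo#" xmlns:xyz="http://xyzinc.com/xyz#">
  <foo:parent>
    <foo:child>more text</foo:child>
    <xyz:child>yet more text</xyz:child>
  </foo:parent>
</foo:root>"""

# 1. Textual cut-and-paste: the xyz prefix is no longer bound to anything.
pasted = "<xyz:child>yet more text</xyz:child>"
try:
    ET.fromstring(pasted)
    paste_parses = True
except ET.ParseError:
    paste_parses = False

# 2. Extraction through the parsed tree: the element carries its namespace
#    with it, so the serializer emits the declaration the fragment needs.
root = ET.fromstring(SERIALIZED)
child = root.find(".//{http://xyzinc.com/xyz#}child")
child.tail = None  # drop the whitespace that followed the element in the source
fragment = ET.tostring(child, encoding="unicode")
reparsed = ET.fromstring(fragment)

print(paste_parses)
print(fragment)
```

The pasted text is rejected as not namespace-well-formed, while the extracted fragment is a standalone document that parses back to the same element name.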

Michael Kay


> 
> Assume a source document like:
> 
> <foo:root xmlns:foo="http://abc.org/foo#"
>           xmlns:xyz="http://xyzinc.com/xyz#">
> 	<foo:parent xmlns:foo="http://abc.org/foo#">
> 		<foo:child xmlns:foo="http://abc.org/foo#">more text</foo:child>
> 		<xyz:child xmlns:xyz="http://xyzinc.com/xyz#">yet more text</xyz:child>
> 	</foo:parent>
> </foo:root>
> 
> The namespace declarations on the parent and child nodes are 
> redundant (their namespace prefixes have been bound to a namespace on
> the root node).
> 
> When I use copy-of, such as in the simple template below, in 
> XML Spy using its built in XSLT processor the result tree is an exact
> and complete copy of the source tree, redundant namespace 
> declarations and all, as I would expect.
> 
> <xsl:template match="/">
>   <xsl:copy-of select="(.)"/>
> </xsl:template>
> 
> (I have simplified the template in the extreme to make it clear.)
> 
> When I run the same template with the Xalan (Java) XSLT 
> processor, which uses a SAX parser, I get a "cleaned", 
> canonical form of the
> source tree as my result, with all redundant namespace 
> declarations removed.
> 
> This may appear to be a benefit, but I later manipulate parts 
> of the result tree (which is much more complex than the simple
> example) as separate XML fragments and at that point the 
> namespace declarations are in fact no longer redundant but critical.
> 
> I have not been able to find anything in the Xalan 
> documentation which suggests a way to avoid this 
> canonicalization - perhaps it's a
> SAX issue? Is there a way to force Xalan to make an exact 
> copy of the source tree, warts and all?
> 
> 
> Thanks
