Wasiq Shaikh wrote:
> I'm using XPath 1.0 / Xalan-J 2.7, and yes, I can use the node-set function.
I just realized you don't need the node-set function at all.
> The ordering of elements does not matter but the concept of the merge
> should be consistent, meaning all "like" elements are joined.
> Yes, nodes must be on the same level to be comparable (mergeable?).
> Nodes on any other level with the same likeness should not be merged
> since they are part of a different node generation. For example:
>   <X>
>     <Z>
>       <Z>...
> The parent and child <Z> elements should not be merged.
I'm afraid I still don't really understand what you are after. But from
your original input/output example, it seems that you want any node whose
ancestor-or-self chain has the same element names as that of another node
to be merged with that node, i.e.:
<merge-a>
    <no-merge-z />
    <merge-b />
    <no-merge-y />
</merge-a>
<merge-a>
    <merge-b />
    <merge-c>
        <no-merge-x />
    </merge-c>
</merge-a>
<merge-a>
    <merge-c>
        <no-merge-w />
    </merge-c>
    <merge-b />
    <merge-c />
</merge-a>
In the above, the nodes a, b and c will be merged, because their paths
are the same. The nodes w, x, y and z are not merged (i.e., they stay as
they are) because they have distinct paths (one could argue that these
are also merged, from a set of one node to a set of one node).
So, in other words, all nodes that share the same path X/Y/Z should end
up as a single node in the result.
I leave the details to you. I assume you want to apply some more rules
to any given node, like content or other properties, to decide whether a
node is distinct or not (see below for what I understood to be the
correctly merged result for the above input).
> The algorithm you mention is what I was thinking about doing. I know
> it's quite simple, however, the amount of work the processor needs to
> do in comparing each and every similar node is expensive. Joining two
> nodes is fine, but what happens if you have tens, hundreds, or
> thousands of similar nodes to merge? Then each child of those many
> nodes needs to be compared and merged as well, and so on and so forth...
Forget about my algorithm, it was based on a not-so-good understanding
of your specifications.
> I know it can be done in XSL, but can it handle such a process? Or is
> this a job for procedural programming in a language like Java?
Quite easy in XSLT 1.0, very easy in XSLT 2.0. Of course, you can always
attempt such a task in another language, but be aware that you will
probably have to walk the whole tree yourself then.
Ok, here's the trick. You may have heard or read about dedupping
(there's been a nice discussion last year on what term should be used)
and I think that your problem is essentially nothing more than dedupping
based on a certain set of rules that define the uniqueness of a node.
The tricky bit is that "duplicate nodes" are not always on the same
level, which makes the process slightly harder.
I usually don't attempt XSLT 1.0 anymore when it comes to keys and the
like; in XSLT 2.0 the solution would be sooooo much easier to implement
and understand (if you can persuade your team to upgrade, it will save
you some headaches in the future). Anyway, here it is. I call it
"Dedupping based on the node's XPath":
<!-- Index every element by the names of itself and up to four ancestors. -->
<xsl:key match="*" name="ancestors"
    use="concat(
        name(),
        name(ancestor-or-self::*[2]),
        name(ancestor-or-self::*[3]),
        name(ancestor-or-self::*[4]),
        name(ancestor-or-self::*[5]))" />

<xsl:template match="*">
    <!-- Same expression as in the key; XSLT 1.0 cannot share it. -->
    <xsl:variable name="ancestors"
        select="concat(
            name(),
            name(ancestor-or-self::*[2]),
            name(ancestor-or-self::*[3]),
            name(ancestor-or-self::*[4]),
            name(ancestor-or-self::*[5]))" />
    <!-- Only the first element with this path produces output. -->
    <xsl:if test="generate-id(key('ancestors', $ancestors)[1]) =
            generate-id(current())">
        <xsl:copy>
            <!-- Merge: process the children of all elements with this path. -->
            <xsl:apply-templates select="key('ancestors',
                    $ancestors)/*" />
        </xsl:copy>
    </xsl:if>
</xsl:template>
As you can see, it really isn't that hard (only a bit annoying,
especially because of the duplicated logic). If you know something about
node identity and how to test whether two nodes are the same node in
XSLT 1.0, the above should read quite easily. Apart from the "normal"
dedupping code (inside the xsl:if), the core of the piece is of course
the key and the $ancestors variable, which are used to find all nodes
that have the same XPath. Note that, as written, the path only covers
five levels (the node itself and four ancestors).
The result of applying the above code to the above input document is as
follows (note that the order of input is preserved automatically):
<merge-a>
    <no-merge-z/>
    <merge-b/>
    <no-merge-y/>
    <merge-c>
        <no-merge-x/>
        <no-merge-w/>
    </merge-c>
</merge-a>
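For anyone who wants to try this, here is a minimal sketch of a complete XSLT 1.0 stylesheet built around the same idea. It is a variant, not the exact code above: the key name `by-path` and the variable name `path` are made up for this example, the names are concatenated outermost first, and a '/' separator is added between them. Without a separator, an element `ab` inside `c` and an element `a` inside `bc` would both produce the key value "abc". The sketch assumes elements are nested at most five levels deep and, like the snippet above, it drops attributes and text nodes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every element by its path: outermost name first, '/' between
       names. Assumption: elements are nested at most five levels deep. -->
  <xsl:key name="by-path" match="*"
      use="concat(
        name(ancestor-or-self::*[5]), '/',
        name(ancestor-or-self::*[4]), '/',
        name(ancestor-or-self::*[3]), '/',
        name(ancestor-or-self::*[2]), '/',
        name())"/>

  <xsl:template match="*">
    <!-- XSLT 1.0 cannot share this expression with the key, so it is repeated. -->
    <xsl:variable name="path"
        select="concat(
          name(ancestor-or-self::*[5]), '/',
          name(ancestor-or-self::*[4]), '/',
          name(ancestor-or-self::*[3]), '/',
          name(ancestor-or-self::*[2]), '/',
          name())"/>
    <!-- Only the first element in document order with this path is copied. -->
    <xsl:if test="generate-id(key('by-path', $path)[1]) = generate-id(.)">
      <xsl:copy>
        <!-- Its content comes from the children of every element sharing the path. -->
        <xsl:apply-templates select="key('by-path', $path)/*"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

If attributes need to survive the merge, an `xsl:copy-of select="@*"` inside the `xsl:copy` would keep those of the first element with each path; whether that is the right rule depends on your requirements.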
And now, to impress your bosses and ask them for an upgrade, here's the
same code in XSLT 2.0 (and done in a tenth of the time). Note the
function that makes it completely generic instead of static (XSLT 1.0
can be made generic too, but that requires a lot more effort):
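<xsl:key match="*" name="ancestors" use="s:ancestor(.)" />

<!-- Matches only the first element (in document order) for each path.
     Note: s:* matches elements in the namespace bound to the s prefix;
     use * instead if your input elements are in no namespace. -->
<xsl:template match="s:*[key('ancestors', s:ancestor(.))[1] is
        current()]">
    <xsl:copy>
        <xsl:apply-templates select="key('ancestors',
                s:ancestor(.))/*" />
    </xsl:copy>
</xsl:template>

<!-- Returns the names of the node and all its ancestors, outermost first;
     in XSLT 2.0, xsl:value-of joins them with single spaces. -->
<xsl:function name="s:ancestor">
    <xsl:param name="node" />
    <xsl:value-of select="for $i in $node/ancestor-or-self::* return
            name($i)" />
</xsl:function>

<!-- Every later element with an already-seen path is dropped. -->
<xsl:template match="s:*" />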
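To run the XSLT 2.0 version as a standalone stylesheet, something like the following sketch should do. The function name `s:path`, the key name `by-path` and the namespace URI `urn:example:dedup` are placeholders invented for this example. The sketch assumes the input elements are in no namespace, so the patterns use `*` rather than `s:*`, and it joins the names with '/' via `string-join`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:s="urn:example:dedup"
    exclude-result-prefixes="xs s">

  <xsl:output method="xml" indent="yes"/>

  <!-- Path of an element as a string, outermost name first, e.g. "root/a/b".
       The depth is not limited, unlike the XSLT 1.0 version. -->
  <xsl:function name="s:path" as="xs:string">
    <xsl:param name="node" as="element()"/>
    <xsl:sequence
        select="string-join($node/ancestor-or-self::*/name(), '/')"/>
  </xsl:function>

  <!-- Index every element by its path. -->
  <xsl:key name="by-path" match="*" use="s:path(.)"/>

  <!-- First element in document order with a given path: copy it and
       process the children of all elements that share the path. -->
  <xsl:template match="*[key('by-path', s:path(.))[1] is .]">
    <xsl:copy>
      <xsl:apply-templates select="key('by-path', s:path(.))/*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any later element with the same path has already been merged. -->
  <xsl:template match="*"/>

</xsl:stylesheet>
```

The predicate gives the first template a higher default priority than the bare `*` template, so no explicit `priority` attribute is needed.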
And yes, it can be done without a key (both in XSLT 1.0 and 2.0), but
with large documents that requires a lot of reverse lookups, which will
cost many processor cycles. Anyway, I hope you enjoy the solution (and
even more so if it is of some use to you).
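For comparison, a keyless variant could look roughly like this in XSLT 2.0. Again, `s:path` and the namespace URI are placeholders for this example, and the input is assumed to be in no namespace. Every element scans its `preceding::` axis to find out whether it is the first with its path, and then scans the whole document to collect the elements to merge, which is where the extra cost on large documents comes from.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:s="urn:example:dedup"
    exclude-result-prefixes="xs s">

  <xsl:output method="xml" indent="yes"/>

  <!-- Path of an element as a string, outermost name first. -->
  <xsl:function name="s:path" as="xs:string">
    <xsl:param name="node" as="element()"/>
    <xsl:sequence
        select="string-join($node/ancestor-or-self::*/name(), '/')"/>
  </xsl:function>

  <xsl:template match="*">
    <xsl:variable name="path" select="s:path(.)"/>
    <!-- Elements with the same path sit at the same depth, so an earlier
         one is always found on the preceding axis. -->
    <xsl:if test="not(preceding::*[s:path(.) = $path])">
      <xsl:copy>
        <!-- Full document scan instead of a key lookup. -->
        <xsl:apply-templates select="//*[s:path(.) = $path]/*"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```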
Cheers,
-- Abel Braaksma