Re: [xsl] Tree Comparing Algorithm

Subject: Re: [xsl] Tree Comparing Algorithm
From: "Vasu Chakkera vasucv@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 3 Feb 2020 20:10:04 -0000
Thanks, both. Martin's solution sort of worked, but it gave me only 21
children, whereas I had around 21000 nodes in the XML. I am not sure to what
depth the comparison is happening.

On Mon, 3 Feb 2020 at 12:16, Michael Kay mike@xxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> The only facility in XSLT 3.0 that allows streaming of two input files
> "in parallel" is xsl:merge, and as Martin points out, that's rather
> specialised and not really suited to your requirements.
> In Saxon, streaming is in most cases done in push mode (where the parser
> owns the control loop, and sends events to the XSLT processor). You can't
> have two parallel control loops except with multi-threading, so the
> opportunities for streaming multiple files are limited (with xsl:merge,
> Saxon indeed uses multi-threading).
> At first sight, I don't see an XSLT-based answer to this one.
> Except, perhaps: you could do a streamed transformation of each input
> document into an XML representation of an event stream, like
> <startElement name="folder" path="" hash=""/>
> <startElement name="folder" path="" hash=""/>
> <endElement name="folder"/>
> etc
> and then attempt to do an xsl:merge of the two event streams.
> Michael Kay
> Saxonica
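
To make the event-stream idea above concrete, here is a minimal sketch in
Python rather than XSLT. It is not xsl:merge and not Saxon's implementation,
only a pull-mode analogue: each document is flattened lazily into
startElement/endElement events, and the two event streams are walked in
lockstep. The function names (events, mismatches), the sample data, and the
use of the mhash attribute from the sample tree are assumptions made for
illustration. It also assumes both documents have exactly the same shape.

```python
import io
import xml.etree.ElementTree as ET
from itertools import zip_longest


def events(source):
    """Lazily yield (kind, name, path, mhash) tuples for one document."""
    for kind, elem in ET.iterparse(source, events=("start", "end")):
        if kind == "start":
            # Attributes are already available when the start event fires.
            yield ("startElement", elem.tag, elem.get("path"), elem.get("mhash"))
        else:
            yield ("endElement", elem.tag, None, None)
            elem.clear()  # discard the element's content once it is closed


def mismatches(left_source, right_source):
    """Yield (left_path, right_path) for each element whose hashes differ.

    Assumption: both documents have the same structure, so the two event
    streams can be paired one-to-one.
    """
    for left, right in zip_longest(events(left_source), events(right_source)):
        if left is None or right is None or left[0] != right[0]:
            raise ValueError("documents differ in structure")
        if left[0] == "startElement" and left[3] != right[3]:
            yield (left[2], right[2])


# Hypothetical sample data: one leaf changed under folder "a".
LEFT = (b'<root path="/data" mhash="r1">'
        b'<folder path="/data/a" mhash="a1"><leaf path="/data/a/x" mhash="x1"/></folder>'
        b'<folder path="/data/b" mhash="b1"/></root>')
RIGHT = (b'<root path="D:/data" mhash="r2">'
         b'<folder path="D:/data/a" mhash="a2"><leaf path="D:/data/a/x" mhash="x2"/></folder>'
         b'<folder path="D:/data/b" mhash="b1"/></root>')

for pair in mismatches(io.BytesIO(LEFT), io.BytesIO(RIGHT)):
    print(pair)
```

This sketch does not skip matching subtrees explicitly. It relies on the
Merkle property that everything below a matching hash also matches, so
nothing is reported there.
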
> On 3 Feb 2020, at 13:47, Vasu Chakkera vasucv@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi All,
> I am planning to write an XML tree-comparing XSLT using streaming.
> The XML trees look something like this:
> <root path="" mhash="">
>   <folder path="" mhash="">
>     <folder path="" mhash="">
>       <leaf path="" mhash="">
>       </leaf>
>     </folder>
>   </folder>
> </root>
> There will be two such XML files to compare. These two files are generated
> before and after moving a folder from source to destination. Source and
> destination could be on two different operating systems.
> This is essentially the serialized Merkle tree output of a folder
> structure. The idea is to run a Merkle tree comparator that picks out the
> nodes that do not match. The rules are as follows.
>    1. If the root node hashes in both trees match, then there is no
>    difference anywhere in the tree (because of how the Merkle tree is
>    generated).
>    2. If the root node hashes do not match, we go to the child container
>    and compare the hash of the child container in both XML files. (The
>    folder structure in the two XML files will be identical with respect to
>    the hash, but the folder path may differ because of the Linux and
>    Windows path conventions. Otherwise the folder structure is meant to be
>    the same.)
>    3. If the hashes of a folder in both trees are the same, the entire
>    tree under that folder is ignored.
>    4. If the hashes of a folder in both trees are not the same, then the
>    tree is traversed further and step 3 is repeated.
>    5. The XSLT keeps writing out the nodes whose hashes do not match
>    between the source and target XML files.
> So at the end of the processing, a comparator tree should be serialized
> that contains the nodes that have a non-matching leaf node.
> Looking at the serialized tree, we can determine which files got messed
> up during the transfer from source to target.
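
For illustration, the rules above can be sketched as a small recursive
comparator. This is a minimal, non-streaming sketch in Python, not the
stylesheet discussed in this thread. The function name compare_nodes and the
output attributes target-path, source-hash and target-hash are hypothetical.
Children are paired by position, which assumes the two trees have the same
shape and child order.

```python
import xml.etree.ElementTree as ET


def compare_nodes(src, dst):
    """Return a pruned copy of src holding only mismatching nodes, or None."""
    if src.get("mhash") == dst.get("mhash"):
        # Rules 1 and 3: equal hashes mean the whole subtree is identical.
        return None
    # Rule 5: write out the node whose hashes do not match.
    out = ET.Element(src.tag, {
        "path": src.get("path", ""),
        "target-path": dst.get("path", ""),
        "source-hash": src.get("mhash", ""),
        "target-hash": dst.get("mhash", ""),
    })
    # Rules 2 and 4: descend into the children and repeat the comparison.
    # Assumption: children correspond by position (zip stops at the shorter list).
    for src_child, dst_child in zip(src, dst):
        sub = compare_nodes(src_child, dst_child)
        if sub is not None:
            out.append(sub)
    return out


# Hypothetical sample data: one leaf changed under folder "a".
SRC = ET.fromstring(
    '<root path="/data" mhash="r1">'
    '<folder path="/data/a" mhash="a1"><leaf path="/data/a/x" mhash="x1"/></folder>'
    '<folder path="/data/b" mhash="b1"/></root>')
DST = ET.fromstring(
    '<root path="D:/data" mhash="r2">'
    '<folder path="D:/data/a" mhash="a2"><leaf path="D:/data/a/x" mhash="x2"/></folder>'
    '<folder path="D:/data/b" mhash="b1"/></root>')

tree = compare_nodes(SRC, DST)
if tree is not None:
    print(ET.tostring(tree, encoding="unicode"))
```

The serialized result keeps only the root, folder "a" and its leaf, which is
the chain leading down to the file whose hash changed. Folder "b" is dropped
because its hashes match.
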
> I am able to do this using non-streaming XSLT, but with streaming, since
> we need to stream two trees at a time and compare the nodes, I am
> not very sure how to proceed.
> I am able to do manipulations on one XML document with streaming. I tried
> a few tricks, but did not get anywhere. (I am not very comfortable copying
> my code scribblings here.)
> I need streaming because the XML files may be big.
> If someone has done something similar, or can point me to an intelligent
> way to do this, I will be thankful.
> Vasu

Vasu Chakkera
NodeLogic Limited
