Re: [xsl] Tree Comparing Algorithm

Subject: Re: [xsl] Tree Comparing Algorithm
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 4 Feb 2020 10:49:21 -0000
On 04.02.2020 at 01:49, Michael Kay mike@xxxxxxxxxxxx wrote:
> I haven't studied it in close detail, but I strongly suspect that the
> initial processing of the input files is streamed, but at some stage
> in the processing pipeline everything ends up in memory.
>
> Martin's solution uses arrays, and array processing in Saxon is
> generally not pipelined in the way that sequence processing (normally)
> is. For example, operations such as filtering and mapping on sequences
> are generally pipelined (whether or not the input is streamed), while
> the equivalent operations on arrays will materialise the array in memory.
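
To make the distinction concrete, here is a minimal sketch with
illustrative values (not taken from the stylesheet under discussion).
It assumes the prefix "array" is bound to the standard namespace
http://www.w3.org/2005/xpath-functions/array. The first expression
filters and maps a sequence; the second does the equivalent with the
XPath 3.1 array functions. Both yield the same numbers, once as a
sequence and once as an array; any difference lies only in how the
processor evaluates them internally.

  (: sequence: filter, then map :)
  (1 to 10)[. mod 2 = 0] ! (. * 2)

  (: array: the equivalent filter and map :)
  array:for-each(
    array:filter(array { 1 to 10 }, function($n) { $n mod 2 = 0 }),
    function($n) { $n * 2 })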


> There's no intrinsic reason for not pipelining operations on arrays,
> other than the lesson I learnt many years ago as an undergraduate
> computer science student: when you're doing optimisation, focus your
> efforts on the constructs that are encountered most frequently. Today
> everyone is using sequences, and not many people are using arrays.


I certainly prefer using sequences of nodes to arrays of nodes, but given the restrictions on streamable stylesheet functions and on the processing of streamed nodes, I do manage to pass in

  mf:compare((saxon:stream(doc('file1.xml')/*),
              saxon:stream(doc('file2.xml')/*)))


to a function taking a sequence of nodes (<xsl:param name="pair" as="element()*"/>), but even an attempt to output just the names of the two items in the sequence, using positional predicates as in

$pair[1]!node-name(), $pair[2]!node-name()

is rejected by the streamability analysis with

  XTSE3430: Function mf:compare is not streamable. There is more than one consuming operand:

So that's why I experimented with the array, and also with a map/tuple
of the two streamed nodes from different documents. With a sequence of
two nodes I don't manage to compare them without breaking the
streamability rule that allows no more than one consuming operand.
