Subject: Re: [xsl] Tree Comparing Algorithm
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 4 Feb 2020 10:49:21 -0000
I haven't studied it in close detail, but I strongly suspect that the initial processing of the input files is streamed, but at some stage in the processing pipeline everything ends up in memory.
Martin's solution uses arrays, and array processing in Saxon is generally not pipelined in the way that sequence processing (normally) is. For example, operations such as filtering and mapping on sequences are generally pipelined (whether or not the input is streamed), while the equivalent operations on arrays will materialise the array in memory.
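To make the comparison concrete, here is a minimal sketch of the same filter-and-map step written once over a sequence and once over an array. The variable names and the 1 to 10 input are made up for illustration, and the inline functions assume a processor with higher-order function support. Both forms compute the same five values; whether a given Saxon release pipelines either of them is an implementation detail, as described above, and not something the stylesheet itself can show.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:array="http://www.w3.org/2005/xpath-functions/array"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output method="text"/>

  <!-- Run with the initial template (no source document needed). -->
  <xsl:template name="xsl:initial-template">

    <!-- Sequence form: a filter predicate followed by the simple map operator. -->
    <xsl:variable name="from-sequence" as="xs:integer*"
        select="(1 to 10)[. mod 2 = 0] ! (. * 10)"/>

    <!-- Array form: the equivalent array:filter and array:for-each calls. -->
    <xsl:variable name="from-array" as="array(xs:integer)"
        select="array:for-each(
                  array:filter(array { 1 to 10 }, function($n) { $n mod 2 = 0 }),
                  function($n) { $n * 10 })"/>

    <!-- Compare the sequence with the members of the array. -->
    <xsl:value-of select="deep-equal($from-sequence, $from-array?*)"/>
  </xsl:template>

</xsl:stylesheet>
```

The sequence form can in principle hand each item to the next step as soon as it is produced, while the array form builds a complete array value at each step.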
There's no intrinsic reason for not pipelining operations on arrays, other than the lesson I learnt many years ago as an undergraduate computer science student: when you're doing optimisation, focus your efforts on the constructs that are encountered most frequently. Today everyone is using sequences, and not many people are using arrays.
I certainly prefer using sequences of nodes to arrays of nodes, but with the restrictions on streamable stylesheet functions and on the processing of streamed nodes I manage to pass in
to a function taking a sequence of nodes <xsl:param name="pair" as="element()*"/> but even an attempt to output just the two items in the sequence by name, using positional predicates with
XTSE3430: Function mf:compare is not streamable. There is more than one consuming operand:
So that's why I experimented with the array, and also with a map/tuple of the two streamed nodes from different documents: with the sequence of two nodes I don't manage to compare them without breaking the streamability rule that allows no more than one consuming operand.