Re: [xsl] XQuery/XPath 3.1: Node List to Node Set ("distinct nodes")

From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 29 Dec 2021 00:46:22 -0000

As for performance, I compared the execution times of the two solutions
(the "short" index-of one vs. the "long" fold-left / intersect / if-then-else one).

The XML document was: "<t><a/><b/><c/></t>".
The $nodes sequence contained 45 nodes:
$nodes := ($xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a,
$xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a, $xml/*/c, $xml/*/b,
$xml/*/a, $xml/*/b, $xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b,
$xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a, $xml/*/c,
$xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a,
$xml/*/b, $xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a,
$xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b )
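
The 45 nodes above repeat the five-item pattern a, c, b, a, b nine times, so
the same setup can be written more compactly. A minimal sketch, assuming $xml
is bound to the parsed document; the $distinct variable name is introduced
here only for illustration, and the filter is the index-of expression quoted
further down in this thread:

```xquery
(: Sketch of the test setup; $xml and $nodes follow the listing above. :)
let $xml := parse-xml("<t><a/><b/><c/></t>")
(: The pattern a, c, b, a, b repeated 9 times yields the same 45-node sequence. :)
let $nodes := (1 to 9) ! ($xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b)
(: "Short" solution: the numeric predicate keeps a node only at the
   position where its generated id first occurs in the sequence. :)
let $distinct := $nodes[index-of($nodes ! generate-id(.), generate-id(.))[1]]
return (count($nodes), $distinct ! name())
(: expected result: 45, "a", "c", "b" :)
```

The distinct nodes come back in order of first occurrence (a, c, b), not in
document order.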

Separately, I timed just the call to parse-xml() and the construction of the
node sequence. All measurements were made with BaseX.

Results:
Parsing the XML document and constructing the sequence:  0.10 ms
Evaluating the "short" expression:  0.41 ms
Evaluating the "long"  expression:  0.44 ms

"short" vs. "long" with the parsing and construction time subtracted:  0.31 ms vs. 0.34 ms

Thus both expressions have approximately the same efficiency, though in
this particular measurement the "short" one was about 10% faster than the
"long" one (I suspect this difference is not statistically significant).
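
The "long" expression is not quoted in this message. Purely as an assumption
based on its description (fold-left / intersect / if-then-else), it might have
roughly the shape below. The sketch only checks that such a variant selects
the same nodes, in the same order, as the index-of expression; it says nothing
about timing, and the names $short, $long, $seen and $n are hypothetical:

```xquery
let $xml := parse-xml("<t><a/><b/><c/></t>")
let $nodes := (1 to 9) ! ($xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b)
(: "Short" solution, as quoted in this thread. :)
let $short := $nodes[index-of($nodes ! generate-id(.), generate-id(.))[1]]
(: Assumed shape of the "long" solution: append a node to the
   accumulator only if it is not already among the nodes seen so far. :)
let $long := fold-left($nodes, (), function($seen, $n) {
  if ($seen intersect $n) then $seen else ($seen, $n)
})
return (
  count($short),
  count($long),
  (: node identity comparison, position by position :)
  every $i in 1 to count($short) satisfies $short[$i] is $long[$i]
)
(: expected result: 3, 3, true :)
```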

Cheers,
Dimitre


On Tue, Dec 28, 2021 at 4:23 PM Dimitre Novatchev dnovatchev@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>
>
> On Tue, Dec 28, 2021 at 4:10 PM Michael Kay mike@xxxxxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>
>>
>> > On 28 Dec 2021, at 23:54, Dimitre Novatchev dnovatchev@xxxxxxxxx <
>> > xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> >
>> >    $nodes[index-of($nodes ! generate-id(.), generate-id(.))[1]]
>> >
>> > This seems a candidate for "the shortest solution" and it shouldn't be
>> > inefficient, given a good optimizer:
>> >
>>
>> It probably also gets a prize for the first practical use case of a
>> filter expression where the predicate is numeric and has different values
>> for different nodes in the input sequence.
>>
>> It's going to be O(n*m) unless index-of() is optimized to use some kind
>> of index or hash lookup rather than a sequential search. That's assuming
>> that the expression $nodes ! generate-id(.) gets loop-lifted; if it isn't,
>> then it becomes O(n*n*m).
>>
>>
> It seems BaseX is good enough to do this. I tripled the number of nodes in
> $nodes and there was no increase in the evaluation time.
>
>
>
>> Aesthetically, I find generate-id() ugly and it would be nice to avoid it.
>>
>
> Its name is ugly, yes. A shorter and more meaningful name, like id() or
> key(), would be much better. Maybe we need a mechanism in XPath 4.0 to
> specify global aliases (like a using file... )
>
>
> Cheers,
> Dimitre
>
>
>>
>> Michael Kay
>> Saxonica
>>
>>
>>


-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they write
all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
