Re: [xsl] XQuery/XPath 3.1: Node List to Node Set ("distinct nodes")

Subject: Re: [xsl] XQuery/XPath 3.1: Node List to Node Set ("distinct nodes")
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 29 Dec 2021 18:27:11 -0000
On 29.12.2021 17:36, Dimitre Novatchev dnovatchev@xxxxxxxxx wrote:


On Wed, Dec 29, 2021 at 12:21 AM Martin Honnen martin.honnen@xxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:


On 29.12.2021 at 00:32, Dimitre Novatchev dnovatchev@xxxxxxxxx wrote:




Hit Send too early:

    Do notice: this seems the only solution of all presented so far,
    that preserves the original sequence order (not document order) of
    the nodes.

Why is the original sequence order preserved? https://www.w3.org/TR/xpath-functions/#func-distinct-values clearly says

    "The function returns the sequence that results from removing
    from $arg all but one of a set of values that are considered equal
    to one another. [...]

    The order in which the sequence of values is returned
    is ·implementation-dependent·
    <https://www.w3.org/TR/xpath-functions/#implementation-dependent>.

    Which value of a set of values that compare equal is returned
    is ·implementation-dependent·
    <https://www.w3.org/TR/xpath-functions/#implementation-dependent>."


So while


    $nodes ! generate-id(.)

    gives you the generated ids in the order of the nodes in $nodes,
    after the call to distinct-values there is no order defined; it is
    implementation-dependent.


@Martin Honnen Could you, please, give us an example of an existing XPath engine whose implementation of `distinct-values()` produces its results in any other order than their original order in the input sequence?

I don't have to know one, I just pointed out that the spec doesn't guarantee the order. Thus I don't see why, given the spec, one should expect any implementation to preserve the order.

Imagine you implement distinct-values in .NET with e.g.
https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.distinct?view=net-6.0
and it would probably pass all tests, but it would also only give a
"result sequence" that "is unordered".

Aren't there also implementations of XQuery or XPath that exploit
parallel processing? I could imagine such an implementation to easily
not care about ordering if the spec allows it for distinct-values.
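
If the original sequence order matters, one way to avoid depending on what distinct-values returns is to deduplicate by node identity with fold-left. The following is only a minimal sketch: the document in $doc and the sequence in $nodes are made-up test data, and the lookup in $seen is quadratic in the number of nodes, so it is meant to show the idea rather than to be an efficient solution.

```xquery
(: Hypothetical test data: three nodes, each listed twice, not in document order :)
let $doc := parse-xml('<r><a/><b/><c/></r>')
let $nodes := ($doc/r/c, $doc/r/a, $doc/r/c, $doc/r/b, $doc/r/a, $doc/r/b)
return
  fold-left($nodes, (), function($seen, $n) {
    (: append $n only if the identical node has not been kept already :)
    if (exists($seen[. is $n])) then $seen else ($seen, $n)
  })
```

Because fold-left is defined to process the items from left to right, the first occurrence of each node is kept in its original position, so under these assumptions the result is the c, a and b elements in that order.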

It seems, on the other hand, eXide of eXist-db in the online version
doesn't even grok some of the generate-id based attempts:

let $nodes := (1 to 10) ! parse-xml-fragment('<node>' || . || '</node>')/node(),
    $nodes := (1 to 5) ! $nodes,
    $ids := distinct-values($nodes ! generate-id(.))
return  $ids ! (function($id) {$nodes[generate-id(.) eq $id][1]})(.)

gives only <node>1</node> instead of the ten distinct nodes.



Any eXist-db users reading here? Is there a known issue with generate-id?
