## Re: [xsl] How the other half live

 Subject: Re: [xsl] How the other half live From: "Dimitre Novatchev" Date: Tue, 18 Nov 2008 09:28:52 -0800
```> It's actually O(n*m) where n is the number of values and m the number of
> distinct values. So it's O(n^2) in any case where the number of distinct
> values is proportional to the size of the population, which means in effect
> in any "open-ended" population.

Probably this could be a challenge to Saxon:

When dealing with the following expression:

\$vSeq[index-of(\$vSeq,.)[\$i]]

it should be possible for a good optimizer to treat it in the following way:

1. If \$i  >  count(\$vSeq)    ==>   Do nothing.   Otherwise:

2. Deduce the fact that the indices of all items will be needed.

3. Decide that from the above the indices of the distinct items will suffice

4. Decide that the needed indices can be built within a single pass
over the sequence. Do not index any non-first occurence of an item.

5. Decide that the distinct items together with their indices can be
found while doing the same single pass over the sequence.

6. While finding each distinct item, record the number of its
occurences (the length of its index-of(\$vSeq, .)   ). In this process
record each distinct item that has more than \$i occurences. Stop
producing the index after the \$ith occurence.

7. Generate code to produce every item, recorded in step 6.

It seems to me that the time complexity of this algorithm is probably
less than N*Log(N) (as no sorting is needed and hashing techniques can
be used).

Also, this algorithm can be used well in streaming mode.

--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play

On Tue, Nov 18, 2008 at 7:58 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>> > They are both O(n^2).
>>
>> Only in the worst case though isn't it, which is a list of
>> unique values?
>
> It's actually O(n*m) where n is the number of values and m the number of
> distinct values. So it's O(n^2) in any case where the number of distinct
> values is proportional to the size of the population, which means in effect
> in any "open-ended" population.
>
> Michael Kay
> http://www.saxonica.com/

```
Current Thread