## Re: Aggregate

 Subject: Re: Aggregate From: David Carlisle Date: Thu, 16 Nov 2000 12:40:29 GMT
```> 1. how can you assess an algorithm to determine its 0(n*)-ness?

In general it's hard.
But for the kind of simple sorting algorithms shown here it's normally
fairly easy, and in any case the complexity of sorting algorithms is
fairly extensively studied and so the complexity of just about any
algorithm you might want to use is listed in "appropriate" books:-)

> OK. I think my problem lies in the fact that I have never understood
> what 0(n-squared) and so on actually means.

O(n^x) means that the time taken to get the answer is proportional to
the x-th power of the size of the input.

some algorithms are constant time:

returning the first element of a list for example.
Although if badly implemented (so it reads the whole list) then it is
linear time (O(n)) as you have to make one pass through the list.

Finding the maximum element is linear, you _have_ to make a pass through
all elements of the list to find the largest, so unlike the first case
there isn't an improvement you can make here. This shows that there are
two rather different things the complexity of the algorithm used in the
implementation and the theoretical "best" complexity possible to solve
the problem.

Also note that this is essentially "limiting behaviour" for large data
sets. If you have a simple algorithm that is O(n^2) (so takes four times
as long given twice as much data) this might be faster in practice than
an O(n log n) algotithm if the "faster" algorithm requires setting up
complicated data structures that for simple cases take longer to set up
than to use.

so finding the maximum element is at best O(n) You make one pass
through the list, each time carrying the largest-so-far with you until
you get to the end. That's what the recursive walk through the node set
does.

Sorting the entire list and then taking the last element depends on the
complexity of the sorting algorithm used but  it must be slower than
linear and should be not as slow as n-squared). This might end up being
quicker if the system has some optimisations for xsl:sort that it would
lose in the "by hand" recursion. But in the end the better complexity
will win.

n   n log n    n * n

1      0          1
2      2          4
10     10        100
100    200      10000
1000    300    1000000
10000   4000  100000000

so, if you are using an n-squared algorithm when you could be using an
order n one, for lists of 10 or so the "theoretical" cost is that it
should be 10 times slower, but with bigger setup costs and trick
compiler optimisations it might end up being faster.
But if your lists are 10000 long then the basic cost of the
algorithm is that it is going to take you 10000 times longer, and
chances are startup costs are going to be insignificant.

The reason why in[not(parent::TIME/in &gt; .)] is O(n^2) is basically
for each node in the list you are comparing with every node in the list.
so the number of comparisons goes up by the square of the list length.

Not every algorithm is order a power of n.

If for example you could figure out a way to factorise a number in time
that was limited by a power (any power) of the number of digits in the
number then you would be (a) famous and (b) getting visits from your
country's spying authorities wanting you to help them crack public key
coding systems which mostly work on the principle that multiplying
numbers together has complexity proportional to the size of the numbers
being multiplied, but reversing the process by known algorithms is
exponential. But the theoretical best complexity isn't known:
look in google for NP-complete or "travelling salesman" for more on that
than you ever want to know:-)

David

_____________________________________________________________________
This message has been checked for all known viruses by Star Internet delivered
through the MessageLabs Virus Control Centre. For further information visit
http://www.star.net.uk/stats.asp

XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

```