## O(n) notation (and character padding)

 Subject: O(n) notation (and character padding) From: "Dave Gomboc" Date: Thu, 16 Nov 2000 15:05:33 -0700
> Jeni Tennison wrote:
[snip]
> OK. I think my problem lies in the fact that I have never understood
> what 0(n-squared) and so on actually means. Presumably this is
> something that is taught in beginners' computer-science classes, but
> never having attended one I'm still in the dark. My questions are:
>
> 1. how can you assess an algorithm to determine its 0(n*)-ness?
> 2. what implications does that have for methods to use in XSLT/XPath?
[snip]

O(n^2) and such are claims of worst-case asymptotic complexity: the run-time
(or sometimes the run-space) of an algorithm as the problem size increases
without bound.  Generally speaking, the lower the asymptotic complexity, the
better the algorithm, though special cases do exist.

Nate Austin posted some code to the list recently that pads a string to a
desired size.  The algorithm takes time linear in the length of the desired
string.  Dave Pawson has it at http://www.dpawson.co.uk/xsl/padding.html
Here's the recursive step, though:

<xsl:with-param name="length" select="\$length"/>
</xsl:call-template>

Each iteration adds one character.  Assuming that the input string is
smaller than the desired result, it will take "length(result) -
length(original)" iterations to complete:

Example input string: "test"

Desired length after padding    recursions required
10                            6
100                           96
1000                          996
10000                         9996
100000                        99996
1000000                       999996

The time required for the algorithm to execute depends directly on the
number of recursions required, which depends in a linear fashion on the
desired length.

Now, here's some code I've written that also does string padding (it has a
few extra features, but it's also bulkier, even after compensating for the
verbose names ;-)

<xsl:param name="desired_length"/>
<xsl:param name="string_to_repeat"/>
<xsl:param name="truncate_left_side"/>

<xsl:choose>
<xsl:when test="string-length(\$string_to_repeat) &lt;
\$desired_length">
<xsl:with-param name="desired_length" select="\$desired_length"/>
<xsl:with-param name="string_to_repeat"
select="concat(\$string_to_repeat, \$string_to_repeat)"/>
<xsl:with-param name="truncate_left_side"
select="\$truncate_left_side"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:choose>
<xsl:when test="\$truncate_left_side">
<xsl:value-of select="substring(\$string_to_repeat,
string-length(\$string_to_repeat) - \$desired_length + 1,
\$desired_length)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="substring(\$string_to_repeat, 1,
\$desired_length)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

<xsl:param name="desired_length"/>
<xsl:param name="string_to_repeat" select="'&#160;'"/>

<xsl:variable name="current_length"
<xsl:choose>
<xsl:when test="\$current_length &lt; \$desired_length">
select="\$desired_length - \$current_length"/>
<xsl:with-param name="desired_length"
<xsl:with-param name="string_to_repeat"
select="\$string_to_repeat"/>
<xsl:with-param name="truncate_left_side"
</xsl:call-template>
</xsl:variable>
<xsl:choose>
</xsl:when>
<xsl:otherwise>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise>
<xsl:choose>
\$desired_length)"/>
</xsl:when>
<xsl:otherwise>
\$desired_length)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

The equivalent section to Nate's code is this:

<xsl:with-param name="desired_length" select="\$desired_length"/>
<xsl:with-param name="string_to_repeat"
select="concat(\$string_to_repeat, \$string_to_repeat)"/>
<xsl:with-param name="truncate_left_side" select="\$truncate_left_side"/>
</xsl:call-template>

You can see here that instead of adding a single character each time, I
double the padding string's length each time.  A similar table:

Example input string: "test"

(Nate's)
Desired length   recursions required    recursions required
10                   6                     2
100                  96                     7
1000                 996                    10
10000                9996                    14
100000               99996                    17
1000000              999996                    20

Complexity: O(n)     Complexity: O(log n)

The latter algorithm takes time logarithmic in the length of the desired
string: log_base_2 of 1000000 is about 20.  Of course, this doesn't tell the
whole story.  The latter algorithm needs to do some cleanup work when the
wanted size is overshot.  True, that cleanup time would be dwarfed by the
difference in time it takes to pad to a large size.

In this case, asymptotic complexity probably isn't that important: how many
people do you know who pad strings to be a million characters long?  If all
you're doing is padding strings to length 10, you're better off benchmarking
the algorithms (and Nate's is shorter, so my guess is that it'd be quicker).
On the other hand, if you're padding to 512 characters, then it's not even
worthwhile to benchmark, because it's clear from the asymptotic analysis
that the O(log n) algorithm is going to be faster.

I don't mean to pick on Nate, so to alleviate any such impression, I'll
point out that Mike Kay does things similarly to Nate on page 618 of XSLT
Programmer's Reference.  True, 64 isn't large.  (Notwithstanding that,
here's hoping that I will now enjoy the privilege of having Mike poke holes
into my XSLT code for years to come! ;-)

Returing to the questions:

> 1. how can you assess an algorithm to determine its 0(n*)-ness?

You have to analyse what operations it is performing and how frequently it
is performing them.  A for-each through a list, followed by a for-each
through the same list, is going to take O(n^2) time, assuming what happens
inside the nested loops takes constant time (= O(1)).  If what happens
inside the loops takes O(log n) time, then the total time would be
O(n^2 log n).  And so on.

Picking up a good book on algorithms is very worthwhile -- after digesting
its contents you'll be able to quickly recognise common complexity bounds
when you see a piece of code.

> 2. what implications does that have for methods to use in XSLT/XPath?

If you know the complexity of each XSLT or XPath operation, you can
calculate the complexity of a piece of code.  If you have a task that takes
a long time, you can try to identify what is causing it to take so long, and
see if you can rewrite stuff in such a way that the complexity is reduced.
Let's face it: you can do things anyway you like if your data source is
50Kb, but when you're processing a 50Mb file, the algorithm's efficiency can
well make the difference between the program taking minutes or days.

Dave

XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list