Subject: O(n) notation (and character padding) From: "Dave Gomboc" <dave@xxxxxxxxxxxxxx> Date: Thu, 16 Nov 2000 15:05:33 -0700 |
> Jeni Tennison wrote: [snip] > OK. I think my problem lies in the fact that I have never understood > what 0(n-squared) and so on actually means. Presumably this is > something that is taught in beginners' computer-science classes, but > never having attended one I'm still in the dark. My questions are: > > 1. how can you assess an algorithm to determine its 0(n*)-ness? > 2. what implications does that have for methods to use in XSLT/XPath? [snip] O(n^2) and such are claims of worst-case asymptotic complexity: the run-time (or sometimes the run-space) of an algorithm as the problem size increases without bound. Generally speaking, the lower the asymptotic complexity, the better the algorithm, though special cases do exist. Nate Austin posted some code to the list recently that pads a string to a desired size. The algorithm takes time linear in the length of the desired string. Dave Pawson has it at http://www.dpawson.co.uk/xsl/padding.html Here's the recursive step, though: <xsl:call-template name="append-pad"> <xsl:with-param name="padChar" select="$padChar"/> <xsl:with-param name="padVar" select="concat($padVar,$padChar)"/> <xsl:with-param name="length" select="$length"/> </xsl:call-template> Each iteration adds one character. Assuming that the input string is smaller than the desired result, it will take "length(result) - length(original)" iterations to complete: Example input string: "test" Desired length after padding recursions required 10 6 100 96 1000 996 10000 9996 100000 99996 1000000 999996 The time required for the algorithm to execute depends directly on the number of recursions required, which depends in a linear fashion on the desired length. Now, here's some code I've written that also does string padding (it has a few extra features, but it's also bulkier, even after compensating for the verbose names ;-) <xsl:template name="padding_string"> <xsl:param name="desired_length"/> <xsl:param name="string_to_repeat"/> <xsl:param name="truncate_left_side"/> <xsl:choose> <xsl:when test="string-length($string_to_repeat) < $desired_length"> <xsl:call-template name="padding_string"> <xsl:with-param name="desired_length" select="$desired_length"/> <xsl:with-param name="string_to_repeat" select="concat($string_to_repeat, $string_to_repeat)"/> <xsl:with-param name="truncate_left_side" select="$truncate_left_side"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:choose> <xsl:when test="$truncate_left_side"> <xsl:value-of select="substring($string_to_repeat, string-length($string_to_repeat) - $desired_length + 1, $desired_length)"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="substring($string_to_repeat, 1, $desired_length)"/> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="pad_string"> <xsl:param name="string_to_pad"/> <xsl:param name="desired_length"/> <xsl:param name="string_to_repeat" select="' '"/> <xsl:param name="pad_left_side" select="true()"/> <xsl:variable name="current_length" select="string-length($string_to_pad)"/> <xsl:choose> <xsl:when test="$current_length < $desired_length"> <xsl:variable name="desired_padding_length" select="$desired_length - $current_length"/> <xsl:variable name="padding_string"> <xsl:call-template name="padding_string"> <xsl:with-param name="desired_length" select="$desired_padding_length"/> <xsl:with-param name="string_to_repeat" select="$string_to_repeat"/> <xsl:with-param name="truncate_left_side" select="$pad_left_side"/> </xsl:call-template> </xsl:variable> <xsl:choose> <xsl:when test="$pad_left_side"> <xsl:value-of select="concat($padding_string, $string_to_pad)"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="concat($string_to_pad, $padding_string)"/> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise> <xsl:choose> <xsl:when test="$pad_left_side"> <xsl:value-of select="substring($string_to_pad, string-length($string_to_pad) - $desired_length + 1, $desired_length)"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="substring($string_to_pad, 1, $desired_length)"/> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:template> The equivalent section to Nate's code is this: <xsl:call-template name="padding_string"> <xsl:with-param name="desired_length" select="$desired_length"/> <xsl:with-param name="string_to_repeat" select="concat($string_to_repeat, $string_to_repeat)"/> <xsl:with-param name="truncate_left_side" select="$truncate_left_side"/> </xsl:call-template> You can see here that instead of adding a single character each time, I double the padding string's length each time. A similar table: Example input string: "test" (Nate's) Desired length recursions required recursions required 10 6 2 100 96 7 1000 996 10 10000 9996 14 100000 99996 17 1000000 999996 20 Complexity: O(n) Complexity: O(log n) The latter algorithm takes time logarithmic in the length of the desired string: log_base_2 of 1000000 is about 20. Of course, this doesn't tell the whole story. The latter algorithm needs to do some cleanup work when the wanted size is overshot. True, that cleanup time would be dwarfed by the difference in time it takes to pad to a large size. In this case, asymptotic complexity probably isn't that important: how many people do you know who pad strings to be a million characters long? If all you're doing is padding strings to length 10, you're better off benchmarking the algorithms (and Nate's is shorter, so my guess is that it'd be quicker). On the other hand, if you're padding to 512 characters, then it's not even worthwhile to benchmark, because it's clear from the asymptotic analysis that the O(log n) algorithm is going to be faster. I don't mean to pick on Nate, so to alleviate any such impression, I'll point out that Mike Kay does things similarly to Nate on page 618 of XSLT Programmer's Reference. True, 64 isn't large. (Notwithstanding that, here's hoping that I will now enjoy the privilege of having Mike poke holes into my XSLT code for years to come! ;-) Returing to the questions: > 1. how can you assess an algorithm to determine its 0(n*)-ness? You have to analyse what operations it is performing and how frequently it is performing them. A for-each through a list, followed by a for-each through the same list, is going to take O(n^2) time, assuming what happens inside the nested loops takes constant time (= O(1)). If what happens inside the loops takes O(log n) time, then the total time would be O(n^2 log n). And so on. Picking up a good book on algorithms is very worthwhile -- after digesting its contents you'll be able to quickly recognise common complexity bounds when you see a piece of code. > 2. what implications does that have for methods to use in XSLT/XPath? If you know the complexity of each XSLT or XPath operation, you can calculate the complexity of a piece of code. If you have a task that takes a long time, you can try to identify what is causing it to take so long, and see if you can rewrite stuff in such a way that the complexity is reduced. Let's face it: you can do things anyway you like if your data source is 50Kb, but when you're processing a 50Mb file, the algorithm's efficiency can well make the difference between the program taking minutes or days. Dave XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Problem Accessing Nodes, Mangano, Chris | Thread | Re: O(n) notation (and character pa, Jeni Tennison |
RE: Equivalent of a Global Counter, Matthew Bentley | Date | XSLT link to schema, Messineo, Chris |
Month |