Subject: RE: [xsl] How to make this script faster
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 15 Nov 2007 23:00:27 -0000
> From just looking at your stylesheet, I noticed a couple of
> things, but I don't know whether changes will make it faster.

I noticed a few stylistic things too. I hate the verbosity of

> > <xsl:element name="section">
> > <xsl:attribute name="ref" select="$extract-section"/>
> > <xsl:attribute name="name" select="normalize-space($section-name)"/>

when you could write

<section ref="{$extract-section}" name="{normalize-space($section-name)}">

But that's not a performance issue, and nor are most of the points Abel
made; and I have to say I couldn't see anything at all here that should
cause performance problems. Abel might be right about the regular
expression - innocent-looking regexes can sometimes catch you out - but
this one looks as if it will give a no-match on most input lines very
quickly with no backtracking needed.

So, let's have some data:

* what processor/version are you using?
* how are you running it?
* what's the size of the input data?
* how long is it actually taking?

Michael Kay
http://www.saxonica.com/

> You didn't specify what processor you use. If you use
> AltovaXML, it can at times be extremely slow (exponential
> performance) and it is worthwhile to try your code with a
> more optimized processor like Saxon.
>
> From the code I notice that you use XSLT 2.0, which can
> usually be more easily optimized than XSLT 1.0, both in code
> (tail recursion and using "as" attributes to specify result
> types) and in the processor, because the language allows for
> easier optimizations of common tasks (like regular
> expressions instead of recursive templates).
>
> But you still seem to use a lot of XSLT 1.0 techniques where
> I would prefer the 2.0 version. Consider putting your
> xsl:call-template (named templates) in an xsl:function
> (even recursively). Consider using
> if(value) then .... else ... instead of xsl:if or xsl:choose.
> Consider using matching templates instead of xsl:when etc,
> which may perform faster.
>
> But your main points of performance penalties lie in the fact
> of passing on the following-sibling axis and walking it one
> by one. You can do this same trick with matching templates
> alone, and you are probably better off using keys to optimize
> performance, or to introduce a for-each or a for-each-group.
> Anything is better than the recursive named template.
>
> If that does not improve things, you should have a look at
> some of the backtracking problems your regular expression
> will cause. The regular expression parser used by Saxon is
> the same as the one from Java and it has quite a bad
> performance when it comes to quadratic backtracking (of the
> form: (x+)+). I haven't looked into it enough, but if you can
> rewrite it for less backtracking, or optimize the regex to
> match the most common situation, or even pass it on in a
> doubly nested (awkward, I know, but hey you are optimizing
> for speed) xsl:analyze-string then you may profit a lot for speed.
>
> It is hard to predict the behavior of a regular expression. I
> once made a very simple regular expression for matching CSV
> records which took exponential performance when the overall
> match for the CSV line failed (i.e., non-matching quote
> pair). This regex took about 1.5 hours for a string of 60
> characters (and it doubled for each extra 3 characters, this
> regex is somewhere on the Saxon list)! Rewriting it for less
> backtracking improved the performance to linear.
>
> If the regex is indeed the problem (test it with something
> straightforward) then I suggest you read the regex optimizing
> chapter in Jeffrey Friedl's now famous book on regular expressions.
>
> HTH,
> Cheers,
> -- Abel Braaksma
>
> PS: not all hints above will necessarily or predictably
> improve performance
> PPS: you do not need the namespace for the XPath functions,
> after all, for some functions you do use the fn: prefix, for
> others you don't...
> You can just leave it out.
>
>
> Mathieu Malaterre wrote:
> > Hi there,
> >
> > I have a working version of an XSLT script:
> > http://gdcm.svn.sourceforge.net/viewvc/gdcm/Sandbox/xslt/2/
> >
> > See (*) and (**). What I would like to do is :
> >
> > 1. Be able to run the xslt in one pass. For now I have to run it with
> > <xsl:param name="extract-section" select="'C.1'"/>
> > then edit test.xsl file, comment the line and uncomment:
> > <-xsl:param name="extract-section" select="'C.2'"/>
> > and so on and so forth...
> >
> > 2. This script is seriously *slow*. I guess running it in one pass
> > should solve most of the issue, but if there was something obvious I
> > was missing... thanks !
> >
> > -Mathieu
> >
> > (*)
> > $ cat test.xml
> > <?xml version="1.0"?>
> > <article>
> >   <para>C.1 Title 1</para>
> >   <para>info for section C.1</para>
> >   <informaltable>table1</informaltable>
> >   <para>C.2 Title 2</para>
> >   <informaltable>table2</informaltable>
> >   <para>info for section C.2</para>
> >   <para>C.2.1 Title 2.1</para>
> >   <para>text for section C.2.1</para>
> >   <para>text for section C.2.1 again</para>
> >   <para>C.2.2 Tile 2.2</para>
> >   <informaltable>table for 2.2</informaltable>
> >   <para>text for section C.2.2</para>
> > </article>
> >
> > (**)
> > $ cat test.xsl
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >   xmlns:fn="http://www.w3.org/2005/xpath-functions" version="2.0">
> >
> >   <!-- GENERAL -->
> >
> >   <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
> >
> >   <!-- number of the sample section to be extracted -->
> >   <!--xsl:param name="extract-section" select="'C.1'"/-->
> >   <!--xsl:param name="extract-section" select="'C.2'"/-->
> >   <!--xsl:param name="extract-section" select="'C.2.1'"/-->
> >   <xsl:param name="extract-section" select="'C.2.2'"/>
> >
> >
> >   <xsl:template match="para">
> >     <text>
> >       <xsl:value-of select="concat(.,' ')"/>
> >     </text>
> >   </xsl:template>
> >
> >   <xsl:template match="informaltable">
> >     <table>
> >       <xsl:value-of select="concat(.,' ')"/>
> >     </table>
> >   </xsl:template>
> >
> >   <!-- MAIN -->
> >
> >   <xsl:template match="/article">
> >     <xsl:variable name="section-number" select="concat($extract-section,' ')"/>
> >     <xsl:variable name="section-anchor"
> >       select="para[starts-with(normalize-space(.),$section-number)]"/>
> >     <xsl:variable name="section-name"
> >       select="substring-after(para[starts-with(normalize-space(.),$section-number)],$extract-section)"/>
> >     <xsl:choose>
> >       <xsl:when test="count($section-anchor)=1">
> >         <xsl:message>Info: section <xsl:value-of
> >           select="$extract-section"/> found</xsl:message>
> >         <xsl:element name="section">
> >           <xsl:attribute name="ref" select="$extract-section"/>
> >           <xsl:attribute name="name" select="normalize-space($section-name)"/>
> >           <xsl:call-template name="copy-section-paragraphs">
> >             <xsl:with-param name="section-paragraphs"
> >               select="$section-anchor/following-sibling::*"/>
> >           </xsl:call-template>
> >         </xsl:element>
> >         <xsl:message>Info: all paragraphs extracted</xsl:message>
> >       </xsl:when>
> >       <xsl:when test="count($section-anchor)>1">
> >         <xsl:message>Error: section <xsl:value-of
> >           select="$extract-section"/> found multiple times!</xsl:message>
> >       </xsl:when>
> >       <xsl:otherwise>
> >         <xsl:message>Error: section <xsl:value-of
> >           select="$extract-section"/> not found!</xsl:message>
> >       </xsl:otherwise>
> >     </xsl:choose>
> >   </xsl:template>
> >
> >   <!-- TEMPLATES -->
> >
> >   <xsl:template name="copy-section-paragraphs">
> >     <xsl:param name="section-paragraphs"/>
> >     <xsl:variable name="current-paragraph" select="$section-paragraphs[1]"/>
> >     <!-- search for next section title -->
> >     <xsl:if test="($current-paragraph[name()='para' or
> >       name()='informaltable']) and
> >       not(fn:matches(normalize-space($current-paragraph),'^([A-F]|[1-9]+[0-9]?)(\.[1-9]?[0-9]+)+ '))">
> >       <!-- output current paragraph (close with a newline) -->
> >       <xsl:apply-templates select="$current-paragraph"/>
> >       <xsl:call-template name="copy-section-paragraphs">
> >         <xsl:with-param name="section-paragraphs"
> >           select="$section-paragraphs[position()>1]"/>
> >       </xsl:call-template>
> >     </xsl:if>
> >   </xsl:template>
> >
> > </xsl:stylesheet>
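
For the one-pass question, one possible reading of Abel's for-each-group suggestion is sketched below. It is only a sketch and has not been benchmarked. It assumes every section heading is a para matching the same regex as the original stylesheet, and that emitting all sections under a single wrapper element is acceptable. The sections wrapper element and the heading-regex variable are names invented for this illustration; they are not in the original script. One behavioural difference: the original stops copying at the first sibling that is neither para nor informaltable, whereas this sketch simply ignores such siblings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Heading pattern copied from the original stylesheet:
       a section number followed by a space.
       (heading-regex is a hypothetical name.) -->
  <xsl:variable name="heading-regex" as="xs:string"
                select="'^([A-F]|[1-9]+[0-9]?)(\.[1-9]?[0-9]+)+ '"/>

  <xsl:template match="/article">
    <!-- "sections" is a hypothetical wrapper so the result has one root -->
    <sections>
      <!-- Start a new group at every para that looks like a section heading -->
      <xsl:for-each-group select="para | informaltable"
          group-starting-with="para[matches(normalize-space(.), $heading-regex)]">
        <!-- Ignore any leading group that appears before the first heading -->
        <xsl:if test="current-group()[1][self::para]
                      [matches(normalize-space(.), $heading-regex)]">
          <xsl:variable name="heading"
                        select="normalize-space(current-group()[1])"/>
          <!-- Literal result element with attribute value templates -->
          <section ref="{substring-before($heading, ' ')}"
                   name="{substring-after($heading, ' ')}">
            <!-- Everything after the heading belongs to this section -->
            <xsl:apply-templates select="current-group()[position() gt 1]"/>
          </section>
        </xsl:if>
      </xsl:for-each-group>
    </sections>
  </xsl:template>

  <xsl:template match="para">
    <text>
      <xsl:value-of select="concat(., ' ')"/>
    </text>
  </xsl:template>

  <xsl:template match="informaltable">
    <table>
      <xsl:value-of select="concat(., ' ')"/>
    </table>
  </xsl:template>

</xsl:stylesheet>
```

Each child of article is visited once by the grouping instruction, so there is no recursion over the following-sibling axis and no need to rerun the transformation per section. Whether that accounts for the slowness reported is exactly what the questions about processor, invocation, input size, and timings are meant to establish.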