Subject: RE: [xsl] How to make this script faster
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 15 Nov 2007 23:00:27 -0000
> From just looking at your stylesheet, I noticed a couple of
> things, but I don't know whether changes will make it faster.
I noticed a few stylistic things too. I hate the verbosity of
> > <xsl:element name="section">
> > <xsl:attribute name="ref" select="$extract-section"/>
> > <xsl:attribute name="name" select="normalize-space($section-name)"/>
when you could write
<section ref="{$extract-section}"
name="{normalize-space($section-name)}">
But that's not a performance issue, and nor are most of the points Abel
made; and I have to say I couldn't see anything at all here that should
cause performance problems.
Abel might be right about the regular expression - innocent-looking regexes
can sometimes catch you out - but this one looks as if it will give a
no-match on most input lines very quickly with no backtracking needed.
So, let's have some data:
* what processor/version are you using?
* how are you running it?
* what's the size of the input data?
* how long is it actually taking?
Michael Kay
http://www.saxonica.com/
> You didn't specify what processor you use. If you use
> AltovaXML, it can at times be extremely slow (exponential
> performance) and it is worthwhile to try your code with a
> more optimized processor like Saxon.
>
> From the code I notice that you use XSLT 2.0, which can
> usually be more easily optimized than XSLT 1.0, both in code
> (tail recursion and using "as" attributes to specify result
> types) and in the processor, because the language allows for
> easier optimizations of common tasks (like regular
> expressions instead of recursive templates).
>
> But you still seem to use a lot of XSLT 1.0 techniques where
> I would prefer the 2.0 version. Consider putting your
> xsl:call-template (named templates) in an xsl:function (even
> recursively). Consider using if (value) then ... else ...
> instead of xsl:if or xsl:choose. Consider using matching
> templates instead of xsl:when etc., which may perform faster.
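
A minimal sketch of the xsl:function and if/then/else suggestions, reusing the heading regex from the stylesheet quoted further down. The my: namespace URI and the function names my:is-heading and my:kind are hypothetical, introduced only for illustration; whether this runs any faster than the named template is not established here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my"
    version="2.0">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Hypothetical helper: true when the element's text starts with a
       section number such as "C.2.1 " (same regex as the original script) -->
  <xsl:function name="my:is-heading" as="xs:boolean">
    <xsl:param name="node" as="element()"/>
    <xsl:sequence select="matches(normalize-space($node),
        '^([A-F]|[1-9]+[0-9]?)(\.[1-9]?[0-9]+)+ ')"/>
  </xsl:function>

  <!-- Hypothetical helper: an XPath 2.0 conditional instead of xsl:choose -->
  <xsl:function name="my:kind" as="xs:string">
    <xsl:param name="node" as="element()"/>
    <xsl:sequence select="if (my:is-heading($node)) then 'heading' else 'body'"/>
  </xsl:function>

  <!-- Label every child of article as heading or body -->
  <xsl:template match="/article">
    <kinds>
      <xsl:for-each select="*">
        <item kind="{my:kind(.)}">
          <xsl:value-of select="normalize-space(.)"/>
        </item>
      </xsl:for-each>
    </kinds>
  </xsl:template>

</xsl:stylesheet>
```

The "as" attributes declare the parameter and result types, which is the kind of type information the paragraph above says a 2.0 processor can use.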
>
> But your main performance penalty lies in passing the
> following-sibling axis along and walking it one node at a
> time. You can do the same trick with matching templates
> alone, and you are probably better off using keys to optimize
> performance, or introducing a for-each or a for-each-group.
> Anything is better than the recursive named template.
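
A sketch of the for-each-group alternative, which would also extract every section in one pass instead of one run per value of extract-section. It assumes the flat input shown in test.xml below, where each section title is a para starting with a section number, and it assumes the first child of article is such a title (otherwise the first group has no heading). The wrapper element name "sections" is an assumption, not part of the original script.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Same element templates as the original stylesheet -->
  <xsl:template match="para">
    <text><xsl:value-of select="concat(., ' ')"/></text>
  </xsl:template>

  <xsl:template match="informaltable">
    <table><xsl:value-of select="concat(., ' ')"/></table>
  </xsl:template>

  <xsl:template match="/article">
    <sections>
      <!-- Every para that looks like a section title starts a new group;
           the regex is the one used by the original script -->
      <xsl:for-each-group select="*"
          group-starting-with="para[matches(normalize-space(.),
              '^([A-F]|[1-9]+[0-9]?)(\.[1-9]?[0-9]+)+ ')]">
        <!-- First item of the group is the title paragraph -->
        <xsl:variable name="title"
            select="normalize-space(current-group()[1])"/>
        <section ref="{substring-before($title, ' ')}"
                 name="{substring-after($title, ' ')}">
          <!-- Remaining items are the section body -->
          <xsl:apply-templates select="current-group()[position() gt 1]"/>
        </section>
      </xsl:for-each-group>
    </sections>
  </xsl:template>

</xsl:stylesheet>
```

Each element is visited once by the grouping instruction, so no sibling list is passed from call to call as in the recursive named template.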
>
> If that does not improve things, you should have a look at
> some of the backtracking problems your regular expression
> will cause. The regular expression parser used by Saxon is
> the same as the one from Java and it performs quite badly
> when it comes to quadratic backtracking (of the
> form: (x+)+). I haven't looked into it enough, but if you can
> rewrite it for less backtracking, or optimize the regex to
> match the most common situation, or even pass it on in a
> doubly nested (awkward, I know, but hey you are optimizing
> for speed) xsl:analyze-string then you may gain a lot of speed.
>
> It is hard to predict the behavior of a regular expression. I
> once made a very simple regular expression for matching CSV
> records which showed exponential running time when the overall
> match for the CSV line failed (i.e., a non-matching quote
> pair). This regex took about 1.5 hours for a string of 60
> characters (and the time doubled for each extra 3 characters; this
> regex is somewhere on the Saxon list)! Rewriting it for less
> backtracking improved the performance to linear.
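
To illustrate the kind of rewrite meant here, a small sketch using the (x+)+ form mentioned earlier rather than the CSV regex, which is not reproduced in this thread. The input string is a hypothetical example chosen so that the overall match fails. Both patterns accept exactly the same strings, so both calls return false; the difference is only in how much work a backtracking engine may do before giving up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="text"/>

  <!-- Hypothetical input: a run of x's followed by a character that
       makes the overall match fail -->
  <xsl:variable name="input" select="'xxxxxxxxxxxxxxxxxxxxy'"/>

  <xsl:template match="/">
    <!-- Nested quantifier: a backtracking engine may try many different
         ways of splitting the run of x's before reporting failure -->
    <xsl:value-of select="matches($input, '^(x+)+$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Same language without nesting: only one way to consume the x's -->
    <xsl:value-of select="matches($input, '^x+$')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Lengthening the run of x's is a simple way to see whether a given processor's regex engine is sensitive to the nested form.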
>
> If the regex is indeed the problem (test it with something
> straightforward) then I suggest you read the regex optimizing
> chapter in Jeffrey Friedl's now famous book on regular expressions.
>
> HTH,
> Cheers,
> -- Abel Braaksma
>
> PS: not all hints above will necessarily or predictably
> improve performance
> PPS: you do not need the namespace for the XPath functions,
> after all, for some functions you do use the fn: prefix, for
> others you don't...
> You can just leave it out.
>
>
> Mathieu Malaterre wrote:
> > Hi there,
> >
> > I have a working version of an XSLT script:
> > http://gdcm.svn.sourceforge.net/viewvc/gdcm/Sandbox/xslt/2/
> >
> > See (*) and (**). What I would like to do is :
> >
> > 1. Be able to run the xslt in one pass. For now I have to run it with
> > <xsl:param name="extract-section" select="'C.1'"/>
> > then edit test.xsl file, comment the line and uncomment:
> > <xsl:param name="extract-section" select="'C.2'"/>
> > and so on and so forth...
> >
> > 2. This script is seriously *slow*. I guess running it in one pass
> > should solve most of the issue, but if there was something obvious I
> > was missing... thanks !
> >
> > -Mathieu
> >
> > (*)
> > $ cat test.xml
> > <?xml version="1.0"?>
> > <article>
> > <para>C.1 Title 1</para>
> > <para>info for section C.1</para>
> > <informaltable>table1</informaltable>
> > <para>C.2 Title 2</para>
> > <informaltable>table2</informaltable>
> > <para>info for section C.2</para>
> > <para>C.2.1 Title 2.1</para>
> > <para>text for section C.2.1</para>
> > <para>text for section C.2.1 again</para>
> > <para>C.2.2 Tile 2.2</para>
> > <informaltable>table for 2.2</informaltable>
> > <para>text for section C.2.2</para>
> > </article>
> >
> > (**)
> > $ cat test.xsl
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > xmlns:fn="http://www.w3.org/2005/xpath-functions" version="2.0">
> >
> > <!-- GENERAL -->
> >
> > <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
> >
> > <!-- number of the sample section to be extracted -->
> > <!--xsl:param name="extract-section" select="'C.1'"/-->
> > <!--xsl:param name="extract-section" select="'C.2'"/-->
> > <!--xsl:param name="extract-section" select="'C.2.1'"/-->
> > <xsl:param name="extract-section" select="'C.2.2'"/>
> >
> >
> > <xsl:template match="para">
> > <text>
> > <xsl:value-of select="concat(.,' ')"/>
> > </text>
> > </xsl:template>
> >
> > <xsl:template match="informaltable">
> > <table>
> > <xsl:value-of select="concat(.,' ')"/>
> > </table>
> > </xsl:template>
> >
> > <!-- MAIN -->
> >
> > <xsl:template match="/article">
> > <xsl:variable name="section-number"
> > select="concat($extract-section,' ')"/>
> > <xsl:variable name="section-anchor"
> > select="para[starts-with(normalize-space(.),$section-number)]"/>
> > <xsl:variable name="section-name"
> > select="substring-after(para[starts-with(normalize-space(.),$section-number)],$extract-section)"/>
> > <xsl:choose>
> > <xsl:when test="count($section-anchor)=1">
> > <xsl:message>Info: section <xsl:value-of
> > select="$extract-section"/> found</xsl:message>
> > <xsl:element name="section">
> > <xsl:attribute name="ref" select="$extract-section"/>
> > <xsl:attribute name="name"
> > select="normalize-space($section-name)"/>
> > <xsl:call-template name="copy-section-paragraphs">
> > <xsl:with-param name="section-paragraphs"
> > select="$section-anchor/following-sibling::*"/>
> > </xsl:call-template>
> > </xsl:element>
> > <xsl:message>Info: all paragraphs extracted</xsl:message>
> > </xsl:when>
> > <xsl:when test="count($section-anchor)>1">
> > <xsl:message>Error: section <xsl:value-of
> > select="$extract-section"/> found multiple times!</xsl:message>
> > </xsl:when>
> > <xsl:otherwise>
> > <xsl:message>Error: section <xsl:value-of
> > select="$extract-section"/> not found!</xsl:message>
> > </xsl:otherwise>
> > </xsl:choose>
> > </xsl:template>
> >
> > <!-- TEMPLATES -->
> >
> > <xsl:template name="copy-section-paragraphs">
> > <xsl:param name="section-paragraphs"/>
> > <xsl:variable name="current-paragraph"
> > select="$section-paragraphs[1]"/>
> > <!-- search for next section title -->
> > <xsl:if test="($current-paragraph[name()='para' or
> > name()='informaltable']) and
> > not(fn:matches(normalize-space($current-paragraph),'^([A-F]|[1-9]+[0-9]?)(\.[1-9]?[0-9]+)+ '))">
> > <!-- output current paragraph (close with a newline) -->
> > <xsl:apply-templates select="$current-paragraph"/>
> > <xsl:call-template name="copy-section-paragraphs">
> > <xsl:with-param name="section-paragraphs"
> > select="$section-paragraphs[position()>1]"/>
> > </xsl:call-template>
> > </xsl:if>
> > </xsl:template>
> >
> > </xsl:stylesheet>