Re: [xsl] Processing milestoned XML leads to many preceding:: calls and horrible performance

From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Tue, 21 Feb 2012 10:18:40 +0100
A sample of the input XML would help, and don't assume everybody knows
what "to milestone" means - it isn't even a verb in the English
language.

-W


On 21/02/2012, Matěj Cepl <mcepl@xxxxxxxxxx> wrote:
> Hi,
>
> I am again working on an XSLT stylesheet to convert a Czech Bible
> translation from a home-brew schema to OSIS, and I have run into some
> performance problems.
>
> The whole stylesheet is at
> https://gitorious.org/sword/czekms-csp_bible/blobs/master/CEP2OSIS.xsl
> (and the git repo can be cloned from ...), but I believe the relevant parts are:
>
>      <xsl:template name="genRef">
>          <xsl:variable name="refKniha" select="//kniha[1]/@jmeno"/>
>          <xsl:variable name="refKapitola" select="preceding::kap[1]/@n"/>
>          <xsl:value-of select="concat($refKniha,'.',$refKapitola,'.')"/>
>      </xsl:template>
>
>      <xsl:template name="endVerse">
>          <xsl:param name="rBase" />
>          <xsl:element name="verse">
>              <xsl:variable name="prevVerseID">
>                  <xsl:value-of select="./preceding::vers[1]/@n" />
>              </xsl:variable>
>              <xsl:attribute name="eID">
>                  <xsl:value-of select="concat($rBase,$prevVerseID)" />
>              </xsl:attribute>
>          </xsl:element>
>      </xsl:template>
>
>      <!-- ... -->
>
>      <xsl:template match="vers">
>          <xsl:variable name="refBase">
>              <xsl:call-template name="genRef" />
>          </xsl:variable>
>          <xsl:variable name="refID" select="concat($refBase,./@n)" />
>          <!-- Find out whether this is the first verse in a chapter;
>               notice that the <kap/> element is milestoned as well,
>               so we have to count the distance in <verse/> elements from it,
>               rather than use a plain count() -->
>          <xsl:variable name="curPos"
>              select="count(./preceding::kap[1]/following::*[not(count(preceding-sibling::vers|current()) = count(preceding-sibling::vers))])" />
>          <xsl:if test="not($curPos=1)">
>              <xsl:call-template name="endVerse">
>                  <xsl:with-param name="rBase">
>                      <xsl:value-of select="$refBase" />
>                  </xsl:with-param>
>              </xsl:call-template>
>          </xsl:if>
>          <xsl:element name="verse">
>              <xsl:attribute name="sID">
>                  <xsl:value-of select="$refID" />
>              </xsl:attribute>
>              <xsl:attribute name="osisID">
>                  <xsl:value-of select="$refID" />
>              </xsl:attribute>
>          </xsl:element>
>      </xsl:template>
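For comparison, here is the intent of the vers template above (as stated in its comment) restated as a single forward pass. It is a minimal sketch in Python rather than XSLT, and it assumes a hypothetical flattened input in which <kap n="..."/> and <vers n="..."/> are empty milestones inside <kniha jmeno="...">; the real schema was not posted, so the SAMPLE document and the function name verse_milestones are illustrative only. Instead of looking backwards with preceding:: for every verse, it remembers the current chapter and the previous verse number while walking the document once.

```python
import xml.etree.ElementTree as ET

# Hypothetical milestoned input; the real source schema may differ.
SAMPLE = """<kniha jmeno="Gen">
  <kap n="1"/><vers n="1"/>text<vers n="2"/>text
  <kap n="2"/><vers n="1"/>text
</kniha>"""


def verse_milestones(xml_text):
    """Return (attribute, osisID) pairs for OSIS-style <verse/> milestones.

    One walk in document order replaces the per-verse preceding::
    lookups: the current chapter and previous verse number are
    simply remembered as state.
    """
    root = ET.fromstring(xml_text)
    # Counterpart of //kniha[1]/@jmeno, computed once instead of per verse.
    # Element.iter(tag) also yields the root itself if its tag matches.
    book = next(root.iter("kniha")).get("jmeno")

    events = []
    chapter = None
    prev_verse = None  # @n of the previous <vers/> in the current chapter
    for el in root.iter():  # document order, nested elements included
        if el.tag == "kap":
            chapter = el.get("n")
            prev_verse = None  # the next <vers/> is the first in this chapter
        elif el.tag == "vers":
            base = f"{book}.{chapter}."
            if prev_verse is not None:
                # Not the first verse of the chapter: close the previous one.
                events.append(("eID", base + prev_verse))
            events.append(("sID", base + el.get("n")))
            prev_verse = el.get("n")
    # The quoted templates do not show how the last verse of a chapter
    # is closed, so this sketch leaves it open as well.
    return events


print(verse_milestones(SAMPLE))
```

For the sample this prints [('sID', 'Gen.1.1'), ('eID', 'Gen.1.1'), ('sID', 'Gen.1.2'), ('sID', 'Gen.2.1')]. Each element is visited exactly once, so the work grows linearly with the size of the book.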
>
> This works (at least as far as I was able to test it, given the
> circumstances), but the performance is absolutely dreadful. The book of
> Genesis alone took almost an hour to process (with one core of my
> dual-core CPU constantly at 100%).
>
> Obviously the problem is the <xsl:variable name="curPos"/>. I have read
> all over the Internet that the preceding* axes are horribly inefficient,
> but unfortunately I haven't figured out any other way to do what I am
> doing, and most laments about the preceding* axes don't offer many
> hints either.
>
> The problem is (I think) that both <vers/> (that's "verse" in Czech) and
> <kap/> (an abbreviation of "kapitola", i.e. "chapter") are just milestones,
> so I have to go through all the verses of the whole book every time (yes,
> this is http://www.joelonsoftware.com/articles/fog0000000319.html all over again).
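A rough cost model makes that comparison concrete. Assume, as a simplification, that handling the i-th of V verse milestones means scanning back over the i - 1 verses before it, as a naive evaluation of preceding::vers would. The total number of visits is then

```latex
\sum_{i=1}^{V} (i-1) \;=\; \frac{V(V-1)}{2} \;=\; O(V^{2})
```

whereas a single forward pass visits each milestone once, i.e. V visits. Under this model, doubling the length of a book roughly quadruples the scanning work. The curPos expression is heavier still under naive evaluation: following::* from the chapter milestone runs to the end of the document, and count(preceding-sibling::vers) is evaluated again for every node it visits.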
>
> Any ideas? Would any XSLT processor other than the xsltproc I am using
> (libxml 20706, libxslt 10126 and libexslt 815) be able to optimize this
> somehow?
>
> Thanks a lot,
>
> Matěj
>
> --
> http://www.ceplovi.cz/matej/, Jabber: mcepl<at>ceplovi.cz
> GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
>
> В чужой монастырь со своим уставом не ходят.
> (You don't bring your own rules into someone else's monastery.)
>      -- Russian proverb (this time actually checked by a native
>         Russian)