[xsl] replace() and efficiency: troff-to-unicode conversion

Subject: [xsl] replace() and efficiency: troff-to-unicode conversion
From: David J Birnbaum <djbpitt+xml@xxxxxxxx>
Date: Tue, 12 Sep 2006 18:57:50 -0400

Dear XSL-List,

I wrote to David Carlisle directly to follow up on his suggestion about my troff-to-unicode conversion problem, and he asked me to post my response on the list. Here it is.

Best,

David
djbpitt+xml@xxxxxxxx
________

Dear David,

I've been working on implementing your suggestion that I use <xsl:analyze-string> for my troff-to-unicode conversion problem, but there's a detail I don't understand, and I was wondering whether you might be able to advise. If I understood correctly, your suggestion was along the lines of:

<xsl:template match="text()">
<xsl:analyze-string select="." regex="\\\(\?([a-z])">
  <xsl:matching-substring>
    <xsl:choose>
      <xsl:when test="regex-group(1)='s'">...
      <xsl:when test="regex-group(1)='c'">...

What I don't understand is what an XSLT 2 processor does when I pass it a text node like:

abab\(?sabab\(?cabab

There are two matches here: \(?s and \(?c. When my <xsl:choose> finds the first match (it's the first <xsl:when> within the <xsl:choose>), doesn't it just replace all instances of \(?s and then skip the remaining <xsl:when> branches? That is, won't it fail to find the subsequent \(?c?

One of the reasons for the slow execution time of my original program was that I needed to pass the entire text node through a series of replace() operations, one for each possible troff escape sequence, in order to match all of them. If a text node were guaranteed to contain only a single such escape sequence, the <xsl:choose> approach would find it and would avoid applying the other operations vacuously. But if the text node can contain an arbitrary number of such sequences, each of which may occur one or more times, will the <xsl:choose> approach process them all?
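
To make the question concrete, a minimal test stylesheet might look like the sketch below. It is only an illustration under stated assumptions: the replacement characters (a section sign for s, a copyright sign for c) are placeholders rather than my real troff mappings, and the <xsl:otherwise> and <xsl:non-matching-substring> branches are my own additions so that nothing is silently dropped. Running it against an element whose text is abab\(?sabab\(?cabab should show whether each escape sequence is handled separately.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="text()">
    <!-- Regex: \\ is a literal backslash, \( a literal "(", \? a literal "?";
         group 1 captures the single letter that identifies the escape. -->
    <xsl:analyze-string select="." regex="\\\(\?([a-z])">
      <xsl:matching-substring>
        <!-- Choose the replacement from the captured letter.
             The characters below are hypothetical placeholders. -->
        <xsl:choose>
          <xsl:when test="regex-group(1) = 's'">&#xA7;</xsl:when>
          <xsl:when test="regex-group(1) = 'c'">&#xA9;</xsl:when>
          <!-- Unrecognized escape: copy the matched text through unchanged. -->
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <!-- Text that does not match the regex is copied through unchanged. -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

A one-line input such as <p>abab\(?sabab\(?cabab</p> is enough to exercise it with any XSLT 2.0 processor.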

Thanks again for any advice or suggestions.

Best,

David

P.S. I implemented the other part of your suggestion (using <xsl:variable name="x" select="y"/> instead of <xsl:variable name="x">y</xsl:variable>), and it seems to have improved the efficiency a bit. Thank you!
