Re: [xsl] Split into numbered files: without side-effect? (XSLT 2)

Subject: Re: [xsl] Split into numbered files: without side-effect? (XSLT 2)
From: Yves Forkl <Y.Forkl@xxxxxx>
Date: Fri, 28 Sep 2007 13:35:15 +0200
(Sorry for replying only today, I was out of office yesterday.)

Thanks to all of you for your suggestions. Following the line of thought of Joe, Michael and David, which was to process the main input document's nodes by walking through the chunk-starting elements, I have modified what Michael proposed: I had to move the "counter" into the outer xsl:for-each so that it numbers the chunks, not the nodes they receive. Here is my result, which works fine and generates 4 chunk files named chunk_01_a.txt, chunk_02_b1.txt, chunk_03_b2.txt and chunk_04_c.txt:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
extension-element-prefixes="saxon"
exclude-result-prefixes="xs saxon">
<xsl:output method="text" indent="yes"/>


<xsl:variable name="root" select="/root"/>

  <xsl:template match="/">
    <xsl:for-each select="document('chunk_starting_elements.xml')
      /chunks/element">
      <xsl:variable name="path" select="string(.)"/>
      <xsl:variable name="chunk-number" select="position()"/>
      <xsl:for-each select="$root/saxon:evaluate(concat('//', $path))">
        <xsl:result-document
          href="{concat('chunk_', format-number($chunk-number, '00'),
          '_', replace($path, '^.*/', ''), '.txt')}">
          <xsl:apply-templates/>
        </xsl:result-document>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

Prefixing each path with "//" (the descendant-or-self axis) allows me to match only on the "path ends" of the chunk elements, instead of having to give their full path in the config file. <xsl:apply-templates/> instead of <xsl:copy-of select="."/> is fine because I just need the context node's text.
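
For illustration, a config file of roughly the following shape would lead to the four file names mentioned above. The element names a, b, b1, b2 and c are placeholders chosen to fit those file names, not my real data, and the sketch assumes that each path matches exactly one element in the main input:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- chunk_starting_elements.xml: illustrative content only.
     Each <element> holds a "path end"; the stylesheet prepends '//'. -->
<chunks>
  <element>a</element>     <!-- chunk_01_a.txt  -->
  <element>b/b1</element>  <!-- chunk_02_b1.txt -->
  <element>b/b2</element>  <!-- chunk_03_b2.txt -->
  <element>c</element>     <!-- chunk_04_c.txt  -->
</chunks>
```

The number in each file name is the position of the <element> entry in this file, and the last part is whatever follows the final "/" of the path.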

This solution, however, raises some questions:

1) I'm still "cheating" somehow, by using saxon:evaluate. If I wanted to do without it, which way would I have to go, roughly? (Is this related to Joe's comment "if the full XPaths are stored in the config file then some extension is needed to evaluate these dynamically"?)
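
To make question 1 more concrete, here is an untested sketch of one direction that might avoid the extension. It assumes that the config file only ever contains plain element-name steps separated by "/" (no predicates, wildcards, attributes or namespace prefixes), so that a string comparison on element names can stand in for real XPath evaluation. The function my:path and its namespace URI are made up for this sketch:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.org/my"
  exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <xsl:variable name="root" select="/root"/>

  <!-- Hypothetical helper: the element names from the document element
       down to $e, joined with '/', e.g. '/root/b/b1' -->
  <xsl:function name="my:path" as="xs:string">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence
      select="concat('/', string-join($e/ancestor-or-self::*/name(), '/'))"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:for-each select="document('chunk_starting_elements.xml')
      /chunks/element">
      <xsl:variable name="path" select="string(.)"/>
      <xsl:variable name="chunk-number" select="position()"/>
      <!-- Select every element whose name path ends with '/' + $path;
           this imitates '//' + $path for simple name steps only -->
      <xsl:for-each select="$root/descendant-or-self::*
        [ends-with(my:path(.), concat('/', $path))]">
        <xsl:result-document
          href="{concat('chunk_', format-number($chunk-number, '00'),
          '_', replace($path, '^.*/', ''), '.txt')}">
          <xsl:apply-templates/>
        </xsl:result-document>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

As soon as the config file needs anything beyond plain name steps, this string matching breaks down, which is presumably where dynamic evaluation becomes unavoidable.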

2) David suggested grouping a chunk's nodes with xsl:for-each-group, instead of constructing the chunks with two nested xsl:for-each instructions as above. I suspect I would use select="*" on the main input, but how to express the relationship between nodes and their chunks in the group-starting-with attribute, given that it must evaluate to a pattern?

3) Alongside the chunk files, I also need to send all of the contents into the main output file. Adding another <xsl:apply-templates/> before the end of the template is easy enough for this. But I will be doing lots of things with each node before sending it to one of the files. Consequently, I would prefer not to process the full document twice, which would practically build its result in memory twice (assuming that each node goes into exactly one chunk). Which strategy should I adopt to optimize the code once <xsl:apply-templates/> is replaced with intensive computations?
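
To show what I mean in question 3, here is an untested sketch of one possibility: do the work once per chunk, keep the result in a variable, and copy it to both destinations. The variable name "processed" is arbitrary. Note the assumptions: the main output then follows the order of the config file rather than document order, and nodes that belong to no chunk do not reach the main output at all.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:saxon="http://saxon.sf.net/"
  exclude-result-prefixes="saxon">

  <xsl:output method="text"/>

  <xsl:variable name="root" select="/root"/>

  <xsl:template match="/">
    <xsl:for-each select="document('chunk_starting_elements.xml')
      /chunks/element">
      <xsl:variable name="path" select="string(.)"/>
      <xsl:variable name="chunk-number" select="position()"/>
      <xsl:for-each select="$root/saxon:evaluate(concat('//', $path))">
        <!-- Run the (potentially expensive) processing only once
             and hold the result in a temporary tree -->
        <xsl:variable name="processed">
          <xsl:apply-templates/>
        </xsl:variable>
        <!-- First destination: the chunk file -->
        <xsl:result-document
          href="{concat('chunk_', format-number($chunk-number, '00'),
          '_', replace($path, '^.*/', ''), '.txt')}">
          <xsl:copy-of select="$processed"/>
        </xsl:result-document>
        <!-- Second destination: the main output -->
        <xsl:copy-of select="$processed"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

This trades the second processing pass for holding one chunk's result in memory at a time; whether that is actually cheaper will depend on the processor and on the size of the chunks.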


BTW, I have tried to fit in Wendell's line (correcting some minor typos):


<xsl:variable name="chunk-number"
 select="count(preceding-sibling::*[exists(my:chunk-started-by(.))]) + 1"/>

But it seems that this does not count correctly, because chunks that were started earlier in the document need not have been started by preceding siblings of the current node.

Yves
