(Sorry for replying only today; I was out of the office yesterday.)
Thanks to all of you for your suggestions. Following the approach
suggested by Joe, Michael and David, namely to process the main input
document's nodes by walking through the chunk-starting elements, I have
modified what Michael proposed: I had to move the "counter" into the
outer xsl:for-each so that it numbers the chunks, not the nodes they
receive. Here is my result, which works fine and generates four chunk
files named chunk_01_a.txt, chunk_02_b1.txt, chunk_03_b2.txt and
chunk_04_c.txt:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    extension-element-prefixes="saxon"
    exclude-result-prefixes="xs saxon">

  <xsl:output method="text" indent="yes"/>

  <!-- Root element of the main input document -->
  <xsl:variable name="root" select="/root"/>

  <xsl:template match="/">
    <!-- Outer loop: one iteration per chunk-starting path in the config file -->
    <xsl:for-each select="document('chunk_starting_elements.xml')/chunks/element">
      <xsl:variable name="path" select="string(.)"/>
      <!-- Number the chunks, not the nodes they receive -->
      <xsl:variable name="chunk-number" select="position()"/>
      <!-- Inner loop: the nodes of the main input selected by that path -->
      <xsl:for-each select="$root/saxon:evaluate(concat('//', $path))">
        <xsl:result-document
            href="{concat('chunk_', format-number($chunk-number, '00'),
                  '_', replace($path, '^.*/', ''), '.txt')}">
          <xsl:apply-templates/>
        </xsl:result-document>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
Prefixing each path with "//" (the descendant-or-self axis) lets me give
only the trailing part of each chunk element's path in the config file,
instead of its full path. Using <xsl:apply-templates/> instead of
<xsl:copy-of select="."/> is fine here because I only need the context
node's text.
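
For illustration, a config file of the following shape would be
consistent with the four file names above. The entries are assumed for
this example only (the real paths may be longer or shorter); what matters
is that the last step of each path supplies the file name suffix, and
that each path selects exactly one element, since otherwise two result
documents would be written to the same href.

```xml
<!-- chunk_starting_elements.xml: illustrative content, paths are assumptions -->
<chunks>
  <element>a</element>
  <element>b/b1</element>
  <element>b/b2</element>
  <element>c</element>
</chunks>
```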
This solution, however, raises some questions:
1) I'm still "cheating" somehow, by using saxon:evaluate. If I wanted to
do without, which way would I have to go, roughly? (Is this related to
Joe's comment "if the full XPaths are stored in the config file then
some extension is needed to evaluate these dynamically"?)
2) David suggested grouping a chunk's nodes with xsl:for-each-group,
instead of constructing the chunks with two nested xsl:for-each
instructions as above. I suspect I would use select="*" on the main
input, but how do I express the relationship between nodes and their
chunks in the group-starting-with attribute, given that this attribute
takes a pattern?
3) Alongside the chunk files, I also need to send all of the contents
to the main output file. Adding another <xsl:apply-templates/> before
the end of the template is certainly easy enough. But I will be doing a
lot of work on each node before sending it to one of the files, so I
would prefer not to process the full document twice, which would
practically push its tree into memory twice (assuming that each node
goes into exactly one chunk). Which strategy should I adopt to optimize
the code once <xsl:apply-templates/> is replaced with intensive
computations?
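
Regarding 1), one direction I could imagine is to compare name paths as
plain strings instead of evaluating them. The following is only a
sketch under strong assumptions: the config entries contain nothing but
element names separated by "/" (no predicates, wildcards or other axes),
and my:path-ends-with together with its namespace URI are names made up
for this example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <xsl:variable name="root" select="/root"/>

  <!-- Hypothetical helper: true if the element's path of names,
       e.g. /root/b/b1, ends with the given path, e.g. /b/b1 -->
  <xsl:function name="my:path-ends-with" as="xs:boolean">
    <xsl:param name="node" as="element()"/>
    <xsl:param name="path" as="xs:string"/>
    <xsl:sequence select="ends-with(
        concat('/', string-join($node/ancestor-or-self::*/name(), '/')),
        concat('/', $path))"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:for-each select="document('chunk_starting_elements.xml')/chunks/element">
      <xsl:variable name="path" select="string(.)"/>
      <xsl:variable name="chunk-number" select="position()"/>
      <!-- Plain XPath 2.0 selection, no extension function involved -->
      <xsl:for-each select="$root/descendant-or-self::*[my:path-ends-with(., $path)]">
        <xsl:result-document
            href="{concat('chunk_', format-number($chunk-number, '00'),
                  '_', replace($path, '^.*/', ''), '.txt')}">
          <xsl:apply-templates/>
        </xsl:result-document>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The leading "/" added on both sides of the comparison keeps a config
entry such as "a" from also matching an element named "ba". Anything
beyond simple name paths would presumably still need dynamic evaluation.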
BTW, I have tried to fit in Wendell's line (correcting some minor typos):
<xsl:variable name="chunk-number"
select="count(preceding-sibling::*[exists(my:chunk-started-by(.))]) + 1"/>
But it seems that this does not count correctly, because the elements
that started earlier chunks are not necessarily preceding siblings of
the current node.
Yves