Re: [xsl] Best approach for writing an XML log whilst processing/writing other XML documents?

From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Fri, 13 Aug 2010 14:42:40 +0100
It's a real problem, I'm afraid. Producing two outputs from a single pass of the input is something we've been struggling with in doing the streaming work for XSLT 2.1, and I don't think we have a general answer to the problem yet. It's possible using XSLT 2.1 xsl:iterate where the input process is a simple loop (I demonstrated this in my streaming demo at Balisage last week), but not really for the case where you're processing the input using recursive descent.
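
As a rough illustration of the simple-loop case only (not the Balisage demo itself), here is a minimal sketch written in the xsl:iterate syntax of the later XSLT 3.0 Recommendation, which is what the 2.1 drafts became and may differ from them in detail. It assumes a flat input whose root element contains only leaf elements, and it buffers the log entries in an iteration parameter rather than streaming them. The file names report.xml and target.xml are taken from the question below; everything else is hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the root element holds a flat list of leaf elements. -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:iterate select="*">
        <!-- Log entries collected so far; empty before the first item. -->
        <xsl:param name="log" as="element(replaced)*" select="()"/>
        <xsl:on-completion>
          <!-- Output 2: once the loop ends, write the collected log. -->
          <xsl:result-document href="report.xml">
            <output file="target.xml">
              <xsl:sequence select="$log"/>
            </output>
          </xsl:result-document>
        </xsl:on-completion>

        <!-- "MOD-" prefix stands in for the real lookup. -->
        <xsl:variable name="new"
            select="if (@replace) then concat('MOD-', .) else string(.)"/>
        <xsl:variable name="entry" as="element(replaced)?">
          <xsl:if test="@replace">
            <replaced from="{.}" to="{$new}"/>
          </xsl:if>
        </xsl:variable>

        <!-- Output 1: the rewritten element goes to the principal result. -->
        <xsl:copy>
          <xsl:copy-of select="@* except @replace"/>
          <xsl:value-of select="$new"/>
        </xsl:copy>

        <!-- Carry the extended log into the next iteration. -->
        <xsl:next-iteration>
          <xsl:with-param name="log" select="$log, $entry"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This works because the loop body always knows which output it is writing to; with recursive descent through apply-templates there is no equivalent place to carry the second result along.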

I think the answer is, bite the bullet and process it twice. Of course, you don't need to replicate the traversal code: you can parameterize it. For example, set a tunnel parameter to either <pass1/> or <pass2/> and do an apply-templates on this parameter when you want to do different things in the two passes.
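
A minimal XSLT 2.0 sketch of that idea, under some assumptions: the mode names (walk, replace, other, ext-doc), the parameter names (pass, n, ref, child, file) and the "MOD-" stand-in for the lookup are made up for illustration, and child elements of replaced nodes are ignored to keep it short. The traversal, including the doc(@href) recursion and the output file naming, is written once in mode "walk"; only the pass1/pass2 templates differ.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- One marker element per pass; its name selects the behaviour. -->
  <xsl:variable name="pass1" as="element()"><pass1/></xsl:variable>
  <xsl:variable name="pass2" as="element()"><pass2/></xsl:variable>

  <xsl:template match="/">
    <xsl:result-document href="target.xml">
      <xsl:apply-templates mode="walk">
        <xsl:with-param name="pass" select="$pass1" tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:result-document>
    <xsl:result-document href="report.xml">
      <output file="target.xml">
        <xsl:apply-templates mode="walk">
          <xsl:with-param name="pass" select="$pass2" tunnel="yes"/>
        </xsl:apply-templates>
      </output>
    </xsl:result-document>
  </xsl:template>

  <!-- Shared traversal: decide what kind of node this is, then hand it to the pass. -->
  <xsl:template match="*[@replace]" mode="walk">
    <xsl:param name="pass" as="element()" tunnel="yes"/>
    <xsl:apply-templates select="$pass" mode="replace">
      <xsl:with-param name="n" select="."/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="ext-doc" mode="walk">
    <xsl:param name="pass" as="element()" tunnel="yes"/>
    <xsl:apply-templates select="$pass" mode="ext-doc">
      <xsl:with-param name="ref" select="."/>
      <xsl:with-param name="child" select="doc(@href)"/>
      <xsl:with-param name="file" select="concat(generate-id(), @href)"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="*" mode="walk">
    <xsl:param name="pass" as="element()" tunnel="yes"/>
    <xsl:apply-templates select="$pass" mode="other">
      <xsl:with-param name="n" select="."/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Pass 1: write the modified copies. -->
  <xsl:template match="pass1" mode="replace">
    <xsl:param name="n"/>
    <xsl:for-each select="$n">
      <xsl:copy>
        <xsl:copy-of select="@* except @replace"/>
        <xsl:value-of select="concat('MOD-', .)"/>
      </xsl:copy>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="pass1" mode="other">
    <xsl:param name="n"/>
    <xsl:for-each select="$n">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="walk"/>
      </xsl:copy>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="pass1" mode="ext-doc">
    <xsl:param name="ref"/>
    <xsl:param name="child"/>
    <xsl:param name="file"/>
    <xsl:copy-of select="$ref"/>
    <xsl:result-document href="{$file}">
      <xsl:apply-templates select="$child" mode="walk"/>
    </xsl:result-document>
  </xsl:template>

  <!-- Pass 2: write report entries for the same nodes. -->
  <xsl:template match="pass2" mode="replace">
    <xsl:param name="n"/>
    <replaced from="{$n}" to="MOD-{$n}"/>
  </xsl:template>

  <xsl:template match="pass2" mode="other">
    <xsl:param name="n"/>
    <xsl:apply-templates select="$n/*" mode="walk"/>
  </xsl:template>

  <xsl:template match="pass2" mode="ext-doc">
    <xsl:param name="child"/>
    <xsl:param name="file"/>
    <output file="{$file}">
      <xsl:apply-templates select="$child" mode="walk"/>
    </output>
  </xsl:template>
</xsl:stylesheet>
```

Because the parameter is a tunnel parameter, it reaches every "walk" template without being passed explicitly at each level, and both passes call generate-id() on the same ext-doc nodes, so the file names in the report should match the files actually written.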

Michael Kay

On 13/08/2010 14:29, Fabre Lambeau wrote:

I'm facing a problem for which I have found no elegant solution so far.

In summary, I have to process a number of XML files, replace some data
in them, and write the results out as copies. Each XML file can
reference other XML files that also need processing in the same way.
There is no limit to the depth of the dependency links, but the
structure is guaranteed to be a tree, not a graph (i.e. no cycles
leading to infinite processing).

Here is a simplified example:

== doc1.xml  (the starting point, input to the XSLT process) ==
     <node replace="id1">text1</node>
     <node>another piece of text</node>
     <ext-doc href="child-doc2.xml"/>
     <node replace="id2">text2</node>

== child-doc2.xml ==
     <node replace="id3">text3</node>
     <ext-doc href="child-doc3.xml"/>

== child-doc3.xml ==
     <node replace="id4">text4</node>

The aim is to replace the content of //node[exists(@replace)] with
some other value obtained from a lookup table (for the sake of
simplicity below, I simply prefix it with "MOD-", as this lookup is not
the focus of my problem).

This is all quite simple, and I achieved it easily in XSLT 2.0 with
xsl:result-document in a recursive fashion:

== publish.xsl ==
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" indent="yes"/>

     <xsl:variable name="source" select="."/>

     <xsl:template match="/">
         <xsl:call-template name="copy-file-and-replace">
             <xsl:with-param name="output-name">target.xml</xsl:with-param>
             <xsl:with-param name="doc" select="$source"/>
         </xsl:call-template>
     </xsl:template>

     <xsl:template name="copy-file-and-replace">
         <xsl:param name="doc"/>
         <xsl:param name="output-name"/>

         <xsl:result-document method="xml" href="{$output-name}">
             <xsl:apply-templates select="$doc" mode="replace"/>
         </xsl:result-document>
     </xsl:template>

     <!-- MODE: replace -->

     <xsl:template mode="replace" match="*[@replace]">
         <xsl:copy>
             <xsl:copy-of select="@* except @replace"/>
             <!-- stand-in for the real lookup: prefix the text with "MOD-" -->
             <xsl:text>MOD-</xsl:text>
             <xsl:value-of select="text()"/>
             <xsl:apply-templates mode="#current" select="*"/>
         </xsl:copy>
     </xsl:template>

     <xsl:template mode="replace" match="*|text()">
         <xsl:copy>
             <xsl:copy-of select="@*"/>
             <xsl:apply-templates mode="#current"/>
         </xsl:copy>
     </xsl:template>

     <xsl:template mode="replace" match="ext-doc">
         <xsl:copy-of select="."/>

         <!-- recurse over that file -->
         <xsl:call-template name="copy-file-and-replace">
             <xsl:with-param name="doc" select="doc(@href)"/>
             <xsl:with-param name="output-name"
                 select="concat(generate-id(), @href)"/>
         </xsl:call-template>
     </xsl:template>
</xsl:stylesheet>


Now comes the problem.  As I do this processing, I need to collect
some information that allows me to report on the process, and output
the dependency tree and the replacements that were made.  I'd like this
output to be XML and, for the example above, something like:

== report.xml ==
     <output file="target.xml">
         <replaced from="text1" to="MOD-text1"/>
         <output file="d1e12child-doc2.xml">
             <replaced from="text3" to="MOD-text3"/>
             <output file="d1e15child-doc3.xml">
                 <replaced from="text4" to="MOD-text4"/>
             </output>
         </output>
         <replaced from="text2" to="MOD-text2"/>
     </output>

Unfortunately, I cannot find a way to generate the two in parallel (i.e.
the copies of the original files and the report), since any new nodes
created in the mode="replace" templates would obviously go into the
copied files, not the report.
The only way I can think of is a two-pass algorithm: first do all the
copying (mode=replace), then go through it all again to produce the
report (mode=report). I hope there is another way, particularly one
that avoids having to go through all the dependency files twice.

Could anyone give me a clue on this?

Fabre Lambeau
