Subject: Re: [xsl] XSLT3 - Streaming + Recursive File Output
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Aug 2016 11:02:32 -0000
> On 12 Aug 2016, at 11:23, Mailing Lists Mail daktapaal@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Dr. Kay,
> Thank you for your explanation. This is my first-ever streaming stylesheet,
> and your explanations are very educational for me. I have some questions.
>

> In your point A, you said we can switch off multi-threading in
> xsl:result-document. How do we do that?
>

You can switch off multi-threading globally as a configuration option, e.g.
from the command line:

--allow-multithreading:off

(note two initial hyphens)

Alternatively, write

<xsl:result-document .... saxon:asynchronous="no"
xmlns:saxon="http://saxon.sf.net/">

to switch it off for a specific xsl:result-document instruction.
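As a sketch, the per-instruction switch could be applied to the batching loop from point (B) of the earlier message quoted below. The `species{position()}.xml` file name and the batch size of 1000 come from that message. The `saxon` prefix is declared on the instruction itself here, though declaring it on xsl:stylesheet works equally well. This exact fragment has not been tested against Saxon-EE 9.7.

```xml
<xsl:for-each-group select="*:species" group-adjacent="(position() - 1) idiv 1000">
  <!-- saxon:asynchronous="no" asks Saxon to write this result document
       on the current thread instead of handing it to another thread -->
  <xsl:result-document href="species{position()}.xml"
                       saxon:asynchronous="no"
                       xmlns:saxon="http://saxon.sf.net/">
    <species><xsl:copy-of select="current-group()"/></species>
  </xsl:result-document>
</xsl:for-each-group>
```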

> In point B, in the for-each-group, you typed idiv. Should it be div? Is it a
> typo, or is there a new operator called idiv?
>
>
Introduced in XPath 2.0, idiv does integer division, discarding any fractional
part of the quotient. So with a batch size of 1000, elements 1 to 1000 have
grouping key 0, 1001 to 2000 have grouping key 1, etc.
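A quick way to check the arithmetic is to evaluate a few keys directly. The fragment below can go inside any template. The sample positions are arbitrary, chosen to sit on either side of the batch boundaries. The second instruction contrasts div and idiv on the same operands.

```xml
<!-- Grouping keys for sample positions, batch size 1000 -->
<xsl:value-of select="for $p in (1, 1000, 1001, 2000, 2001) return ($p - 1) idiv 1000"/>
<!-- yields: 0 0 1 1 2 -->

<!-- div keeps the fraction (xs:decimal), idiv returns an xs:integer -->
<xsl:value-of select="7 div 2, 7 idiv 2"/>
<!-- yields: 3.5 3 -->
```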
> Point C: changing the initial unnamed template to streamable produced no
> results; no files were generated. Also, in the examples given in the spec I
> did not see any mode on the initial template.
>
>
We would need to see how you are invoking the transformation. Sorry, I now see
there is an xsl:stream instruction inside the match="/" template, so
presumably you are supplying a dummy source document, which of course doesn't
need to be streamed. Normally I use a named template entry point for such
stylesheets. XSLT 3.0 recognizes <xsl:template name="xsl:initial-template"/>,
and in Saxon you can then use -it (with no template name) to select this as
the entry point, avoiding the need for a dummy source document.
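Putting the pieces of this thread together, here is a sketch of the whole stylesheet with a named entry point. It keeps the element names, file names and xsl:stream instruction from Dak's original. xsl:stream is what Saxon 9.7 implements; the final XSLT 3.0 Recommendation renamed it xsl:source-document. The recursive named template is replaced with the xsl:for-each-group from point (B) below. It uses forward slashes in the href values, since they are URIs. It has not been tested.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Streamable mode: copies everything except the species elements -->
  <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Named entry point, selected with -it; no dummy source document is needed -->
  <xsl:template name="xsl:initial-template">
    <xsl:result-document href="output/UniverseKingdom-without-species.xml">
      <xsl:stream href="UniverseKingdom.xml">
        <xsl:fork>
          <!-- Branch 1: the main document, with species dropped by the empty template below -->
          <xsl:sequence>
            <xsl:apply-templates mode="stream"/>
          </xsl:sequence>
          <!-- Branch 2: species in batches of 1000, one output file per batch -->
          <xsl:sequence>
            <xsl:for-each-group select="*:UniverseKingdom/*:AnimalKingdom/*:species"
                                group-adjacent="(position() - 1) idiv 1000">
              <xsl:result-document href="output/AnimalKingdomSpeciesBatch{position()}.xml">
                <species><xsl:copy-of select="current-group()"/></species>
              </xsl:result-document>
            </xsl:for-each-group>
          </xsl:sequence>
        </xsl:fork>
      </xsl:stream>
    </xsl:result-document>
  </xsl:template>

  <!-- Empty template: species elements are not copied to the main output -->
  <xsl:template match="*:species" mode="stream"/>
</xsl:stylesheet>
```

It would be invoked with something like `java -jar saxon9ee.jar -xsl:splitter.xsl -it -o:StreamAni.xml`, adding `--allow-multithreading:off` if threads must be avoided. As point (A) below explains, xsl:fork itself is not multi-threaded in Saxon.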

Michael Kay
Saxonica
> Thank you, Michael, for your insights. I have learned a lot by asking the
> question.
>
> Dak
>
>
> On Aug 11, 2016 7:13 PM, "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> (A) Don't equate xsl:fork with multi-threading. In fact, the current
> implementation of xsl:fork in Saxon is not multi-threaded (xsl:result-document
> might be, but you can switch it off). Saxon's streamed processing uses a push
> model, which complicates many things, but pushing parser events to multiple
> consumers doesn't require multiple threads.
>
> (B) I think your recursive named template can be replaced with a streamable
> call on xsl:for-each-group, something like
>
> <xsl:for-each-group select="*:species" group-adjacent="(position()-1) idiv 1000">
>   <xsl:result-document href="species{position()}.xml">
>     <species><xsl:copy-of select="current-group()"/></species>
>   </xsl:result-document>
> </xsl:for-each-group>
>
> Compared with your approach, this solution has the advantage of not imposing
> an arbitrary limit on the number of elements to be processed.
>
> (C) I would expect that the initial unnamed mode needs to be streamable.
>
> (D) In the latest XSLT 3.0 we've provided "streamable stylesheet functions"
> (not yet implemented in Saxon), but we stopped short of streamable named
> templates. But you couldn't do this kind of batching using streamable
> stylesheet functions either. A human reader can see in your code that the Nth
> recursive call of the template is always processing nodes that are later in
> document order than the (N-1)th recursive call, but it would require a
> phenomenal amount of analysis for a theorem-prover to establish that during
> static analysis, and even if you could prove it streamable, generating a
> streamable execution plan would be far from trivial.
>
> Michael Kay
> Saxonica
>
>
> > On 11 Aug 2016, at 23:07, Mailing Lists Mail daktapaal@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Dear All,
> > I have the following problem to solve using XSLT 3.0 streaming, which I
> > have been working on for some time now, and I hit a roadblock no matter
> > which way I choose. It seems to be an interesting issue, and solving it
> > will be a very good learning experience for me.
> >
> > I have a HUGE XML file (obviously a starting point for XSLT 3.0 streaming).
> >
> > I am using: SaxonEE9-7-0-7J
> >
> > Problem Definition
> >
> > 1. Remove a set of nodes (species) from the source
> > tree (UniverseKingdom.xml); there can be around 1,000,000 of them.
> > 2. Create a file called UniverseKingdom-without-species.xml which has
> > every element in UniverseKingdom except the species nodes.
> > 3. Create batches of 1000 species and write them out to
> > AnimalKingdomSpeciesBatch1.xml, AnimalKingdomSpeciesBatch2.xml, and so on,
> > until all the species are covered.
> >
> > So when the program runs, I get UniverseKingdom-without-species.xml
> > and 1000 files, each with 1000 species, with the file names
> > AnimalKingdomSpeciesBatch1.xml to
> > AnimalKingdomSpeciesBatch1000.xml.
> >
> > What I did so far (after many attempts; I thought it should
> > work, but it did not):
> > <xsl:stylesheet version="3.0"
> >    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >    xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >    <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
> >    <xsl:strip-space elements="*"/>
> >    <xsl:output method="xml" indent="yes"/>
> >    <xsl:template match="/">
> >        <xsl:result-document href="output\UniverseKingdom-without-species.xml">
> >            <xsl:stream href="UniverseKingdom.xml">
> >                <xsl:fork>
> >                    <xsl:sequence>
> >                        <xsl:apply-templates mode="stream"/>
> >                    </xsl:sequence>
> >                    <xsl:sequence>
> >                        <xsl:for-each select="*:UniverseKingdom/*:AnimalKingdom">
> >                            <!-- Call Recursive Templates here -->
> >                            <xsl:call-template name="batch-animal-species"/>
> >                        </xsl:for-each>
> >                    </xsl:sequence>
> >                </xsl:fork>
> >            </xsl:stream>
> >        </xsl:result-document>
> >    </xsl:template>
> >    <xsl:template name="batch-animal-species">
> >        <xsl:param name="limit" select="1000000"/>
> >        <xsl:param name="batch" select="1"/>
> >        <xsl:param name="start" select="1"/>
> >        <xsl:param name="end" select="1000"/>
> >        <xsl:if test="$start &lt;= $limit ">
> >            <xsl:result-document href="output\AnimalKingdomSpeciesBatch{$batch}-.xml">
> >                <species>
> >                    <xsl:for-each select="*:species[position() = ($start to $end)]">
> >                        <species>
> >                            <xsl:copy-of select="."/>
> >                        </species>
> >                    </xsl:for-each>
> >                </species>
> >            </xsl:result-document>
> >            <xsl:call-template name="batch-animal-species">
> >                <xsl:with-param name="batch" select="$batch+1"/>
> >                <xsl:with-param name="start" select="$end+1"/>
> >                <xsl:with-param name="end" select="$end+1000"/>
> >            </xsl:call-template>
> >        </xsl:if>
> >    </xsl:template>
> >    <xsl:template match="*:species" mode="stream"/>
> > </xsl:stylesheet>
> >
> >
> > Here, the issue was with the template batch-animal-species. Saxon
> > throws this error:
> >
> > e:\perf\xslt3>java -jar saxon9ee.jar str.xml splitter.xsl -o:StreamAni.xml
> > Static error at xsl:template on line 22 column 91 of splitter.xsl:
> >  XTSE3430: Template rule is declared streamable but it does not
> > satisfy the streamability rules.
> >  * Operand . of CallTemplate#batch-animal-species selects streamed nodes in a
> >  context that allows arbitrary navigation (line 43)
> > Errors were reported during stylesheet compilation
> >
> >
> > I know that the logic for chunking the batched files could be made
> > better, or is even questionable, but I was not expecting the
> > xsl:call-template to fail.
> >
> > I am hoping some XSLT 3.0 ninja warriors can help me with this issue.
> > Seriously, I cannot take no for an answer :) a lot depends on this.
> >
> > Also, if someone can think of an intelligent way to get this
> > done with smarter code, possibly without using xsl:fork (there is an
> > admin somewhere in the system who has asked us to write code
> > that does not use multiple threads; he wants to be responsible for the
> > number of threads and discourages people from spawning them.
> > If that is not possible, then I will insist that forking has to be
> > done.)
> > Please help ...
> > Dak.Tap
> >
>