Re: [xsl] XSLT3 - Streaming + Recursive File Output

Subject: Re: [xsl] XSLT3 - Streaming + Recursive File Output
From: "Mailing Lists Mail daktapaal@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Aug 2016 15:28:13 -0000
Thanks all. This has been a great learning experience.

On Aug 12, 2016 8:06 AM, "Mailing Lists Mail" <daktapaal@xxxxxxxxx> wrote:

> idiv as in integer division
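
A small sketch of the difference, for anyone reading the archive later: div is ordinary division, while idiv is XPath 2.0+ integer division, which truncates toward zero. The stylesheet below is only an illustration; it assumes an XSLT 3.0 processor started at the default initial template (for example with Saxon's -it option), and the sample positions are made up.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- div is ordinary division: 7 div 2 is 3.5 -->
    <xsl:value-of select="7 div 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- idiv is integer division: 7 idiv 2 is 3 -->
    <xsl:value-of select="7 idiv 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Batch key for sample positions 1, 1000, 1001, 2000, 2001:
         positions 1..1000 share key 0, 1001..2000 share key 1, and so on,
         so the result is 0 0 1 1 2 -->
    <xsl:value-of
        select="for $p in (1, 1000, 1001, 2000, 2001)
                return ($p - 1) idiv 1000"
        separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

This is why (position()-1) idiv 1000 works as a grouping key: every run of 1000 consecutive items gets the same integer.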
>
> On Aug 12, 2016 6:22 AM, "Mailing Lists Mail" <daktapaal@xxxxxxxxx> wrote:
>
>> Dr. Kay.
>> Thank you for your explanation. This is my first ever streaming
>> stylesheet and your explanations are very educational to me. I have some
>> questions.
>> In your point A, you said we can switch off the multi-threading in
>> xsl:result-document. How do we do that?
>> In point B, in the for-each-group, you typed idiv. Should it be div? Is it
>> a typo, or is there a new operator called idiv?
>>
>> Point C: changing the initial unnamed template to streamable produced no
>> results; no files were generated. Also, in the examples given in the spec I
>> did not see any mode on the initial template.
>>
>> Thank you, Michael, for your insights. I have learned a lot by asking the
>> question.
>>
>> Dak
>>
>> On Aug 11, 2016 7:13 PM, "Michael Kay mike@xxxxxxxxxxxx" <
>> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> (A) don't equate xsl:fork with multi-threading. In fact, the current
>>> implementation of xsl:fork in Saxon is not multi-threaded
>>> (xsl:result-document might be, but you can switch it off). (Saxon's
>>> streamed processing uses a push model, which complicates many things, but
>>> pushing parser events to multiple consumers doesn't require multiple
>>> threads).
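
On "you can switch it off": a minimal sketch, assuming the Saxon-EE extension attribute saxon:asynchronous on xsl:result-document is what is meant here. That is my reading of the Saxon documentation, not something stated in this thread, so please check it against the documentation for your release. The file name batch1.xml is hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:template name="xsl:initial-template">
    <!-- Assumption: saxon:asynchronous="no" asks Saxon-EE to write this
         result document on the calling thread instead of a worker thread.
         Other processors should ignore the extension attribute. -->
    <xsl:result-document href="batch1.xml" saxon:asynchronous="no">
      <species/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```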
>>>
>>> (B) I think your recursive named template can be replaced with a
>>> streamable call on xsl:for-each-group, something like
>>>
>>> <xsl:for-each-group select="*:species" group-adjacent="(position()-1) idiv 1000">
>>>   <xsl:result-document href="species{position()}.xml">
>>>     <species><xsl:copy-of select="current-group()"/></species>
>>>   </xsl:result-document>
>>> </xsl:for-each-group>
>>>
>>> Compared with your approach, this solution has the advantage of not
>>> imposing an arbitrary limit on the number of elements to be processed.
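
For reference, here is a sketch of how the suggestion in (B) might slot into the original stylesheet, replacing the recursive named template. It is untested: it keeps xsl:stream as used in this thread (the final XSLT 3.0 Recommendation names this instruction xsl:source-document with streamable="yes"), it assumes every species element is a child of *:UniverseKingdom/*:AnimalKingdom, and whether a given Saxon release accepts it as streamable is for the processor to confirm. The output file names follow the original post.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Streamable mode that copies whatever no template rule matches -->
  <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:result-document href="output/UniverseKingdom-without-species.xml">
      <xsl:stream href="UniverseKingdom.xml">
        <xsl:fork>
          <!-- Branch 1: copy the document, dropping species elements -->
          <xsl:sequence>
            <xsl:apply-templates mode="stream"/>
          </xsl:sequence>
          <!-- Branch 2: write each run of 1000 adjacent species to a file -->
          <xsl:sequence>
            <xsl:for-each-group
                select="*:UniverseKingdom/*:AnimalKingdom/*:species"
                group-adjacent="(position() - 1) idiv 1000">
              <!-- Inside the loop, position() is the group number: 1, 2, ... -->
              <xsl:result-document
                  href="output/AnimalKingdomSpeciesBatch{position()}.xml">
                <species>
                  <xsl:copy-of select="current-group()"/>
                </species>
              </xsl:result-document>
            </xsl:for-each-group>
          </xsl:sequence>
        </xsl:fork>
      </xsl:stream>
    </xsl:result-document>
  </xsl:template>

  <!-- Empty rule: species elements are left out of the main copy -->
  <xsl:template match="*:species" mode="stream"/>
</xsl:stylesheet>
```

Note that if the input has more than one AnimalKingdom element, this select expression numbers species across all of them, so a batch can span two kingdoms; the original post looped over each AnimalKingdom separately.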
>>>
>>> (C) I would expect the initial unnamed mode should be streamable.
>>>
>>> (D) In the latest XSLT 3.0 we've provided "streamable stylesheet
>>> functions" - not yet implemented in Saxon - but we stopped short at
>>> streamable named templates. But you couldn't do this kind of batching using
>>> streamable stylesheet functions either. A human reader can see in your code
>>> that the Nth recursive call of the template is always processing nodes that
>>> are later in document order than the (N-1)th recursive call, but it would
>>> require a phenomenal amount of analysis for a theorem-prover to establish
>>> that during static analysis, and even if you could prove it streamable,
>>> generating a streamable execution plan would be far from trivial.
>>>
>>> Michael Kay
>>> Saxonica
>>>
>>>
>>> > On 11 Aug 2016, at 23:07, Mailing Lists Mail daktapaal@xxxxxxxxx <
>>> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> >
>>> > Dear All,
>>> > I have the following problem to solve using XSLT3 streaming, which I
>>> > have been trying for some time now, and I hit a roadblock no matter
>>> > which way I choose. It seems to be an interesting issue to solve, which,
>>> > when resolved, will be a very good learning experience for me.
>>> >
>>> > I have a HUGE XML file (obviously a starting point for XSLT3 streaming).
>>> >
>>> > I am using: SaxonEE9-7-0-7J
>>> >
>>> > Problem Definition
>>> >
>>> > 1. Remove a set of nodes (Species) from the source
>>> > tree (UniverseKingdom.xml), which can number around 1,000,000.
>>> > 2. Create a file called UniverseKingdom-without-species.xml which has
>>> > every element in UniverseKingdom except the Species nodes.
>>> > 3. Create batches of 1000 species and write them out to
>>> > AnimalKingdomSpeciesBatch1.xml and so on until all the
>>> > Species are covered.
>>> >
>>> > So when the program runs, I get
>>> > 1. UniverseKingdom-without-species.xml and 1000 files, each with
>>> > 1000 Species, with appropriate file names
>>> > AnimalKingdomSpeciesBatch1.xml ... to
>>> > AnimalKingdomSpeciesBatch1000.xml
>>> >
>>> > What I did so far (after many attempts, and which I thought should
>>> > work but did not):
>>> > <xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>> >    xmlns:xs="http://www.w3.org/2001/XMLSchema">
>>> >    <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
>>> >    <xsl:strip-space elements="*"/>
>>> >    <xsl:output method="xml" indent="yes"/>
>>> >    <xsl:template match="/">
>>> >        <xsl:result-document href="output\UniverseKingdom-without-species.xml">
>>> >            <xsl:stream href="UniverseKingdom.xml">
>>> >                <xsl:fork>
>>> >                    <xsl:sequence>
>>> >                        <xsl:apply-templates mode="stream"/>
>>> >                    </xsl:sequence>
>>> >                    <xsl:sequence>
>>> >                        <xsl:for-each select="*:UniverseKingdom/*:AnimalKingdom">
>>> >                            <!-- Call the recursive template here -->
>>> >                            <xsl:call-template name="batch-animal-species"/>
>>> >                        </xsl:for-each>
>>> >                    </xsl:sequence>
>>> >                </xsl:fork>
>>> >            </xsl:stream>
>>> >        </xsl:result-document>
>>> >    </xsl:template>
>>> >    <xsl:template name="batch-animal-species">
>>> >        <xsl:param name="limit" select="1000000"/>
>>> >        <xsl:param name="batch" select="1"/>
>>> >        <xsl:param name="start" select="1"/>
>>> >        <xsl:param name="end" select="1000"/>
>>> >        <xsl:if test="$start &lt;= $limit ">
>>> >            <xsl:result-document
>>> > href="output\AnimalKingdomSpeciesBatch{$batch}-.xml">
>>> >                <species>
>>> >                    <xsl:for-each select="*:species[position() =
>>> > ($start to $end) ]">
>>> >                        <species>
>>> >                            <xsl:copy-of select="."/>
>>> >                        </species>
>>> >                    </xsl:for-each>
>>> >                </species>
>>> >            </xsl:result-document>
>>> >            <xsl:call-template name="batch-animal-species">
>>> >                <xsl:with-param name="batch" select="$batch+1"/>
>>> >                <xsl:with-param name="start" select="$end+1"/>
>>> >                <xsl:with-param name="end" select="$end+1000"/>
>>> >            </xsl:call-template>
>>> >        </xsl:if>
>>> >    </xsl:template>
>>> >    <xsl:template match="*:species" mode="stream"/>
>>> > </xsl:stylesheet>
>>> >
>>> >
>>> > Here, the issue was with the template batch-animal-species. Saxon
>>> > throws this error:
>>> >
>>> > e:\perf\xslt3>java  -jar saxon9ee.jar   str.xml splitter.xsl  -o:StreamAni.xml
>>> > Static error at xsl:template on line 22 column 91 of splitter.xsl:
>>> >  XTSE3430: Template rule is declared streamable but it does not
>>> >  satisfy the streamability rules.
>>> >  * Operand . of CallTemplate#batch-animal-species selects streamed nodes in a context
>>> >  that allows arbitrary navigation (line 43)
>>> > Errors were reported during stylesheet compilation
>>> >
>>> >
>>> > I know that the logic for chunking the batched files could be
>>> > improved, and may even be questionable, but I was not expecting the
>>> > call-template to fail.
>>> >
>>> > I am hoping some ninja warriors of XSLT3 can help me with this issue.
>>> > Seriously, I cannot take no for an answer :) A lot depends on this.
>>> >
>>> > Also, I would welcome an intelligent way to get this done with
>>> > smarter code, possibly without using fork. (There is an admin
>>> > somewhere in the system who has asked us to write code without
>>> > multiple threads. He wants to be responsible for the number of
>>> > threads and discourages people from spawning their own. If that is
>>> > not possible, then I will insist that forking has to be done.)
>>> > Please help ...
>>> > Dak.Tap
