Subject: Re: [xsl] XSLT3 - Streaming + Recursive File Output
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 11 Aug 2016 23:13:16 -0000

(A) Don't equate xsl:fork with multi-threading. In fact, the current
implementation of xsl:fork in Saxon is not multi-threaded (xsl:result-document
might be, but you can switch it off). (Saxon's streamed processing uses a push
model, which complicates many things, but pushing parser events to multiple
consumers doesn't require multiple threads.)
(B) I think your recursive named template can be replaced with a streamable
call on xsl:for-each-group, something like
  <xsl:for-each-group select="*:species"
                      group-adjacent="(position()-1) idiv 1000">
    <xsl:result-document href="species{position()}.xml">
      <species><xsl:copy-of select="current-group()"/></species>
    </xsl:result-document>
  </xsl:for-each-group>
Compared with your approach, this solution has the advantage of not imposing
an arbitrary limit on the number of elements to be processed.
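
For illustration only, here is a minimal, non-streamed sketch of the grouping in (B) wrapped in a complete stylesheet. The `batch-size` parameter, the `*:AnimalKingdom` match pattern, the output file name and the text-suppressing template are assumptions added for this sketch, not part of the suggestion above, and the streamed wiring through xsl:stream and xsl:fork is deliberately left out.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical parameter: number of species elements per output file -->
  <xsl:param name="batch-size" as="xs:integer" select="1000"/>

  <!-- Assumption: the species elements are children of AnimalKingdom -->
  <xsl:template match="*:AnimalKingdom">
    <!-- Items 1..N get key 0, N+1..2N get key 1, and so on;
         group-adjacent starts a new group each time the key changes -->
    <xsl:for-each-group select="*:species"
        group-adjacent="(position() - 1) idiv $batch-size">
      <!-- Here position() is the number of the current group -->
      <xsl:result-document href="AnimalKingdomSpeciesBatch{position()}.xml">
        <species><xsl:copy-of select="current-group()"/></species>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Keep stray text out of the principal output -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

With 2,500 species elements and the default batch size, the grouping key takes the values 0, 1 and 2, so this would write three files holding 1000, 1000 and 500 elements.
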
(C) I would expect the initial unnamed mode should be streamable.
(D) In the latest XSLT 3.0 we've provided "streamable stylesheet functions" -
not yet implemented in Saxon - but we stopped short at streamable named
templates. But you couldn't do this kind of batching using streamable
stylesheet functions either. A human reader can see in your code that the Nth
recursive call of the template is always processing nodes that are later in
document order than the (N-1)th recursive call, but it would require a
phenomenal amount of analysis for a theorem-prover to establish that during
static analysis, and even if you could prove it streamable, generating a
streamable execution plan would be far from trivial.
Michael Kay
Saxonica
> On 11 Aug 2016, at 23:07, Mailing Lists Mail daktapaal@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Dear All,
> I have the following problem to solve using XSLT3 streaming, which I
> have been trying for some time now, and I hit a roadblock no matter
> which way I choose. It seems to be an interesting issue to solve, which,
> when resolved, will be a very good learning experience for me.
>
> I have a HUGE XML (obviously a starting point for XSLT3 streaming)
>
> I am using : SaxonEE9-7-0-7J
>
> Problem Definition
>
> 1. Remove a set of nodes (Species) from the source
> tree (UniverseKingdom.xml); there can be around 1,000,000 of them
> 2. Create a File called UniverseKingdom-without-species.xml which has
> every element in UniverseKingdom, except the Species nodes
> 3. Create batches of 1000 species and throw them out into
> AnimalKingdomSpeciesBatch1.xml and so on and so forth till all the
> Species are covered.
>
> So when the Program runs, I get
> 1. UniverseKingdom-without-species.xml and 1000 files , each with
> 1000 Species, with appropriate file names
> AnimalKingdomSpeciesBatch1.xml ... to
> AnimalKingdomSpeciesBatch1000.xml
>
> What I did so far ( after many attempts and which I thought should
> work but did not work )
> <xsl:stylesheet version="3.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> xmlns:xs="http://www.w3.org/2001/XMLSchema">
> <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
> <xsl:strip-space elements="*"/>
> <xsl:output method="xml" indent="yes"/>
> <xsl:template match="/">
> <xsl:result-document
> href="output\UniverseKingdom-without-species.xml">
> <xsl:stream href="UniverseKingdom.xml">
> <xsl:fork>
> <xsl:sequence>
> <xsl:apply-templates mode="stream"/>
> </xsl:sequence>
> <xsl:sequence>
> <xsl:for-each
> select="*:UniverseKingdom/*:AnimalKingdom">
> <!-- Call Recursive Templates here -->
> <xsl:call-template
> name="batch-animal-species"/>
> </xsl:for-each>
> </xsl:sequence>
> </xsl:fork>
> </xsl:stream>
> </xsl:result-document>
> </xsl:template>
> <xsl:template name="batch-animal-species">
> <xsl:param name="limit" select="1000000"/>
> <xsl:param name="batch" select="1"/>
> <xsl:param name="start" select="1"/>
> <xsl:param name="end" select="1000"/>
> <xsl:if test="$start &lt;= $limit">
> <xsl:result-document
> href="output\AnimalKingdomSpeciesBatch{$batch}-.xml">
> <species>
> <xsl:for-each select="*:species[position() =
> ($start to $end) ]">
> <species>
> <xsl:copy-of select="."/>
> </species>
> </xsl:for-each>
> </species>
> </xsl:result-document>
> <xsl:call-template name="batch-animal-species">
> <xsl:with-param name="batch" select="$batch+1"/>
> <xsl:with-param name="start" select="$end+1"/>
> <xsl:with-param name="end" select="$end+1000"/>
> </xsl:call-template>
> </xsl:if>
> </xsl:template>
> <xsl:template match="*:species" mode="stream"/>
> </xsl:stylesheet>
>
>
> Here, the issue was with the template batch-animal-species . Saxon
> Throws Error :
>
> e:\perf\xslt3>java -jar saxon9ee.jar str.xml splitter.xsl -o:StreamAni.xml
> Static error at xsl:template on line 22 column 91 of splitter.xsl:
> XTSE3430: Template rule is declared streamable but it does not
> satisfy the streamability rules.
> * Operand . of CallTemplate#batch-animal-species selects streamed nodes in a
> context that allows arbitrary navigation (line 43)
> Errors were reported during stylesheet compilation
>
>
> I know that the logic for chunking the various batched files could be made
> better, or may even be questionable. But I was not expecting that the
> call-template would fail.
>
> I am hoping some ninja warriors of XSLT3 can help me with this issue.
> Seriously, I cannot take No for an answer :) a lot is dependent on this
> ...
>
> Also, if someone can think of an intelligent way for me to get this
> done with smarter code, and possibly without using fork (there is an
> admin sitting somewhere in the system who has asked us to create code
> without multiple threads. He wants to be responsible for the
> number of threads and discourages people from spawning multiple
> threads. If that is not possible, then I will insist that forking has to be
> done.)
> Please help ...
> Dak.Tap