Subject: Re: [xsl] XSLT3 - Streaming + Recursive File Output
From: "Mailing Lists Mail daktapaal@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Aug 2016 15:28:13 -0000

Thanks all. This has been a great learning experience.

On Aug 12, 2016 8:06 AM, "Mailing Lists Mail" <daktapaal@xxxxxxxxx> wrote:

> idiv as in integer division
>
> On Aug 12, 2016 6:22 AM, "Mailing Lists Mail" <daktapaal@xxxxxxxxx> wrote:
>
>> Dr. Kay,
>> Thank you for your explanation. This is my first ever streaming
>> stylesheet and your explanations are very educational to me. I have some
>> questions.
>>
>> In your point A, you said we can switch off the multi-threading in
>> xsl:result-document. How do we do that?
>>
>> In point B, the for-each-group, you typed idiv. Should it be div? Is it a
>> typo, or is there a new operator called idiv?
>>
>> Point C: changing the initial unnamed template to streamable produced no
>> results. No files were generated. Also, in the examples given in the spec
>> I did not see any mode on the initial template.
>>
>> Thank you, Michael, for your insights. I have learned a lot by asking the
>> question.
>>
>> Dak
>>
>> On Aug 11, 2016 7:13 PM, "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> (A) Don't equate xsl:fork with multi-threading. In fact, the current
>>> implementation of xsl:fork in Saxon is not multi-threaded
>>> (xsl:result-document might be, but you can switch it off). (Saxon's
>>> streamed processing uses a push model, which complicates many things, but
>>> pushing parser events to multiple consumers doesn't require multiple
>>> threads.)
>>>
>>> (B) I think your recursive named template can be replaced with a
>>> streamable call on xsl:for-each-group, something like
>>>
>>> <xsl:for-each-group select="*:species"
>>>                     group-adjacent="(position()-1) idiv 1000">
>>>   <xsl:result-document href="species{position()}.xml">
>>>     <species><xsl:copy-of select="current-group()"/></species>
>>>   </xsl:result-document>
>>> </xsl:for-each-group>
>>>
>>> Compared with your approach, this solution has the advantage of not
>>> imposing an arbitrary limit on the number of elements to be processed.
>>>
>>> (C) I would expect the initial unnamed mode should be streamable.
>>>
>>> (D) In the latest XSLT 3.0 we've provided "streamable stylesheet
>>> functions" - not yet implemented in Saxon - but we stopped short at
>>> streamable named templates. But you couldn't do this kind of batching using
>>> streamable stylesheet functions either. A human reader can see in your code
>>> that the Nth recursive call of the template is always processing nodes that
>>> are later in document order than the (N-1)th recursive call, but it would
>>> require a phenomenal amount of analysis for a theorem-prover to establish
>>> that during static analysis, and even if you could prove it streamable,
>>> generating a streamable execution plan would be far from trivial.
>>>
>>> Michael Kay
>>> Saxonica
>>>
>>> > On 11 Aug 2016, at 23:07, Mailing Lists Mail daktapaal@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> >
>>> > Dear All,
>>> > I have the following problem to solve using XSLT3 streaming. I have
>>> > been trying for some time now and I hit a roadblock no matter which
>>> > way I choose. It seems to be an interesting issue to solve, and
>>> > resolving it will be a very good learning experience for me.
>>> >
>>> > I have a HUGE XML file (obviously a starting point for XSLT3 streaming).
>>> >
>>> > I am using: SaxonEE9-7-0-7J
>>> >
>>> > Problem Definition
>>> >
>>> > 1. Remove a set of nodes (Species) from the source
>>> >    tree (UniverseKingdom.xml), which can be around 1,000,000.
>>> > 2. Create a file called UniverseKingdom-without-species.xml which has
>>> >    every element in UniverseKingdom, except the Species nodes.
>>> > 3. Create batches of 1000 species and write them out into
>>> >    AnimalKingdomSpeciesBatch1.xml and so on, until all the
>>> >    Species are covered.
>>> >
>>> > So when the program runs, I get:
>>> > 1. UniverseKingdom-without-species.xml and 1000 files, each with
>>> >    1000 Species, with appropriate file names
>>> >    AnimalKingdomSpeciesBatch1.xml ... to
>>> >    AnimalKingdomSpeciesBatch1000.xml
>>> >
>>> > What I did so far (after many attempts, and which I thought should
>>> > work but did not):
>>> >
>>> > <xsl:stylesheet version="3.0"
>>> >     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>> >     xmlns:xs="http://www.w3.org/2001/XMLSchema">
>>> >   <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
>>> >   <xsl:strip-space elements="*"/>
>>> >   <xsl:output method="xml" indent="yes"/>
>>> >   <xsl:template match="/">
>>> >     <xsl:result-document href="output\UniverseKingdom-without-species.xml">
>>> >       <xsl:stream href="UniverseKingdom.xml">
>>> >         <xsl:fork>
>>> >           <xsl:sequence>
>>> >             <xsl:apply-templates mode="stream"/>
>>> >           </xsl:sequence>
>>> >           <xsl:sequence>
>>> >             <xsl:for-each select="*:UniverseKingdom/*:AnimalKingdom">
>>> >               <!-- Call Recursive Templates here -->
>>> >               <xsl:call-template name="batch-animal-species"/>
>>> >             </xsl:for-each>
>>> >           </xsl:sequence>
>>> >         </xsl:fork>
>>> >       </xsl:stream>
>>> >     </xsl:result-document>
>>> >   </xsl:template>
>>> >   <xsl:template name="batch-animal-species">
>>> >     <xsl:param name="limit" select="1000000"/>
>>> >     <xsl:param name="batch" select="1"/>
>>> >     <xsl:param name="start" select="1"/>
>>> >     <xsl:param name="end" select="1000"/>
>>> >     <xsl:if test="$start &lt;= $limit">
>>> >       <xsl:result-document
>>> >           href="output\AnimalKingdomSpeciesBatch{$batch}-.xml">
>>> >         <species>
>>> >           <xsl:for-each select="*:species[position() = ($start to $end)]">
>>> >             <species>
>>> >               <xsl:copy-of select="."/>
>>> >             </species>
>>> >           </xsl:for-each>
>>> >         </species>
>>> >       </xsl:result-document>
>>> >       <xsl:call-template name="batch-animal-species">
>>> >         <xsl:with-param name="batch" select="$batch+1"/>
>>> >         <xsl:with-param name="start" select="$end+1"/>
>>> >         <xsl:with-param name="end" select="$end+1000"/>
>>> >       </xsl:call-template>
>>> >     </xsl:if>
>>> >   </xsl:template>
>>> >   <xsl:template match="*:species" mode="stream"/>
>>> > </xsl:stylesheet>
>>> >
>>> > Here, the issue was with the template batch-animal-species. Saxon
>>> > throws this error:
>>> >
>>> > e:\perf\xslt3>java -jar saxon9ee.jar str.xml splitter.xsl -o:StreamAni.xml
>>> > Static error at xsl:template on line 22 column 91 of splitter.xsl:
>>> >   XTSE3430: Template rule is declared streamable but it does not
>>> >   satisfy the streamability rules.
>>> >   * Operand . of CallTemplate#batch-animal-species selects streamed
>>> >     nodes in a context that allows arbitrary navigation (line 43)
>>> > Errors were reported during stylesheet compilation
>>> >
>>> > I know that the logic for chunking the various batch files could be
>>> > made better, or is even questionable. But I was not expecting that
>>> > the call-template would fail.
>>> >
>>> > I am hoping some ninja warriors of XSLT3 can help me with this issue.
>>> > Seriously, I cannot take No for an answer :) A lot depends on this ...
>>> >
>>> > Also, if someone can think of an intelligent way for me to get this
>>> > done with smarter code, and possibly without using fork (there is an
>>> > admin sitting somewhere in the system who has asked us to create code
>>> > without multiple threads. He wants to be responsible for the
>>> > number of threads and discourages people from spawning multiple
>>> > threads. If that is not possible, then I will insist that forking has
>>> > to be done.)
>>> > Please help ...
>>> > Dak.Tap
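
On the idiv question raised in the thread, here is a minimal, non-streaming sketch of the batching arithmetic behind point (B). It assumes a batch size of 3 over the integers 1 to 7 instead of 1000 over species elements, and the element names batches, arith and batch are made up for illustration. It should run with no source document, starting at the xsl:initial-template named template.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical demo: no input document is read. -->
  <xsl:template name="xsl:initial-template">
    <batches>
      <!-- div on two integers yields a decimal; idiv truncates to an integer. -->
      <arith div="{7 div 3}" idiv="{7 idiv 3}"/>

      <!-- position() is the item's position in the selected sequence, so
           (position() - 1) idiv 3 is 0 for items 1-3, 1 for items 4-6,
           and 2 for item 7. Adjacent items with equal keys form one group. -->
      <xsl:for-each-group select="1 to 7"
                          group-adjacent="(position() - 1) idiv 3">
        <!-- Inside the body, position() is the group's position (1, 2, 3). -->
        <batch number="{position()}"
               key="{current-grouping-key()}"
               items="{current-group()}"/>
      </xsl:for-each-group>
    </batches>
  </xsl:template>

</xsl:stylesheet>
```

The three batch elements should carry the items 1 2 3, 4 5 6 and 7. With div in place of idiv, every item would get a different fractional key, so each group would hold a single item. That is why the integer-division operator is the one wanted for fixed-size batches.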