Subject: Re: [xsl] XSLT3 - Streaming + Recursive File Output
From: "Mailing Lists Mail daktapaal@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Aug 2016 12:06:50 -0000

idiv, as in integer division.

On Aug 12, 2016 6:22 AM, "Mailing Lists Mail" <daktapaal@xxxxxxxxx> wrote:

> Dr. Kay.
> Thank you for your explanation. This is my first ever streaming stylesheet
> and your explanations are very educational to me. I have some questions.
> In your point A, you said we can switch off the multi-threading in the
> result document. How do we do that?
> In point B, the for-each-group, you typed idiv. Should it be div? Is it a
> typo, or is there a new operator called idiv?
>
> Point C. Changing the initial unnamed template to streamable produced no
> results. No files were generated. Also, in the examples given in the spec I
> did not see any mode on the initial template.
>
> Thank you, Michael, for your insights. I have learned a lot by asking the
> question.
>
> Dak
>
> On Aug 11, 2016 7:13 PM, "Michael Kay mike@xxxxxxxxxxxx" <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> (A) don't equate xsl:fork with multi-threading. In fact, the current
>> implementation of xsl:fork in Saxon is not multi-threaded
>> (xsl:result-document might be, but you can switch it off). (Saxon's
>> streamed processing uses a push model, which complicates many things, but
>> pushing parser events to multiple consumers doesn't require multiple
>> threads).
>>
>> (B) I think your recursive named template can be replaced with a
>> streamable call on xsl:for-each-group, something like
>>
>> <xsl:for-each-group select="*:species" group-adjacent="(position()-1) idiv 1000">
>>   <xsl:result-document href="species{position()}.xml">
>>     <species><xsl:copy-of select="current-group()"/></species>
>>   </xsl:result-document>
>> </xsl:for-each-group>
>>
>> Compared with your approach, this solution has the advantage of not
>> imposing an arbitrary limit on the number of elements to be processed.
>>
>> (C) I would expect the initial unnamed mode should be streamable.
>>
>> (D) In the latest XSLT 3.0 we've provided "streamable stylesheet
>> functions" - not yet implemented in Saxon - but we stopped short of
>> streamable named templates. But you couldn't do this kind of batching using
>> streamable stylesheet functions either. A human reader can see in your code
>> that the Nth recursive call of the template is always processing nodes that
>> are later in document order than the (N-1)th recursive call, but it would
>> require a phenomenal amount of analysis for a theorem-prover to establish
>> that during static analysis, and even if you could prove it streamable,
>> generating a streamable execution plan would be far from trivial.
>>
>> Michael Kay
>> Saxonica
>>
>>
>> > On 11 Aug 2016, at 23:07, Mailing Lists Mail daktapaal@xxxxxxxxx <
>> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> > Dear All,
>> > I have the following problem to solve using XSLT3 streaming, which I
>> > have been trying for some time now, and I hit a roadblock no matter
>> > which way I choose. It seems to be an interesting issue to solve, and
>> > resolving it will be a very good learning experience for me.
>> >
>> > I have a HUGE XML file (obviously a starting point for XSLT3 streaming).
>> >
>> > I am using: SaxonEE9-7-0-7J
>> >
>> > Problem Definition
>> >
>> > 1. Remove a set of nodes (Species) from the source
>> > tree (UniverseKingdom.xml), which can be around 1,000,000.
>> > 2. Create a file called UniverseKingdom-without-species.xml which has
>> > every element in UniverseKingdom, except the Species nodes.
>> > 3. Create batches of 1000 species and write them out into
>> > AnimalKingdomSpeciesBatch1.xml and so on until all the
>> > Species are covered.
>> >
>> > So when the program runs, I get
>> > 1. UniverseKingdom-without-species.xml and 1000 files, each with
>> > 1000 Species, with appropriate file names
>> > AnimalKingdomSpeciesBatch1.xml ...
>> > to AnimalKingdomSpeciesBatch1000.xml
>> >
>> > What I did so far (after many attempts, and which I thought should
>> > work but did not):
>> > <xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>> >     xmlns:xs="http://www.w3.org/2001/XMLSchema">
>> >   <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
>> >   <xsl:strip-space elements="*"/>
>> >   <xsl:output method="xml" indent="yes"/>
>> >   <xsl:template match="/">
>> >     <xsl:result-document href="output\UniverseKingdom-without-species.xml">
>> >       <xsl:stream href="UniverseKingdom.xml">
>> >         <xsl:fork>
>> >           <xsl:sequence>
>> >             <xsl:apply-templates mode="stream"/>
>> >           </xsl:sequence>
>> >           <xsl:sequence>
>> >             <xsl:for-each select="*:UniverseKingdom/*:AnimalKingdom">
>> >               <!-- Call Recursive Templates here -->
>> >               <xsl:call-template name="batch-animal-species"/>
>> >             </xsl:for-each>
>> >           </xsl:sequence>
>> >         </xsl:fork>
>> >       </xsl:stream>
>> >     </xsl:result-document>
>> >   </xsl:template>
>> >   <xsl:template name="batch-animal-species">
>> >     <xsl:param name="limit" select="1000000"/>
>> >     <xsl:param name="batch" select="1"/>
>> >     <xsl:param name="start" select="1"/>
>> >     <xsl:param name="end" select="1000"/>
>> >     <xsl:if test="$start &lt;= $limit">
>> >       <xsl:result-document
>> >           href="output\AnimalKingdomSpeciesBatch{$batch}-.xml">
>> >         <species>
>> >           <xsl:for-each select="*:species[position() = ($start to $end)]">
>> >             <species>
>> >               <xsl:copy-of select="."/>
>> >             </species>
>> >           </xsl:for-each>
>> >         </species>
>> >       </xsl:result-document>
>> >       <xsl:call-template name="batch-animal-species">
>> >         <xsl:with-param name="batch" select="$batch+1"/>
>> >         <xsl:with-param name="start" select="$end+1"/>
>> >         <xsl:with-param name="end" select="$end+1000"/>
>> >       </xsl:call-template>
>> >     </xsl:if>
>> >   </xsl:template>
>> >   <xsl:template match="*:species" mode="stream"/>
>> > </xsl:stylesheet>
>> >
>> >
>> > Here, the issue was with the template
>> > batch-animal-species. Saxon throws this error:
>> >
>> > e:\perf\xslt3>java -jar saxon9ee.jar str.xml splitter.xsl -o:StreamAni.xml
>> > Static error at xsl:template on line 22 column 91 of splitter.xsl:
>> >   XTSE3430: Template rule is declared streamable but it does not
>> >   satisfy the streamability rules.
>> >   * Operand . of CallTemplate#batch-animal-species selects streamed nodes in a
>> >     context that allows arbitrary navigation (line 43)
>> > Errors were reported during stylesheet compilation
>> >
>> >
>> > I know that the logic for chunking the batched files could be made
>> > better, or is even questionable. But I was not expecting that the
>> > call-template would fail.
>> >
>> > I am hoping some ninja warriors of XSLT3 can help me with this issue.
>> > Seriously, I cannot take no for an answer :) A lot depends on this
>> > ...
>> >
>> > Also, if someone can think of an intelligent way for me to get this
>> > done with smarter code, and possibly without using fork (there is an
>> > admin sitting somewhere in the system who has asked us to create code
>> > without multiple threads. He wants to be responsible for the
>> > number of threads and discourages people from spawning multiple
>> > threads. If that is not possible, then I will insist that forking has
>> > to be done.)
>> > Please help ...
>> > Dak.Tap
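
A note on the idiv question raised in this thread: idiv is the XPath 2.0 integer-division operator and is distinct from div, so (position()-1) idiv 1000 yields the same key for every run of 1000 consecutive items. The following is a minimal sketch, not taken from the thread, that prints both operators side by side. The sample positions and the variable name pos are illustrative assumptions, and it assumes the processor is started at the XSLT 3.0 initial template rather than on a source document.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Entry point: no source document is needed for this demonstration. -->
  <xsl:template name="xsl:initial-template">
    <!-- Illustrative positions on either side of the batch boundaries. -->
    <xsl:for-each select="(1, 1000, 1001, 2000, 2001)">
      <xsl:variable name="pos" select="."/>
      <!-- idiv truncates to an integer; div keeps the fractional part. -->
      <xsl:value-of select="concat('position ', $pos,
          ': idiv = ', ($pos - 1) idiv 1000,
          ', div = ', ($pos - 1) div 1000, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Positions 1 to 1000 give an idiv key of 0 and positions 1001 to 2000 give 1, which is why group-adjacent with this key starts a new group every 1000 items.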