Subject: Re: [xsl] XSLT3 - Streaming + Recursive File Output
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Aug 2016 11:02:32 -0000

> On 12 Aug 2016, at 11:23, Mailing Lists Mail daktapaal@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Dr. Kay.
> Thank you for your explanation. This is my first ever streaming stylesheet and your explanations are very educational to me. I have some questions.
>
> In your point A, you said we can switch off the multi Threading in the result document. How do we do that?

You can switch off multi-threading globally as a configuration option, e.g. from the command line with --allow-multithreading:off (note the two initial hyphens).

Alternatively, write <xsl:result-document .... saxon:asynchronous="no" xmlns:saxon="http://saxon.sf.net/"> to switch it off for a specific xsl:result-document instruction.

> In point B, foreach , you typed idiv .. should it be div ? is it a typo or is there a new operator called idiv

idiv was introduced in XPath 2.0 and does integer division. So with the grouping key (position()-1) idiv 1000, elements 1 to 1000 have grouping key 0, elements 1001 to 2000 have grouping key 1, etc.

> Point c. Changing initial unnamed template to streamable produced no results. No files generated. Also in the examples given in the spec i did not see any mode on the initial template

We would need to see how you are invoking the transformation.

Sorry, I now see there is an xsl:stream instruction inside the match="/" template, so presumably you are supplying a dummy source document, which of course doesn't need to be streamed.

Normally I use a named template entry point for such stylesheets. XSLT 3.0 recognizes <xsl:template name="xsl:initial-template"/>, and in Saxon you can then use -it (with no template name) to select this as the entry point, avoiding the need for a dummy source document.

Michael Kay
Saxonica

> Thank you Michael for your insights .. i have learned a lot by asking the question.
>
> Dak
>
> On Aug 11, 2016 7:13 PM, "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> (A) Don't equate xsl:fork with multi-threading. In fact, the current implementation of xsl:fork in Saxon is not multi-threaded (xsl:result-document might be, but you can switch it off). Saxon's streamed processing uses a push model, which complicates many things, but pushing parser events to multiple consumers doesn't require multiple threads.
>
> (B) I think your recursive named template can be replaced with a streamable call on xsl:for-each-group, something like
>
> <xsl:for-each-group select="*:species" group-adjacent="(position()-1) idiv 1000">
>   <xsl:result-document href="species{position()}.xml">
>     <species><xsl:copy-of select="current-group()"/></species>
>   </xsl:result-document>
> </xsl:for-each-group>
>
> Compared with your approach, this solution has the advantage of not imposing an arbitrary limit on the number of elements to be processed.
>
> (C) I would expect that the initial unnamed mode should be streamable.
>
> (D) In the latest XSLT 3.0 we've provided "streamable stylesheet functions" - not yet implemented in Saxon - but we stopped short at streamable named templates. But you couldn't do this kind of batching using streamable stylesheet functions either. A human reader can see in your code that the Nth recursive call of the template is always processing nodes that are later in document order than the (N-1)th recursive call, but it would require a phenomenal amount of analysis for a theorem-prover to establish that during static analysis, and even if you could prove it streamable, generating a streamable execution plan would be far from trivial.
>
> Michael Kay
> Saxonica
>
>
> > On 11 Aug 2016, at 23:07, Mailing Lists Mail daktapaal@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Dear All,
> > I have the following problem to solve using XSLT3 Streaming , which I
> > have been trying for some time now and i find a road block no matter
> > which way I choose. Seems to be an interesting issue to solve, which
> > when resolved, will be a very good learning for me.
> >
> > I have a HUGE XML ( obviously a starting point for XSlt3 Streaming)
> >
> > I am using : SaxonEE9-7-0-7J
> >
> > Problem Definition
> >
> > 1. Remove a set of nodes(Species) from the source
> > tree(UniverseKingdom.xml), which can be around 1000,000
> > 2. Create a File called UniverseKingdom-without-species.xml which has
> > every element in UniverseKingdom, except the Species nodes
> > 3. Create batches of 1000 species and throw them out into
> > AnimalKingdomSpeciesBatch1.xml and so on and so forth till all the
> > Species are covered.
> >
> > So when the Program runs, I get
> > 1. UniverseKingdom-without-species.xml and 1000 files , each with
> > 1000 Species, with appropriate file names
> > AnimalKingdomSpeciesBatch1.xml ... to
> > AnimalKingdomSpeciesBatch1000.xml
> >
> > What I did so far ( after many attempts and which I thought should
> > work but did not work )
> >
> > <xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >     xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >   <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
> >   <xsl:strip-space elements="*"/>
> >   <xsl:output method="xml" indent="yes"/>
> >   <xsl:template match="/">
> >     <xsl:result-document href="output\UniverseKingdom-without-species.xml">
> >       <xsl:stream href="UniverseKingdom.xml">
> >         <xsl:fork>
> >           <xsl:sequence>
> >             <xsl:apply-templates mode="stream"/>
> >           </xsl:sequence>
> >           <xsl:sequence>
> >             <xsl:for-each
> >               select="*:UniverseKingdom/*:AnimalKingdom">
> >               <!-- Call Recursive Templates here -->
> >               <xsl:call-templates name="batch-animal-species"/>
> >             </xsl:for-each>
> >           </xsl:sequence>
> >         </xsl:fork>
> >       </xsl:stream>
> >     </xsl:result-document>
> >   </xsl:template>
> >   <xsl:template name="batch-animal-species">
> >     <xsl:param name="limit" select="1000000"/>
> >     <xsl:param name="batch" select="1"/>
> >     <xsl:param name="start" select="1"/>
> >     <xsl:param name="end" select="1000"/>
> >     <xsl:if test="$start <= $limit ">
> >       <xsl:result-document
> >         href="output\AnimalKingdomSpeciesBatch{$batch}-.xml">
> >         <species>
> >           <xsl:for-each select="*:species[position() =
> >             ($start to $end) ]">
> >             <species>
> >               <xsl:copy-of select="."/>
> >             </species>
> >           </xsl:for-each>
> >         </species>
> >       </xsl:result-document>
> >       <xsl:call-template name="batch-animal-species">
> >         <xsl:with-param name="batch" select="$batch+1"/>
> >         <xsl:with-param name="start" select="$end+1"/>
> >         <xsl:with-param name="end" select="$end+1000"/>
> >       </xsl:call-template>
> >     </xsl:if>
> >   </xsl:template>
> >   <xsl:template match="*:species" mode="stream"/>
> > </xsl:stylesheet>
> >
> >
> > Here, the issue was with the template batch-animal-species . Saxon
> > Throws Error :
> >
> > e:\perf\xslt3>java -jar saxon9ee.jar str.xml splitter.xsl -o:StreamAni.xml
> > Static error at xsl:template on line 22 column 91 of splitter.xsl:
> >   XTSE3430: Template rule is declared streamable but it does not
> >   satisfy the streamability rules.
> >   * Operand . of CallTemplate#batch-animal-species selects streamed nodes in a
> >     context that allows arbitrary navigation (line 43)
> > Errors were reported during stylesheet compilation
> >
> >
> > I know that the logic for chunking various batched files could be made
> > better or even questionable.. But I was not expecting that the
> > Call-Template will fail.
> >
> > I am hoping some ninja warriors of XSLT3 can help me with this issue//
> > Seriously can not take No for an answer :) a lot is dependent on this
> > ...
> >
> > Also, if someone can think of an intelligent way for me to get this
> > done with a smarter code, and possibly without using fork( there is a
> > admin sitting somewhere in the System who has asked us to create code
> > without the multiple threads. He wants to be responsible for the
> > number of threads and discourages people from spawning multiple
> > threads. If not possible, then I will enforce that forking has to be
> > done.)
> > Please help ...
> > Dak.Tap
> >
> >

XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
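
For readers following the thread, the suggestions in the reply (a named xsl:initial-template entry point, xsl:for-each-group with an idiv grouping key in place of the recursive template, and saxon:asynchronous="no" on each xsl:result-document) might be combined roughly as follows. This is an untested sketch and not code that was posted in the thread. The path *:UniverseKingdom/*:AnimalKingdom/*:species is inferred from the original stylesheet, the forward slashes in the href values are an assumption replacing the backslashes used there, and xsl:stream is kept because the thread uses Saxon 9.7 (the final XSLT 3.0 Recommendation renames it to xsl:source-document with streamable="yes").

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/">

  <!-- Streamable mode that copies everything not matched by another rule -->
  <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Named entry point, so no dummy source document is needed -->
  <xsl:template name="xsl:initial-template">
    <xsl:stream href="UniverseKingdom.xml">
      <xsl:fork>
        <!-- Prong 1: the whole document minus the species elements -->
        <xsl:sequence>
          <xsl:result-document
              href="output/UniverseKingdom-without-species.xml"
              saxon:asynchronous="no">
            <xsl:apply-templates mode="stream"/>
          </xsl:result-document>
        </xsl:sequence>
        <!-- Prong 2: batches of 1000 species; items 1..1000 get key 0,
             items 1001..2000 get key 1, and so on -->
        <xsl:sequence>
          <xsl:for-each-group
              select="*:UniverseKingdom/*:AnimalKingdom/*:species"
              group-adjacent="(position() - 1) idiv 1000">
            <!-- Here position() is the group number: 1, 2, 3, ... -->
            <xsl:result-document
                href="output/AnimalKingdomSpeciesBatch{position()}.xml"
                saxon:asynchronous="no">
              <species>
                <xsl:copy-of select="current-group()"/>
              </species>
            </xsl:result-document>
          </xsl:for-each-group>
        </xsl:sequence>
      </xsl:fork>
    </xsl:stream>
  </xsl:template>

  <!-- Drop species elements from the main copy -->
  <xsl:template match="*:species" mode="stream"/>

</xsl:stylesheet>
```

Assuming the stylesheet is saved as splitter.xsl, it could presumably be run with something like java -jar saxon9ee.jar -it -xsl:splitter.xsl, adding --allow-multithreading:off if multi-threading should be disabled globally rather than per instruction.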