Subject: Re: [xsl] Streaming with XSLT version 3.0
From: Radu Pisoi <radu_pisoi@xxxxxxx>
Date: Tue, 11 Mar 2014 16:33:36 +0200
Regards, Radu -- Radu Pisoi <oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger http://www.oxygenxml.com
Michael, I did run the process successfully. See my notes here. I have reported it to Oxygen.

Details for running a large file with XSLT 3.0 streaming

==========
The large source file is found here:
http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-articles-multistream.xml.bz2

==========
Here is the result of Saxon running from a DOS shell, with a respectable 21 minutes and no out-of-memory report:

C:\Temp\wiki>C:\Progra~2\Java\jre7\bin\java -Xmx180m -Xss4096k -Xms48m -cp C:/saxon/saxon9ee.jar; net.sf.saxon.Transform -TJ -t -it:main -o:C:/Temp/wiki/out/wiki-03-output.xml C:/Temp/wiki/xsl/wiki-03.xsl
Saxon-EE 22.214.171.124J from Saxonica
Java version 1.7.0_45
Using license serial number V001638
Generating byte code...
Stylesheet compilation time: 476 milliseconds
Processing (no source document) initial template = main
URIResolver.resolve href="../source/enwiki.xml" base="file:/C:/Temp/wiki/xsl/wiki-03.xsl"
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Writing to file:/C:/Temp/wiki/out/output-wiki-03.xml
Execution time: 21m 24.612s (1284612ms)
Memory used: 25491272
NamePool contents: 28 entries in 27 chains. 7 URIs

==========
With this XSL stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.mediawiki.org/xml/export-0.8/"
    xpath-default-namespace="http://www.mediawiki.org/xml/export-0.8/"
    exclude-result-prefixes="#all"
    version="3.0">
  <xsl:output method="xml"/>
  <xsl:variable name="root" select="/"/>
  <xsl:mode streamable="yes"/>
  <xsl:template name="main">
    <xsl:stream href="../source/enwiki.xml">
      <xsl:result-document href="../out/output-wiki-03.xml">
        <count>
          <xsl:iterate select="mediawiki/page">
            <xsl:param name="count" select="0" as="xs:decimal"/>
            <xsl:next-iteration>
              <xsl:with-param name="count" select="$count+1"/>
            </xsl:next-iteration>
            <xsl:on-completion>
              <xsl:value-of select="$count"/>
            </xsl:on-completion>
          </xsl:iterate>
        </count>
      </xsl:result-document>
    </xsl:stream>
  </xsl:template>
</xsl:stylesheet>

============
With this result file:

<?xml version="1.0" encoding="UTF-8"?>
<count xmlns="http://www.mediawiki.org/xml/export-0.8/">13355093</count>

============
Running in Oxygen 15.2 with Saxon 126.96.36.199, with the same source and stylesheet files, we had an out-of-memory error after about an hour. I have reported it to Oxygen.
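
For anyone who wants to cross-check the page count in the result file independently of Saxon and oXygen, the following is a minimal sketch in Python, not part of the original run. It assumes the same export-0.8 namespace used in the stylesheet; the names count_pages and open_dump are hypothetical helpers introduced here for illustration. It counts every page element in that namespace at any depth, which should match mediawiki/page only if pages never nest, as appears to be the case in the dump.

```python
import bz2
import xml.etree.ElementTree as ET

# Namespace taken from the stylesheet's xpath-default-namespace.
MEDIAWIKI_NS = "http://www.mediawiki.org/xml/export-0.8/"
PAGE_TAG = "{%s}page" % MEDIAWIKI_NS


def count_pages(source):
    """Count <page> elements without keeping the whole tree in memory.

    `source` is a filename or a binary file object.
    """
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the document element (<mediawiki>).
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == PAGE_TAG:
            count += 1
            # Discard the pages parsed so far so memory stays roughly flat.
            root.clear()
    return count


def open_dump(path):
    """Open a dump for binary reading, decompressing .bz2 on the fly."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")
```

A call such as count_pages(open_dump("enwiki.xml")) would then print a number to compare with the count element above. Run time on the full dump is not something this sketch makes any claim about.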
On Saturday, March 8, 2014 5:43 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:

Could you try it outside oXygen? You can get a 30-day free Saxon-EE evaluation license to enable this. That will establish whether the problem is primarily a Saxon one or an oXygen one, which will make it a lot easier to help you.
Michael Kay Saxonica
On 7 Mar 2014, at 23:10, Terry Badger <terry_badger@xxxxxxxxx> wrote:
David, Thank you. I tried your suggestion but it still failed with an out-of-memory report. Terry
On Friday, March 7, 2014 9:10 AM, David Rudel <fwqhgads@xxxxxxxxx> wrote:

Terry, You can address the possibility that oXygen is simply choking on the output by wrapping your output in <xsl:result-document> instructions.
If you pipe output to a file, oXygen does not attempt to display it in the application when the scenario completes. This would eliminate at least one possible reason for the crash without requiring you to run from the command line.
On Fri, Mar 7, 2014 at 1:09 AM, Abel Braaksma (Exselt) <abel@xxxxxxxxxx> wrote:
It is also important to try to find out what is actually causing the memory exception. If you run it from oXygen as you say, it is quite possible that the exception comes from oXygen itself being unable to handle the output file. This would explain the late memory exception. To find out, simply run it from the command line and watch what happens to memory in Task Manager.
"A false conclusion, once arrived at and widely accepted is not dislodged easily, and the less it is understood, the more tenaciously it is held." - Cantor's Law of Preservation of Ignorance.