Re: [xsl] slow xsltproc XInclude processing w/complex document?

From: "M. David Peterson" <m.david@xxxxxxxxxx>
Date: Tue, 06 Jul 2004 14:20:40 -0600
I'll give you a quick hint to help clean up your process... I wish I had time to look deeper into this for you, but time is not something I have a lot of at the moment. Nonetheless, the first thing I noticed is that you are running a for-each loop and then passing each node selected by its select expression to apply-templates, which is already XSLT's built-in mechanism for recursion... so, put simply, you are going through the "for-each" process twice. I'm unsure exactly how xsltproc implements this (I don't know whether it notices the obvious double-up and skips the for-each, applying the result of the XPath expression directly to apply-templates instead). But I can tell you there is a possibility that you are making an extra pass through your massive XML file, when the same iteration is already implied by simply using <xsl:apply-templates select="document(@href)"/>... and now that I've seen what you are using as your select attribute value, I can definitely see why the compiler may not optimize it ;)

I would definitely pull out that for-each and see how much that helps... Looking further down, I can see some other areas that could be optimized as well. I wish I had the time to help further, but I've got to get back to coding myself. Nonetheless, there are plenty of others here who will be more than happy to help you further...
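A minimal sketch of that suggestion, assuming the rest of the quoted stylesheet stays exactly as posted: the xi:include template hands the root node returned by document(@href) straight to xsl:apply-templates, with no xsl:for-each. The identity template's "/" alternative matches that root node, so the included content is copied and any nested xi:include elements are expanded as they are reached. Whether this alone changes xsltproc's running time is untested.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <xsl:preserve-space elements="*"/>

  <!-- Replace each xi:include with the processed content of the file it
       names. Because @href is a node, document() resolves it against the
       base URI of the file containing the xi:include, so nested includes
       work at any depth. The root node is passed directly to
       apply-templates; no for-each is involved. -->
  <xsl:template match="xi:include" xmlns:xi="http://www.w3.org/2001/XInclude">
    <xsl:apply-templates select="document(@href)"/>
  </xsl:template>

  <!-- Identity transform, unchanged from the original so that only the
       for-each removal differs between the two versions. -->
  <xsl:template match="/ | node() | @* | comment() | processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the identity template identical to the original makes a before/after timing comparison isolate the for-each change.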

Best of luck!

<M:D/>
:: Saxon.NET is now available to early beta participants! Visit http://www.x2x2x.org/x2x2x/home to sign up ::


Paul DuBois wrote:

I've been running some tests on a document that includes nested
XInclude directives. The document is complex: upwards of 1500 files,
nested to a depth of up to 4 levels. Total size of content is about 4.8MB.


For simple testing, I'm attempting only to produce a "flattened"
document that just resolves the XIncludes. The stylesheet looks like
this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Identity transform, but "flatten" xincludes -->

<xsl:output method="xml" indent="yes"/>
<xsl:preserve-space elements="*"/>

<xsl:template match="xi:include" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xsl:for-each select="document(@href)">
    <xsl:apply-templates/>
  </xsl:for-each>
</xsl:template>

<!-- identity transform -->

<xsl:template match="/ | node() | @* | comment() | processing-instruction()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>


My processing command is:


xsltproc --xinclude --novalid xinclude.xsl input.xml > output.xml
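One hedged way to narrow down where the time goes: the xsltproc documentation describes --xinclude as processing the input document using XInclude, so the directives may already be expanded while input.xml is parsed, in which case the xi:include template would never match. Timing a plain identity stylesheet run with --xinclude, and separately timing xinclude.xsl run without --xinclude, would separate the parser's XInclude handling from the document() calls. The file name identity.xsl is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- identity.xsl (hypothetical name): copies its input unchanged and
     contains no xi:include handling, relying on xsltproc's --xinclude
     option to expand the includes before the transform runs. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <xsl:preserve-space elements="*"/>

  <!-- Standard identity template: copy every node and attribute,
       then recurse into attributes and children. -->
  <xsl:template match="/ | @* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For example, compare "xsltproc --xinclude --novalid identity.xsl input.xml > output.xml" with "xsltproc --novalid xinclude.xsl input.xml > output.xml", timing each run.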

This takes about 12 minutes on my 900 MHz G3 iBook (Mac OS X), and about
4 minutes on my 2.8 GHz Pentium 4 Gentoo Linux box.

That seems pretty slow, particularly given that the control condition takes
mere seconds (running the flattened document through a standard identity
transform with xsltproc).


I don't want to post the input here because it's so big, so this is really
just a preliminary post to ask for advice on how I might go about
speeding up the XInclude-resolving transform: Is this a known issue with
xsltproc/XInclude? Or is there perhaps some flag I should be using that I
am failing to use? Something bad about my stylesheet?


--+------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--+--

