Subject: Re: [xsl] slow xsltproc XInclude processing w/complex document?
From: Paul DuBois <paul@xxxxxxxxxxxx>
Date: Wed, 7 Jul 2004 13:10:55 -0500
I've been running some tests on a document that includes nested XInclude directives. The document is complex: upwards of 1500 files, nested to a depth of up to 4 levels. Total size of content is about 4.8MB.
For simple testing, I'm attempting only to produce a "flattened" document that just resolves the XIncludes. The stylesheet looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Identity transform, but "flatten" xincludes -->
  <xsl:output method="xml" indent="yes"/>
  <xsl:preserve-space elements="*"/>

  <xsl:template match="xi:include" xmlns:xi="http://www.w3.org/2001/XInclude">
    <xsl:for-each select="document(@href)">
      <xsl:apply-templates/>
    </xsl:for-each>
  </xsl:template>

  <!-- identity transform -->
  <xsl:template match="/ | node() | @* | comment() | processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
A couple of folks suggested some improvements to the stylesheet, mostly aimed at eliminating loops and unnecessary node visits. Thanks all. The suggestions, however, made no difference at all. (The resulting execution times were within a few seconds of my original attempts.)
One thing I notice while watching xsltproc more closely is that the size of the output file remains zero for a long time and then, BOOM!!!, I get 4.8 MB on disk in a couple of seconds. During the time before xsltproc writes anything, I see its memory use slowly climb. (Depending on the machine, it ends up at about 30-50MB.)
My two test machines have 640MB and 1GB RAM, so I don't think that's an issue. Given that xsltproc can execute an identity transform on the flattened file in a few seconds, a very uneducated guess is that it is simply much less efficient at constructing the document from fragments in XInclude files than when it can just read the entire document in as a stream.
I did some investigation into Jeni's suggestion of using a SAX-based transform to resolve the XIncludes. I think this could be workable: use that transform as a front end and pipe the resulting flattened document into xsltproc to perform the other transforms.
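To make the front-end idea concrete, here is a minimal sketch of a SAX-based flattener, written in Python rather than Perl and using only the standard xml.sax module. It is not one of the modules discussed in this message. The names XIncludeFlattener, stream_events, and flatten are made up for illustration, and the sketch assumes the 2001 namespace URI, hrefs that are plain relative file paths, and whole-document includes only (no xpointer, no parse="text", no xi:fallback). Comments are dropped, because a SAX ContentHandler never sees them.

```python
import io
import os
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import XMLGenerator

# Assumption: the input uses the 2001 namespace URI.
XI_NS = "http://www.w3.org/2001/XInclude"


class XIncludeFlattener(ContentHandler):
    """Forward SAX events to `out`, replacing each xi:include element
    with the events of the document its href points to."""

    def __init__(self, out, base_dir):
        super().__init__()
        self.out = out            # downstream handler (an XMLGenerator here)
        self.base_dir = base_dir  # directory of the file being parsed
        self.skip_depth = 0       # > 0 while inside an xi:include element

    def startPrefixMapping(self, prefix, uri):
        if not self.skip_depth:
            self.out.startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        if not self.skip_depth:
            self.out.endPrefixMapping(prefix)

    def startElementNS(self, name, qname, attrs):
        if self.skip_depth:
            self.skip_depth += 1  # child of xi:include (e.g. a fallback): ignored
        elif name == (XI_NS, "include"):
            self.skip_depth = 1
            href = attrs.get((None, "href"))
            if href:
                # Relative hrefs are resolved against the including file.
                stream_events(os.path.join(self.base_dir, href), self.out)
        else:
            self.out.startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.endElementNS(name, qname)

    def characters(self, content):
        if not self.skip_depth:
            self.out.characters(content)

    def processingInstruction(self, target, data):
        if not self.skip_depth:
            self.out.processingInstruction(target, data)


def stream_events(path, out):
    """Parse `path` with a fresh parser and push its events into `out`."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(XIncludeFlattener(out, os.path.dirname(path)))
    parser.parse(path)


def flatten(path, stream):
    """Write the flattened form of `path` to the text stream `stream`."""
    out = XMLGenerator(stream, encoding="utf-8")
    out.startDocument()  # emitted once, not once per included file
    stream_events(path, out)
    out.endDocument()
```

Passing sys.stdout as the stream would let the flattened result be piped into a later transform. Since events are written as they arrive, nothing here builds the whole document in memory, which is the property that makes the front-end approach attractive in the first place.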
I'm getting somewhat mixed results here. I discovered Matt Sergeant's XML::Filter::XInclude Perl module and tried that. At first, it didn't work at all; then I discovered that my input document was specifying a namespace of xmlns:xi="http://www.w3.org/2003/XInclude" and the module wants to see xmlns:xi="http://www.w3.org/2001/XInclude" instead.
(Digression: I think I'm confused about which namespace URI to use here. http://www.w3.org/2003/XInclude indicates that the 2003 form is deprecated and that the 2001 form should be used instead. On the other hand, the source code for libxml2 recognizes both, but refers to the 2001 form as the deprecated one. Hmm...)
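Since a namespace mismatch like this fails silently, a quick check of which XInclude namespace URI a file actually declares can save some head-scratching. The following is a small sketch using Python's standard xml.etree.ElementTree; the function name xinclude_namespaces is made up for illustration, and the sketch takes no position on which of the two URIs is the right one.

```python
import io
import xml.etree.ElementTree as ET

XI_2001 = "http://www.w3.org/2001/XInclude"
XI_2003 = "http://www.w3.org/2003/XInclude"


def xinclude_namespaces(source):
    """Return the set of XInclude namespace URIs declared anywhere in
    `source` (a filename or a binary file object)."""
    found = set()
    # "start-ns" events yield a (prefix, uri) pair for every xmlns declaration.
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if uri in (XI_2001, XI_2003):
            found.add(uri)
    return found


# Usage: an in-memory document that declares the 2003 form.
doc = io.BytesIO(
    b'<book xmlns:xi="http://www.w3.org/2003/XInclude">'
    b'<xi:include href="a.xml"/></book>'
)
print(xinclude_namespaces(doc))
```

This only inspects the one file it is given; it does not follow the includes, so each included file would need the same check.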
<x><xi:include ... ></x> <x><xi:include ... ></x>
<xi:include ... > <xi:include ... >
A quick look at the module source convinced me that I don't understand what to patch to make it work. :-)
I also ran across a simple Perl XIncluder by Kip Hampton at: http://www.xml.com/pub/a/2001/10/10/sax-filters.html
This one shows some promise. I notice a few quirks here, as well, but perhaps I am on the road to success.