Subject: Re: [xsl] Improving Performance of XSLT on large files
From: "Michael Beddow" <mbnospam@xxxxxxxxxxx>
Date: Wed, 29 Aug 2001 16:39:41 +0100
Perhaps I ought to leave this one to the CompSci folks, but here goes
anyway:

[..]
> Adding relevant spacer characters to any variable containers
> in the XML to ensure that the records in the XML repeat at
> mathematically recognisable character positions throughout
> the file.
[..]

This looks to me as though you're imagining that the XSLT processor
operates like a serial filter on an input stream. But it doesn't. It
parses the entire input stream into an internal tree representation of
the data before it does anything else, and that's a resource-intensive
thing to do. No doubt a clever processor could then detect repeating
patterns and use appropriate shortcuts to process them, but the full
tree representation has to be built first.
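You can see the cost of that tree-building step on its own, separate
from any transformation, with a few lines of Perl. This is only an
illustrative sketch (XML::LibXML is just one DOM-building parser among
several, and "big.xml" is a placeholder file name):

    use strict;
    use warnings;
    use XML::LibXML;

    # Building the in-memory tree is the expensive part: on a 70MB
    # document this alone takes substantial time and several times
    # the file's size in memory, before any XSLT has run at all.
    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_file('big.xml');

    print $doc->documentElement->nodeName, "\n";

If that step alone is slow on your data, no amount of restructuring
inside the file will make a conventional XSLT processor fast on it.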

There are some Perl modules (e.g. XML::Twig) that try to address the
problems this creates for large files by letting you extract and
handle subtrees without deserialising the entire file. If your data
really is so ill-matched to the way XSLT processors normally proceed,
they may be worth checking out.
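For what it's worth, here's a minimal sketch of how XML::Twig's
approach looks (untested, and the element name "record" and the file
name "big.xml" are placeholders for whatever your data actually
contains):

    use strict;
    use warnings;
    use XML::Twig;

    # Call a handler for each <record> subtree as soon as it has been
    # fully parsed, then discard it, so the whole document never sits
    # in memory at once.
    my $twig = XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ( $t, $record ) = @_;
                # ... per-record processing goes here ...
                print $record->text, "\n";
                $t->purge;    # free everything parsed so far
            },
        },
    );

    $twig->parsefile('big.xml');

Each subtree handed to a handler could, in principle, be serialised
and passed to an XSLT processor on its own, which is close to the
"feed only the relevant bits" idea in the original message.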

Michael
---------------------------------------------------------
Michael Beddow   http://www.mbeddow.net/
XML and the Humanities page:  http://xml.lexilog.org.uk/
---------------------------------------------------------
----- Original Message -----
From: "gary cor" <stuff4gary@xxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, August 29, 2001 12:26 PM
Subject: [xsl] Improving Performance of XSLT on large files


> Dear All,
>
> I have recently started working with XSLT and cannot work out why it
> needs to be so slow with a large XML file (i.e. 70MB+).  I will be
> trying to process without ordering etc. later and will run a lot of
> tests to see if this helps, but it just seems massively too slow!!
> Possibly I don't understand, because my XML files are just like the
> infinitely complicated DNA structure and DO always contain repeating
> substructures at many different levels as well.  So, would ensuring
> the following actually help me in any way?? :
>
> *That all elements, attributes, etc. are compulsory in long record
> sets, so they are totally repeating units
>
> *Then adding relevant spacer characters to any variable containers in
> the XML to ensure that the records in the XML repeat at mathematically
> recognisable character positions throughout the file.
>
> Then possibly I could optimise this process for a DNA validation!
> And apply a mathematical function for the pointer so it knows
> specifically where to read element data from the document without
> even having to look at any irrelevant bits??  Then feed only the
> relevant bits of XML into the XSLT processor as a secondary
> process...  And where do I, and where don't I, get a performance
> advantage doing something like this??  And does anything do this
> already?
>
> Any comments on this subject would be very much appreciated!
>
> Kind Regards
>
> Gary Cornelius
>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

