Re: [xsl] Improving Performance of XSLT on large files

From: "Michael Beddow" <mbnospam@xxxxxxxxxxx>
Date: Wed, 29 Aug 2001 21:54:27 +0100
On Wednesday, August 29, 2001 7:28 PM
Gary Cor wrote:

> Michael,
>
> Your comments were really quite interesting to me and I guess I am not
> really sure what is suited to XML and what isn't
[...]
> It confuses me greatly at what point large amounts of records are ill
> suited to XML, can anyone give a ball park figure on this??  Do I have
> to have a SQL server or something producing XML and then apply style
> as a secondary process?  Do I have to subset the XML files into
> separate files or is there some way I could just let this text file
> grow and grow maybe even to 1GB and still use my MSXML XSLT processor
> on it.  What do experts think is the way to determine an upper limit
> for an XML file and how can I determine this for both my projects or
> deal with a situation where the file grows too large.  I hope someone
> can point me in the right direction


It looks as though you have two separate issues here.
1) Efficient retrieval of fragments from a large XML document base
2) XSLT transformation of large documents

Your "slowness" problem may not necessarily be an XSLT issue, in that
the time may be taken up in selecting the nodes that interest you,
rather than transforming them for output. (OK if you're using XPath
expressions in an XSLT processor to do the selecting, then it *is* an
XSLT issue, but selection of a relatively small nodeset from a
relatively large input tree is not necessarily something XSLT is very
efficient at.) I suggest you take a look at Ron Bourret's various papers
on XML and databases to get an overview of the various solutions
available in this area. They're at
http://www.rpbourret.com/xml/index.htm

There's another listing of stuff you would probably find relevant at
http://www.xmldb.org/resources.html#articles_and_papers

My own view (based mainly on working with dictionary data) is that it
doesn't make sense to use XSLT to select a small subset of your data,
especially if you're working in real time. Instead, you should keep XSLT
just to transform/style the chunks of your data you actually want to
serve up to your clients, after using more efficient methods of
retrieving them. Such methods can involve a middle layer that provides
an XML wrapper over an RDBMS or OODBMS, or there are various ways in
which you can use a hashing system like BerkeleyDB to provide indexed
retrieval into a "pure" XML data store. Liam Quin's "Open Source XML
Database Toolkit" (Wiley) outlines a number of approaches of this kind.
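
To make the indexed-retrieval idea a little more concrete, here is a
rough sketch. It is not taken from Liam Quin's book. It uses Python's
standard dbm module as a stand-in for BerkeleyDB, and it assumes a
made-up file layout in which every record is a non-nested
<entry id="..."> element with id as its only attribute. The names
build_index and fetch_entry are mine, purely for illustration.

```python
import dbm
import re

# Assumed (hypothetical) record shape: <entry id="...">...</entry>, not nested.
ENTRY_RE = re.compile(rb'<entry id="([^"]+)">.*?</entry>', re.DOTALL)


def build_index(xml_path, index_path):
    """Scan the XML file once and store b"offset:length" under each entry id."""
    with open(xml_path, "rb") as f:
        # Reading everything at once keeps the sketch short; an indexer for
        # a really large file would scan it in chunks instead.
        data = f.read()
    with dbm.open(index_path, "c") as index:
        for m in ENTRY_RE.finditer(data):
            length = m.end() - m.start()
            index[m.group(1)] = f"{m.start()}:{length}".encode("ascii")


def fetch_entry(xml_path, index_path, entry_id):
    """Return the raw bytes of one entry without parsing the rest of the file."""
    with dbm.open(index_path, "r") as index:
        offset, length = map(int, index[entry_id.encode("utf-8")].decode("ascii").split(":"))
    with open(xml_path, "rb") as f:
        f.seek(offset)          # jump straight to the record
        return f.read(length)   # read only that record's bytes
```

The index is built once (and rebuilt when the file changes). Each
request then costs one key lookup and one seek, and only the small
fragment that fetch_entry returns needs to be handed to the XSLT
processor for styling.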

Michael
---------------------------------------------------------
Michael Beddow   http://www.mbeddow.net/
XML and the Humanities page:  http://xml.lexilog.org.uk/
---------------------------------------------------------



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

