Subject: Re: [xsl] Processing large XML Documents [> 50MB]
From: Liam R E Quin <liam@xxxxxx>
Date: Mon, 22 Feb 2010 19:45:18 -0500
On Mon, 2010-02-22 at 15:40 -0800, Ramkumar Menon wrote:
> We have a need to process XML Documents that could be over 50 megs in size.

That's actually pretty small these days, so you shouldn't have a problem.

> Due to the huge size of the document, XSLT is getting tough, with the
> environment we are running in.

I routinely run XSLT (usually Saxon) on documents that size; it does
often take a few seconds, though.
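
For scale, a command-line Saxon run looks something like this (file
names invented, obviously); it's worth giving the JVM extra heap,
since the in-memory tree is typically several times the size of the
file on disk:

  java -Xmx1g -jar saxon9he.jar -s:big.xml -xsl:transform.xsl -o:out.xml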

However, from your description of the problem, you have two steps:
(1) assemble a document, processing all of the input, once;
(2) update some parts of it afterwards.

A possibility here is to use an XML database with some sort of
update facility -- e.g. MarkLogic, Qizx, Saxon-EE, BaseX, and so on.
I've marked the implementations that (as far as I know) support
update on http://www.w3.org/XML/Query/

In this case, you might be able to change your workflow to
(1) import XML into database once
(2) run updates as needed

And then of course...
(3) make lots of result documents based on queries of a subset of
    the document
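
For step (2), an update in XQuery Update Facility syntax might look
like this (document and element names invented; note that not all the
engines above use this syntax -- MarkLogic, for one, has its own
update functions such as xdmp:node-replace):

  replace value of node doc("orders.xml")//order[@id = "o123"]/status
  with "shipped"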

I'd certainly consider XQuery for this.
There are implementations in C++, Java, OCaml, you name it :-)

When you are processing all of the input every time, XSLT is often
the best way to go; when you just want small parts of large documents,
XQuery is often a better choice.  This is as much down to the
implementations as anything else.
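
For example, pulling just the records you care about out of a large
document is short in XQuery (again, names invented):

  for $o in doc("orders.xml")//order[@status = "pending"]
  return <pending-order id="{$o/@id}">{ $o/customer }</pending-order>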

An alternative worth noting is that XSLT can produce multiple output
documents, so you could split the big document into smaller pieces
up front.
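
In XSLT 2.0 that's xsl:result-document; a sketch, with made-up
element names, that writes each order to its own file:

  <xsl:template match="order">
    <xsl:result-document href="order-{@id}.xml">
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:template>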

Without knowing more about your situation, though, it's hard to
give good advice... advice is free, good advice you pay for :-) :-)

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
