Re: [xsl] Transforming large XML documents with XSLT 1.0

Subject: Re: [xsl] Transforming large XML documents with XSLT 1.0
From: "Mukul Gandhi gandhi.mukul@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 9 Apr 2019 09:24:59 -0000

Hi Martin,

On Tue, Apr 9, 2019 at 12:32 PM Martin Honnen martin.honnen@xxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Your subject and introduction seem to talk about a general approach to
> use XSLT 1 with very large documents but it seems the code you have in
> the repository assumes a certain XML document structure where your
> trigger elements as children of the root element are the only nodes the
> XSLT needs to transform, and all of them in an isolated way where the
> template for them does nothing but deal with the particular element and
> does not navigate to ancestors or siblings.
>

The code I've proposed for transforming large XML documents is not as
generic as the XSLT language itself. I think we can handle different XML
vocabularies with this approach by writing some of the transformation code
in Java and some in the stylesheets (we need to combine serialization done
in Java with serialization done by the XSLT processor). So far I've
explored this approach for XML vocabularies like the following:

<root>
    <element>
       ...
    </element>
    <element>
       ...
    </element>
    ...
</root>

The number of <element> nodes may be very large. Each time the StAX parser
reaches an <element> node, the XSLT transformer transforms that element and
serializes the result to the output file. Every <element> transformation
writes to one common output file, which is always opened in append mode.
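
To make the steps above concrete, here is a minimal Java sketch; it is not
the code from the repository. It assumes the JDK's built-in javax.xml.stream
and javax.xml.transform APIs, and the names ElementStreamer, transformEach
and trigger are hypothetical. Two things differ from the description above:
it keeps one Writer open for the whole run instead of reopening the output
file in append mode, and it buffers each trigger element as a string before
transforming it, so a template sees only that one element.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: pulls StAX events and transforms one trigger element at a time.
final class ElementStreamer {

    // Returns the number of trigger elements that were transformed.
    // Assumption: the input uses no namespaces declared on ancestor elements,
    // because only the events of the trigger element itself are copied.
    static int transformEach(Reader input, Reader stylesheet, String trigger, Writer output)
            throws Exception {
        Transformer transformer =
                TransformerFactory.newInstance().newTransformer(new StreamSource(stylesheet));
        XMLEventReader events = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        int count = 0;

        while (events.hasNext()) {
            XMLEvent event = events.nextEvent();
            if (!event.isStartElement()
                    || !trigger.equals(event.asStartElement().getName().getLocalPart())) {
                continue; // not a trigger element, keep pulling
            }

            // Copy the trigger element and its whole subtree into a small buffer.
            StringWriter fragment = new StringWriter();
            XMLEventWriter copy = outputFactory.createXMLEventWriter(fragment);
            int depth = 0;
            while (true) {
                copy.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    break; // matching end tag of the trigger element
                }
                event = events.nextEvent();
            }
            copy.flush();
            copy.close();

            // Transform just this fragment; the result is appended to the shared output.
            transformer.transform(
                    new StreamSource(new StringReader(fragment.toString())),
                    new StreamResult(output));
            count++;
        }
        events.close();
        output.flush();
        return count;
    }
}
```

Only one trigger element is held in memory at a time. Writing the XML
declaration and the wrapping root element around the results is left out of
this sketch.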

Another use case I've worked on is splitting a large XML document using the
StAX parser together with the transformation APIs.


> wouldn't the same approach work using SAX?
>

I think feeding SAX events from an XML input document to an XSLT
transformer cannot scale the way StAX parsing can. SAX is a push API (the
parser continuously pushes SAX events to the application), while StAX is a
pull API (the application asks for the next event once it has processed the
previous one).
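
As a small illustration of the pull style only (it says nothing about
scaling), the sketch below asks the parser for one event at a time and
simply stops asking once it has what it needs. It assumes the JDK's
javax.xml.stream API; the names PullDemo and firstElementNames are
hypothetical.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo of pull parsing: the application drives the parser.
final class PullDemo {

    // Collects the local names of the first `limit` start tags, then stops pulling.
    static List<String> firstElementNames(Reader xml, int limit) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> names = new ArrayList<>();
        try {
            // The loop, not the parser, decides when the next event is read.
            while (names.size() < limit && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }
}
```

With a SAX ContentHandler the parser would keep calling back until the end
of the document (or until an exception is thrown), whereas here the rest of
the input is never requested.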


> Also writing out the XML result's XML declaration and root element as
> bytes to a stream seems awkward, isn't there a way to chain your Stax
> StreamReader to a StreamWriter to simply write out XML with a dedicated
> API ensuring well-formedness and encoding?


This seems like a nice idea. I'll try to explore it.
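
A rough sketch of what that could look like, assuming the JDK's
javax.xml.stream.XMLStreamWriter; this has not been tried against the
repository code, and the names RootWriterSketch, wrap, root and item are
hypothetical. The writer emits the XML declaration and the root element and
takes care of escaping and of closing open tags.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes the declaration and root element through an API
// instead of pushing hand-built bytes to a stream.
final class RootWriterSketch {

    // Wraps each text value in an <item> element under a single <root> element.
    static String wrap(List<String> values) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument("UTF-8", "1.0"); // XML declaration
        writer.writeStartElement("root");
        for (String value : values) {
            writer.writeStartElement("item");
            writer.writeCharacters(value);         // special characters are escaped here
            writer.writeEndElement();
        }
        writer.writeEndDocument();                 // closes any element still open
        writer.flush();
        writer.close();
        return out.toString();
    }
}
```

When writing to a file, XMLOutputFactory.createXMLStreamWriter(OutputStream,
String) lets the same factory handle the byte encoding as well.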


> In the .NET world you can do that with XmlReader/XmlWriter.
>

Ok. I'll try to explore that as well.





-- 
Regards,
Mukul Gandhi
