Subject: Re: [xsl] Design of XML so that it may be efficiently stream-processed
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Fri, 22 Nov 2013 11:40:51 +0000

Firstly, I question the premise that XML should be designed to enable streamed transformation. One could equally well argue that you should design it so that it doesn't need to be transformed at all. Transformation is only necessary because the data isn't in the form you want; designing it so that it can easily be transformed into the form you want seems a little odd. Unless, perhaps, you are thinking of designing the intermediate formats in a processing pipeline.

> > 1. Use lots of attributes. Store in them the data needed for processing the node.

Certainly, for data that can conveniently be represented as attributes, this will make streamed processing easier. But don't overdo it.

> > 2. Have one child element only.

No. If there are two things that should naturally be represented as child elements, then represent them that way. There are plenty of techniques still available for streamed processing: accumulators, xsl:iterate, fold-left, xsl:fork.

> > So, to enable efficient stream processing, design XML like this:
> >
> > <root a="..." b="..." c="...">
> >   <node d="..." e="..." f="...">
> >     <node g="..." h="..." i="...">
> >       <node j="..." k="..." l="...">
> >         <node m="..." n="..." o="...">
> >           <node p="..." q="..." r="...">
> >             ...
> >           </node>
> >         </node>
> >       </node>
> >     </node>
> >   </node>
> > </root>
> >
> > This results in a massively deep tree. For Gigabyte-sized XML files, the nesting could be a billion levels deep (or more).

No, such a design is completely bizarre and defeats the whole purpose of streaming, which is to reduce memory use. A streaming processor still has to keep track of every element that is currently open, so memory use grows with nesting depth.

I would add some more important design criteria:

- Put metadata and reference information (stuff that's needed for reference throughout document processing) at the start of the document rather than the end, or in a separate document.

- Use hierarchic nesting for relationships rather than id/idref style pointers (even, perhaps, if it means holding the data redundantly).

Michael Kay
Saxonica
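
To make the last two criteria concrete, here is a minimal sketch of a document laid out along those lines. It assumes a hypothetical order feed; every element and attribute name in it is invented for illustration and does not come from the thread.

```xml
<!-- Hypothetical order feed; every name here is invented for illustration. -->
<orders currency="EUR" generated="2013-11-22">
  <!-- Reference data comes first, so a streaming processor has already
       seen it when it reaches the records that depend on it. -->
  <products>
    <product code="P1" name="Widget" unit-price="2.50"/>
    <product code="P2" name="Gadget" unit-price="7.00"/>
  </products>
  <!-- Each order nests its own customer and lines instead of pointing
       at them with id/idref, even though the customer data repeats. -->
  <order number="1001">
    <customer name="A. Example" country="GB"/>
    <line product="P1" quantity="4"/>
    <line product="P2" quantity="1"/>
  </order>
  <order number="1002">
    <customer name="A. Example" country="GB"/>
    <line product="P2" quantity="3"/>
  </order>
</orders>
```

With a layout like this, a single forward pass could plausibly keep the small product table in memory (for example in an accumulator) and then treat each order as a self-contained unit, without ever looking ahead or back into the bulk of the data. The nesting depth also stays fixed no matter how many orders the file holds.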