Re: [xsl] Design of XML so that it may be efficiently stream-processed

Subject: Re: [xsl] Design of XML so that it may be efficiently stream-processed
From: Richard Fozzard - NOAA Affiliate <richard.fozzard@xxxxxxxx>
Date: Wed, 27 Nov 2013 15:55:29 -0700
Are we talking about using xlink attributes in a primary XML document to point to a reusable chunk of external XML?

If so, this is an approach we are using in NOAA for complex ISO-19115 scientific metadata documents. Any XSLT that processes the primary XML can choose to 'resolve' the xlink (i.e. retrieve it from the xlink:href URL) -- or ignore it, if not relevant to the transform task at hand.
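
For illustration, here is a minimal sketch of what "resolving" an xlink can look like. It is in Python rather than XSLT (where the document() function would be the usual tool), only to keep the example self-contained. The element names, the example.org URL, and the fetch() helper are all hypothetical; a real setup would retrieve the chunk over HTTP and would probably cache it.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical stand-in for a remote store of reusable XML chunks.
COMPONENTS = {
    "https://example.org/components/contact-42.xml":
        "<contact><name>Example Data Center</name></contact>",
}


def fetch(url):
    """Return the XML text behind url (assumption: really an HTTP GET plus cache)."""
    return COMPONENTS[url]


def resolve_xlinks(root, fetch=fetch):
    """Embed the referenced chunk under every element that carries xlink:href.

    Returns the number of links resolved. Links inside the fetched chunks
    are not followed in this sketch.
    """
    resolved = 0
    # Snapshot the element list first, since the tree is modified in the loop.
    for elem in list(root.iter()):
        href = elem.attrib.pop(XLINK_HREF, None)
        if href is not None:
            elem.append(ET.fromstring(fetch(href)))
            resolved += 1
    return resolved


# Hypothetical primary document; real ISO 19115 records use namespaced elements.
PRIMARY = """<metadata xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>Sample record</title>
  <pointOfContact xlink:href="https://example.org/components/contact-42.xml"/>
</metadata>"""

root = ET.fromstring(PRIMARY)
resolved_count = resolve_xlinks(root)
contact_name = root.findtext("pointOfContact/contact/name")
```

A transform that does not need the contact details would simply skip the resolve_xlinks() call and treat the element as an opaque pointer.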

This approach has significantly reduced the size of our XML archives, and serves to modularize and simplify maintenance of the many thousands of XML files we host (or generate on demand).

But I have no idea how resolving xlinks might affect streaming.

Just my $0.02

Richard Fozzard, Computer Scientist
  Geospatial Metadata at NGDC:

Cooperative Institute for Research in Environmental Sciences (CIRES)
Univ. Colorado & NOAA National Geophysical Data Center, Enterprise Data Systems 
325 S. Broadway, Skaggs 1B-305, Boulder, CO 80305
Office: 303-497-6487, Cell: 303-579-5615, Email: richard.fozzard@xxxxxxxx

Timothy W. Cook said the following on 11/27/2013 01:23 PM:
> On Wed, Nov 27, 2013 at 5:03 PM, Hank Ratzesberger <xml@xxxxxxxxxxxx> wrote:
>> Hi Tim,
> ...
>> Well, agreed, there may be diminishing returns on so many documents
>> sharing the same metadata. In those cases, maybe the metadata could be
>> a permanent URL to a document rather than a repetition of the same.
>> Processors could load the external document as a variable. AFAIK, that
>> does not violate any streaming principle. If every document loads the
>> same external metadata, then hopefully your processor or system will
>> have a cached copy. Not so different from keeping a local copy of DTD
>> files.
> Great.  Because this is the approach I am using in healthcare.
>> [possibly nothing to do with your issue...]
>> But in so many instances, this is the pattern that makes XML such a
>> good replacement for binary / proprietary files, because the document
>> becomes self-contained. For example, when I worked with a
>> seismologist, all the data is just time series points of
>> acceleration. Only once you add the instrument, sensitivity/scale,
>> and geo-location can it be usefully integrated with other records
>> for the same event.
> Self-contained sounds good. However, since an XML document can point
> to another document, such as a schema, doesn't it make sense for the
> syntactic and semantic parameters to be defined in one place? I am
> "assuming" that there are many, many data files created from one
> instrument, sensitivity/scale, geo-location, etc.
> Thanks,
> Tim
