Re: [xsl] XSLT 3.0 streaming vs other big-data technologies

Subject: Re: [xsl] XSLT 3.0 streaming vs other big-data technologies
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 13 Jun 2018 07:30:02 -0000
There's nothing in the language spec that constrains where the data comes
from.

In the Saxon (Java) implementation, it can come from any Java InputStream.
Constructing an InputStream that reads from multiple storage nodes or an HDFS
file system is someone else's job, but I see no reason why it should be
difficult.
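
To illustrate the point about constructing a single InputStream over several sources, here is a minimal sketch using only the JDK. It is not Saxon or Hadoop code: the in-memory chunks are hypothetical stand-ins for blocks held on different storage nodes, and the class and method names (ChunkedXmlInput, concatenate, countElements, chunk) are made up for the example. A StAX pull parser plays the role of the streaming consumer; a stream built this way could equally be wrapped in a javax.xml.transform.stream.StreamSource and handed to an XSLT processor.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ChunkedXmlInput {

    // Presents several byte sources, read in order, as one InputStream.
    // Assumption: each element stands in for a block on a different node.
    static InputStream concatenate(List<InputStream> chunks) {
        return new SequenceInputStream(Collections.enumeration(chunks));
    }

    // Counts elements with the given local name using a pull parser,
    // so the document is never held in memory as a whole.
    static long countElements(InputStream in, String localName) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
            long count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML input could not be parsed", e);
        }
    }

    // Hypothetical helper: wraps a string as an in-memory chunk.
    static InputStream chunk(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Chunk boundaries deliberately fall in the middle of tags:
        // the consumer sees one continuous byte sequence regardless.
        InputStream in = concatenate(Arrays.asList(
            chunk("<log><entry id='1'/><ent"),
            chunk("ry id='2'/><entry id="),
            chunk("'3'/></log>")));
        System.out.println(countElements(in, "entry")); // prints 3
    }
}
```

The consumer never learns where one chunk ends and the next begins, which is why the storage layout is not something the transformation itself has to care about.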

The Saxon implementation does have some limits that mean the input stream
can't be infinite: most obviously, the nodes are numbered using a 32-bit
integer. That one is easily fixed, but it's hard to verify that there aren't
others.
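
As a generic illustration of why a 32-bit counter caps the input size, the sketch below shows what happens to a Java int at its upper bound. It is not taken from Saxon; the class NodeCounter and its methods are hypothetical, and it assumes a signed int counter, whose largest value is 2,147,483,647.

```java
public class NodeCounter {

    // A plain int increment wraps around silently at Integer.MAX_VALUE.
    static int nextUnchecked(int current) {
        return current + 1;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static int nextChecked(int current) {
        return Math.addExact(current, 1);
    }

    // Widening the counter to 64 bits pushes the limit far out of reach.
    static long nextWide(long current) {
        return current + 1;
    }

    public static void main(String[] args) {
        System.out.println(nextUnchecked(Integer.MAX_VALUE)); // prints -2147483648
        System.out.println(nextWide(Integer.MAX_VALUE));      // prints 2147483648
        try {
            nextChecked(Integer.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");          // this line is printed
        }
    }
}
```

Switching one counter to long is the easy part; the harder part is being sure that no other int-sized quantity is hiding elsewhere.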

(More generally, I'm surprised how little discussion I've seen about how
Java and C# cope with the 32-bit limit, e.g. on array indexing. The Streams
API seems to be part of the solution, but it certainly doesn't solve the
whole problem.)

Michael Kay
Saxonica

> On 13 Jun 2018, at 07:03, Mukul Gandhi gandhi.mukul@xxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi all,
> Most of us probably know big-data technologies such as Hadoop and HDFS.
> With HDFS, I think a file can span multiple storage nodes (this potentially
> allows really big data as input to run-time processes).
> Can XSLT 3.0 streaming also accept big data of this kind as input to an
> XSLT transform (i.e. the input XML/text document spanning multiple storage
> nodes)? Or, by design, can XSLT 3.0 only take a big XML file as input if it
> is stored entirely on one storage node?
> Can we also say that XSLT 3.0 can work over the HDFS file system, to allow
> big data spanning multiple storage nodes?
>
> --
> Regards,
> Mukul Gandhi
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
