Re: [xsl] An efficient XSLT program that searches a large XML document for all occurrences of a string?

From: "Piez, Wendell A. (Fed) wendell.piez@xxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 3 May 2024 18:52:12 -0000

Hi,

Hm. I guess on reflection it may be such an argument, but that doesn't make it
a good one. At least in view of other observable limits.

Another counterargument is that although streaming might help to move the
line, it hasn't solved the problem of finite resources; it has only helped.

So a real counter might be that XSLT should not have streaming at all, but we
should have an alternative standard supporting a lightweight approach without
those limits (it would have others instead).

Maybe something like the Java SAX API? Or XQuery? If I had Roger's problem, I
might even be inclined to see how the XProc processors are handling this task,
these days.
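
For illustration only (this is not code from the thread): a minimal sketch of
the lightweight, event-based style that a SAX-like API allows, written with
Python's standard xml.sax module, which follows the same handler model as the
Java SAX API. The names StringSearchHandler and search are hypothetical. It
assumes only element text needs searching (attributes are ignored) and counts
non-overlapping matches per run of character data. Memory use is bounded by
the depth of the document and the largest single text run, not by file size.

```python
import io
import xml.sax


class StringSearchHandler(xml.sax.ContentHandler):
    """Records the element path of every text run that contains `needle`."""

    def __init__(self, needle):
        super().__init__()
        if not needle:
            raise ValueError("needle must be a non-empty string")
        self.needle = needle
        self.path = []    # stack of currently open element names
        self.buffer = []  # chunks of the current run of character data
        self.hits = []    # (element path, number of matches) pairs

    def _flush(self):
        # A parser may deliver one text run in several characters() calls,
        # so match against the joined text rather than each chunk.
        text = "".join(self.buffer)
        self.buffer = []
        count = text.count(self.needle)  # non-overlapping occurrences
        if count:
            self.hits.append(("/" + "/".join(self.path), count))

    def startElement(self, name, attrs):
        self._flush()            # text seen so far belongs to the parent
        self.path.append(name)

    def endElement(self, name):
        self._flush()            # text seen so far belongs to this element
        self.path.pop()

    def characters(self, content):
        self.buffer.append(content)


def search(source, needle):
    """Stream-parse `source` (a filename or binary file object)."""
    handler = StringSearchHandler(needle)
    xml.sax.parse(source, handler)
    return handler.hits
```

A call such as search("report.xml", "saxon") (the file name is made up) would
return a list of (path, count) pairs without building a tree in memory.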

Cheers, Wendell


From: Michael Kay mike@xxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, May 3, 2024 11:47 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] An efficient XSLT program that searches a large XML
document for all occurrences of a string?

>Is this an argument that streaming should be in the core spec?

It depends what you are trying to achieve. There are too many good XSLT
processors that have fallen by the wayside because their implementors weren't
able to fund further development. You're not going to improve that situation
by making it even more costly to implement the spec.

Michael Kay
Saxonica


On 3 May 2024, at 16:36, Piez, Wendell A. (Fed) wendell.piez@xxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Mike and XSL-List,

Is this an argument that streaming should be in the core spec?

Since this limit will always be there, even if it moves? (And in view of
observations on how the real world is not always as accommodating as we might
like.)

Agree also with Dimitre and Liam. Know your tools. Even if an XML parser can't
swallow it whole, there are ways.

Cheers, Wendell

From: Michael Kay michaelkay90@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, May 3, 2024 4:32 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] An efficient XSLT program that searches a large XML
document for all occurrences of a string?

On 3 May 2024, at 00:25, Dimitre Novatchev dnovatchev@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:


If I were involved in any activity that collects and structures such large
quantities of data, I would envisage splitting this data and keeping it in
smaller, manageable chunks, wherever possible.

That's a good recommendation, but it's a workaround for the fact that the
technology isn't as scalable as we would like.

If a system offers you the opportunity to get an XML report of all the
transactions occurring between two dates at a range of locations, then sooner
or later someone is going to submit a query that delivers a 5GB report, and in
an ideal world, they wouldn't have to do things differently just because the
amount of data has exceeded some arbitrary threshold.

Growth in data size tends to creep up on you. The log files that we keep of
licenses issued to Saxon users are now much larger than we ever envisaged when
we started. You don't want to have to change the design just because things
have grown incrementally. We did change the design: we switched to one XML
file per year. But it would be nice if we weren't forced into that by
technology limitations.

Michael Kay
Saxonica

XSL-List info and archive<http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe<http://lists.mulberrytech.com/unsub/xsl-list/3302254> (by
email)