Re: [xsl] How to efficiently obtain the first 10 records of a file with over 2 million records?

Subject: Re: [xsl] How to efficiently obtain the first 10 records of a file with over 2 million records?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 19 Jul 2023 20:31:58 -0000
>  I have an XML file containing over 2 million <record> elements. I want
>  to obtain the first 10 <record> elements.

In general, there is no guarantee of fast processing unless some initial
or additional preparation has been done.

Imagine that "the first 10 <record> elements" happen to be the last ten
elements of the possibly million elements of the XML document.

Whenever people have huge data and intend to retrieve and process small
pieces of it, the usual solutions are:

   a) Find meaningful "sub-structures" in the data and, based on these,
split it into a multitude of smaller pieces of data, each containing a
manageable number of such structures (a sketch follows this list). For
example, there are 100 billion stars in the Milky Way Galaxy. Instead of
having one enormous XML document containing the data for all of them, it
would make sense to create several smaller documents, say one for each
spiral arm of the Galaxy. And indeed, the largest collection of such data
today (covering just about 1% of all the stars in the Galaxy), produced
by Gaia, consists of multiple compressed files, not a single one.
Selecting which of these files to work with is similar to orienting your
telescope at a particular angle within the Galactic plane, or choosing a
particular telescope type that has the desired technical characteristics.
   b) Create a (different) index (and I believe that at least some XQuery
implementations do this) for every important, imaginable kind of search.
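
Here is a minimal sketch of approach a), splitting one huge document into
many smaller ones with XSLT 3.0. The element names /Document/record come
from the quoted post below; the chunk size of 100000 and the output file
names are just hypothetical:

  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  version="3.0">
    <xsl:template match="/Document">
      <!-- Write each run of 100000 adjacent records to its own file -->
      <xsl:for-each-group select="record"
                          group-adjacent="(position() - 1) idiv 100000">
        <xsl:result-document href="chunk-{current-grouping-key()}.xml">
          <Document>
            <xsl:copy-of select="current-group()"/>
          </Document>
        </xsl:result-document>
      </xsl:for-each-group>
    </xsl:template>
  </xsl:stylesheet>

After such a split, "the first 10 <record> elements" are simply the first
ten children of chunk-0.xml.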

Using b) above, one could specify some complete processing, say starting
with XQuery (using an existing index) and, once the wanted elements have
been retrieved almost instantaneously, calling the standard XPath 3.1
fn:transform() for further processing.
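
For example, a minimal sketch of that pipeline in XQuery 3.1 (the file
name huge.xml and the stylesheet records.xsl are hypothetical, and
whether the positional predicate is actually answered from an index
depends entirely on the implementation):

  (: Retrieve the wanted elements, then hand them to XSLT :)
  let $first10 := doc("huge.xml")/Document/record[position() le 10]
  return fn:transform(map {
           "source-node"         : <Document>{$first10}</Document>,
           "stylesheet-location" : "records.xsl"
         })?output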

If one doesn't know what kinds of searches/processing they are going to
perform, this most probably means that they haven't defined any use cases
compelling enough to justify creating the huge document in the first
place.

Thanks,
Dimitre


On Wed, Jul 19, 2023 at 8:15 AM Roger L Costello costello@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Folks,
>
> I have an XML file containing over 2 million <record> elements. I want to
> obtain the first 10 <record> elements.
>
> Here's how I did it:
>
> <xsl:for-each select="/Document/record[position() le 10]">
>     <xsl:sequence select="."/>
> </xsl:for-each>
>
> I ran it and it took a long time to complete. I am guessing that the XSLT
> processor is iterating over all 2 million <record> elements. Yes?  How to
> write the XSLT code so that the XSLT processor stops iterating upon
> processing the first 10 <record> elements?
>
> /Roger
