Subject: Re: [xsl] How to efficiently obtain the first 10 records of a file with over 2 million records?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 19 Jul 2023 21:25:41 -0000
And there are data structures, such as the finger tree (not XML-based, of
course), that guarantee O(log(N)) access when searching by key or by
position. Thus searching among 100 billion items in a finger tree takes on
the order of 37 steps (log2 of 10^11 is about 36.5), roughly as many as an
average linear search in a sequence of some 70 items.
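
To make the O(log(N)) figure concrete, here is a small Python sketch. It is not a finger tree: it swaps in a plain binary search over the sorted keys 0 .. n-1 (computed on the fly rather than stored), which has the same logarithmic lookup bound, and counts how many probes a lookup needs. The function name `binary_search_probes` is made up for this illustration.

```python
import math


def binary_search_probes(n: int, target: int) -> int:
    """Count the probes a binary search needs to find `target` among the
    sorted keys 0, 1, ..., n - 1 (the keys are implied, never stored)."""
    if not 0 <= target < n:
        raise ValueError("target must be in range(n)")
    lo, hi = 0, n  # search the half-open interval [lo, hi)
    probes = 0
    while lo < hi:
        probes += 1
        mid = (lo + hi) // 2
        if mid == target:
            return probes
        if mid < target:
            lo = mid + 1  # target lies in the upper half
        else:
            hi = mid  # target lies in the lower half
    raise AssertionError("unreachable: target was in range")


if __name__ == "__main__":
    n = 100_000_000_000  # 100 billion items
    print(f"log2(n) = {math.log2(n):.1f}")
    worst = max(binary_search_probes(n, t) for t in (0, n // 3, n - 1))
    print(f"probes for a few sample keys: at most {worst}")
```

Each probe at least halves the remaining interval, so no lookup among 100 billion keys needs more than floor(log2 n) + 1 = 37 probes; any balanced search structure (a finger tree, a B-tree index, a sorted array) keeps the count proportional to log N.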

On Wed, Jul 19, 2023 at 1:32 PM Dimitre Novatchev dnovatchev@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> >  I have an XML file containing over 2 million <record> elements. I want
> to obtain the first 10 <record> elements.
>
> In general, fast processing cannot be guaranteed unless some initial or
> additional preparation has been done.
>
> Imagine that "the first 10 <record> elements" happen to be the last ten
> of the possibly millions of elements in the XML document. A processor would
> then have to read through nearly the whole document before finding them.
>
> Whenever people have huge amounts of data and intend to retrieve and
> process small pieces of it, the usual solutions are:
>
>    a). Find meaningful "sub-structures" in the data and, based on them,
> split the data into a multitude of smaller pieces, each containing a
> manageable number of such structures. For example, there are about 100
> billion stars in the Milky Way Galaxy (MWG). Instead of having one enormous
> XML document containing the data for all of them, it would make sense to
> create several smaller documents, say one for each spiral arm of the
> Galaxy. And indeed, the largest collection of such data today (just about
> 1% of all the stars in the Galaxy), produced by Gaia, consists of multiple
> compressed files, not a single one. Selecting which of these files to work
> with is similar to pointing your telescope at a particular angle within the
> Galactic plane, or choosing a particular telescope type that has the
> desired technical characteristics.
>    b). Create a (separate) index for every important search one can
> imagine (I believe that at least some XQuery implementations do this).
>
> Using b). above, one could specify a complete processing pipeline: start
> with XQuery (using an existing index) to retrieve the wanted elements almost
> instantaneously, then call the standard XPath 3.1 fn:transform() function
> for further processing.
>
> If one doesn't know what kinds of searches or processing they are going to
> perform, this most probably means that they haven't defined any use cases
> compelling enough to justify creating the huge document in the first
> place.
>
> Thanks,
> Dimitre
>
>
> On Wed, Jul 19, 2023 at 8:15 AM Roger L Costello costello@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Folks,
>>
>> I have an XML file containing over 2 million <record> elements. I want to
>> obtain the first 10 <record> elements.
>>
>> Here's how I did it:
>>
>> <xsl:for-each select="/Document/record[position() le 10]">
>>     <xsl:sequence select="."/>
>> </xsl:for-each>
>>
>> I ran it and it took a long time to complete. I am guessing that the XSLT
>> processor is iterating over all 2 million <record> elements. Yes? How do I
>> write the XSLT code so that the XSLT processor stops iterating after
>> processing the first 10 <record> elements?
>>
>> /Roger
>>
>>
>>
>
>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
>
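Roger's question, how to stop reading once the first 10 <record> elements have been seen, is at heart a streaming question: a non-streaming XSLT run typically builds the whole 2-million-record tree before any selection happens, however early that selection could stop. XSLT 3.0 streaming aims at the same effect, but the sketch below is Python, not XSLT, and not a description of how any particular XSLT processor works. It uses the standard library's `xml.etree.ElementTree.iterparse`, which parses incrementally, and simply breaks out of the loop after the tenth record. The function name `first_records` is hypothetical, and it assumes the records carry no XML namespace. As noted above, this only pays off when the wanted records sit near the start of the file.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, List


def first_records(source: BinaryIO, limit: int = 10,
                  tag: str = "record") -> List[ET.Element]:
    """Return the first `limit` elements named `tag` from an XML byte stream,
    stopping the parse as soon as enough of them have been collected."""
    found: List[ET.Element] = []
    if limit <= 0:
        return found
    # iterparse feeds the parser in fixed-size chunks and reports each element
    # when its end tag is seen; after `break`, no further input is read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            found.append(elem)
            if len(found) == limit:
                break
    return found


if __name__ == "__main__":
    # A tiny stand-in for the 2-million-record file from the question.
    sample = b"<Document>" + b"".join(
        b"<record id='%d'/>" % i for i in range(25)) + b"</Document>"
    records = first_records(io.BytesIO(sample))
    print([r.get("id") for r in records])
```

For a real file, open it in binary mode (`with open(path, "rb") as f:`) and pass the file object, so the file is closed even though the parse stops early.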


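Option a) above can start from the very file in the question. The Python sketch below (an illustration, not XSLT; the function name `split_records`, the `part-NNNNN.xml` naming scheme and the default chunk size are all hypothetical) makes one streaming pass over a large document and writes its <record> elements into numbered files of a fixed size, so a later query for "the first 10 records" only has to open the first small file. It assumes the records are direct children of the root element and carry no XML namespace.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, List


def split_records(source: BinaryIO, out_dir: str, chunk_size: int = 100_000,
                  tag: str = "record", root_tag: str = "Document") -> List[str]:
    """Make one streaming pass over `source` and write its `tag` elements into
    numbered files holding at most `chunk_size` records each.
    Returns the paths of the files written, in document order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    batch: List[ET.Element] = []

    def flush() -> None:
        # Wrap the current batch in a fresh root element and write one file.
        part = ET.Element(root_tag)
        part.extend(batch)
        path = os.path.join(out_dir, f"part-{len(paths):05d}.xml")
        ET.ElementTree(part).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
        batch.clear()

    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem  # the first "start" event is for the document element
        if event == "end" and elem.tag == tag:
            batch.append(elem)
            root.clear()  # detach parsed records so memory use stays bounded
            if len(batch) == chunk_size:
                flush()
    if batch:
        flush()
    return paths


if __name__ == "__main__":
    # 25 small records split into files of 10 give three parts: 10, 10 and 5.
    sample = b"<Document>" + b"".join(
        b"<record id='%d'/>" % i for i in range(25)) + b"</Document>"
    with tempfile.TemporaryDirectory() as out_dir:
        for path in split_records(io.BytesIO(sample), out_dir, chunk_size=10):
            print(os.path.basename(path), len(ET.parse(path).getroot()))
```

Splitting by a fixed record count is only the simplest variant of option a); splitting by a meaningful sub-structure, as in the Galaxy example, would replace the count test with a key read from each record.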
--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200 yrs. Will they write
all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
