Subject: Re: [xsl] How to efficiently obtain the first 10 records of a file with over 2 million records?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 19 Jul 2023 21:25:41 -0000
And there are data structures, such as the Finger Tree (of course, not XML-based), that guarantee O(log N) access when searching by key or by position. Thus searching among 100 billion items in a Finger Tree takes about log2(10^11) ≈ 37 steps, which is about as fast as an average linear search in a sequence of 66 items.

On Wed, Jul 19, 2023 at 1:32 PM Dimitre Novatchev dnovatchev@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> > I have an XML file containing over 2 million <record> elements. I want
> > to obtain the first 10 <record> elements.
>
> In general, there is no guarantee of fast processing unless some initial
> or additional preparation has been done.
>
> Imagine that "the first 10 <record> elements" happen to be the last ten
> elements of the possibly millions of elements in the XML document.
>
> Whenever people have huge data and intend to retrieve and process small
> pieces of it, the usual solutions are:
>
> a) Find meaningful "sub-structures" in the data and, based on them, split
> it into many smaller pieces, each containing a manageable number of such
> structures. For example, there are 100 billion stars in the Milky Way
> Galaxy (MWG). Instead of having one enormous XML document containing the
> data for all of them, it would make sense to create several smaller
> documents, say one for each spiral arm of the Galaxy. And indeed, the
> largest such collection today (just about 1% of all the stars in the
> Galaxy), produced by Gaia, comprises multiple compressed files, not a
> single one. Selecting which of these files to work with is similar to
> orienting your telescope at a particular angle within the Galactic plane,
> or choosing a particular telescope type that has the desired technical
> characteristics.
>
> b) Create a (different) index for every important, imaginable search
> (and I believe that at least some XQuery implementations do that).
>
> Using b) above, one could specify a complete processing pipeline, say
> starting with XQuery (using an existing index) and, once the wanted
> elements have been retrieved almost instantaneously, calling the standard
> XPath 3.1 fn:transform() for further processing.
>
> If one doesn't know what kinds of searches/processing they are going to
> perform, this most probably means that they haven't defined any use cases
> compelling enough to justify creating the huge document in the first
> place.
>
> Thanks,
> Dimitre
>
> On Wed, Jul 19, 2023 at 8:15 AM Roger L Costello costello@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > Hi Folks,
> >
> > I have an XML file containing over 2 million <record> elements. I want to
> > obtain the first 10 <record> elements.
> >
> > Here's how I did it:
> >
> >     <xsl:for-each select="/Document/record[position() le 10]">
> >         <xsl:sequence select="."/>
> >     </xsl:for-each>
> >
> > I ran it and it took a long time to complete. I am guessing that the XSLT
> > processor is iterating over all 2 million <record> elements. Yes? How do I
> > write the XSLT code so that the XSLT processor stops iterating upon
> > processing the first 10 <record> elements?
> >
> > /Roger

--
Cheers,
Dimitre Novatchev
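The O(log N) figure in the reply above can be checked with a little arithmetic. The sketch below does not implement a Finger Tree; as a deliberately simpler stand-in, it runs an ordinary binary search over an implicit sorted sequence of 100 billion items (a hypothetical layout where position i holds the value i, so nothing has to be materialized) and counts the probes. The function names are illustrative only.

```python
def binary_search_steps(n, target, key=lambda i: i):
    """Binary search for `target` in an implicit sorted sequence of length n
    whose element at position i is key(i). Returns (position or -1, probes)."""
    lo, hi = 0, n - 1
    probes = 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        value = key(mid)
        if value == target:
            return mid, probes
        if value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def worst_case_probes(n):
    """Upper bound on probes for binary search over n >= 1 items:
    floor(log2 n) + 1, computed exactly via the integer's bit length."""
    return n.bit_length()


def average_linear_probes(n):
    """Expected probes of a successful linear search when the target is
    equally likely to be at any of the n positions."""
    return (n + 1) / 2


if __name__ == "__main__":
    N = 100_000_000_000  # 100 billion items
    for target in (0, 12_345, N // 2, N - 1):
        pos, probes = binary_search_steps(N, target)
        print(f"target={target:,} found at {pos:,} after {probes} probes")
    print("worst case for binary search:", worst_case_probes(N))
    print("average linear search over 66 items:", average_linear_probes(66))
```

The worst case for 100 billion items is 37 probes, while an average successful linear search over 66 items needs 33.5 probes, so the two are indeed of the same order.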
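Option a) in the reply above implies a one-time preparation pass. A minimal sketch, assuming un-namespaced `<record>` elements that are direct children of a single root (the `/Document/record` layout from Roger's question): it streams the big file once with the standard library's `xml.etree.ElementTree.iterparse`, so the whole document never sits in memory, and writes numbered chunk files. Splitting purely by record count is an assumption made for brevity; the reply suggests splitting by a meaningful sub-structure (such as a spiral arm), which would mean picking the output file from a field of each record instead. The function name and the `chunk-NNNNN.xml` naming scheme are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def split_records(source_path, out_dir, chunk_size=100_000,
                  record_tag="record", root_tag="Document"):
    """Stream source_path once and copy its <record> elements, in document
    order, into numbered chunk files of at most chunk_size records each.
    Returns the list of chunk file paths that were written."""
    paths = []
    batch = []

    def flush():
        # Write the pending records out as one small, well-formed document.
        if not batch:
            return
        path = os.path.join(out_dir, f"chunk-{len(paths):05d}.xml")
        with open(path, "w", encoding="utf-8") as out:
            out.write(f"<{root_tag}>")
            out.writelines(batch)
            out.write(f"</{root_tag}>")
        paths.append(path)
        batch.clear()

    context = ET.iterparse(source_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            elem.tail = None  # do not copy the whitespace after the record
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.clear()  # detach finished records so memory use stays flat
            if len(batch) == chunk_size:
                flush()
    flush()  # the last, possibly partial, chunk
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "big.xml")
        with open(src, "w", encoding="utf-8") as f:
            f.write("<Document>\n")
            for i in range(25):
                f.write(f'  <record id="{i}"/>\n')
            f.write("</Document>\n")
        for path in split_records(src, tmp, chunk_size=10):
            print(os.path.basename(path), len(ET.parse(path).getroot()))
```

With 25 sample records and `chunk_size=10`, the demo writes three files holding 10, 10 and 5 records. Each chunk is then small enough to be loaded as an ordinary tree by an XSLT processor.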
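Option b) in the reply above can be sketched in the same spirit. One streaming pass builds, for each field expected to be searched, a sorted list of (value, record position) pairs. Every later lookup is then a pair of binary searches (`bisect`) costing O(log N) plus the size of the result, instead of a scan over millions of records. This is not how any particular XQuery database implements its indexes. The attribute names `id` and `type`, and the function names, are hypothetical, records are assumed to be direct children of the root, and values are compared as strings.

```python
import bisect
import os
import tempfile
import xml.etree.ElementTree as ET


def build_indexes(source_path, fields, record_tag="record"):
    """Stream source_path once. For every attribute name in `fields`, return a
    pair of parallel lists (sorted values, record positions), where a position
    is the 0-based ordinal of the <record> in document order."""
    pairs = {field: [] for field in fields}
    position = 0
    context = ET.iterparse(source_path, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            for field in fields:
                value = elem.get(field)
                if value is not None:
                    pairs[field].append((value, position))
            position += 1
            root.clear()  # assumes records are direct children of the root
    indexes = {}
    for field, entries in pairs.items():
        entries.sort()  # by value, then by document position
        indexes[field] = ([v for v, _ in entries], [p for _, p in entries])
    return indexes


def lookup(index, low, high=None):
    """Positions of records whose value v satisfies low <= v <= high
    (an exact match when high is None), found with two binary searches."""
    values, positions = index
    if high is None:
        high = low
    start = bisect.bisect_left(values, low)
    end = bisect.bisect_right(values, high)
    return positions[start:end]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "records.xml")
        with open(src, "w", encoding="utf-8") as f:
            f.write("<Document>")
            for i in range(30):
                kind = "a" if i % 3 == 0 else "b"
                f.write(f'<record id="r{i:03d}" type="{kind}"/>')
            f.write("</Document>")
        idx = build_indexes(src, ["id", "type"])
        print(lookup(idx["id"], "r005", "r009"))
        print(lookup(idx["type"], "a")[:4])
```

The demo prints `[5, 6, 7, 8, 9]` and `[0, 3, 6, 9]`. The returned positions could then be used to fetch just those records for further processing, which mirrors the "index first, then transform" pipeline described in the reply.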