Re: [xsl] Find the number of elements that are prior to the series of elements that match a string?

Subject: Re: [xsl] Find the number of elements that are prior to the series of elements that match a string?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 12 Mar 2019 21:46:28 -0000
On Tue, Mar 12, 2019 at 1:31 PM Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> In performance terms, of course, a solution using keys incurs a one-time cost to build indexes, which is amortized over a number of searches; so the performance benefits depend on whether the search is done once, or repeatedly.
> If there are "hundreds of thousands" of Byte elements, the space used by the index can become a significant factor.
> Also, a solution using keys will almost inevitably process the whole input sequence. Other solutions are likely to stop reading the input sequence when the first match is found.

Wouldn't "hundreds of thousands" of Byte elements be also a (both
performance and) space problem when using index-of() -- especially if
a starting subsequence of the searched string happens regularly in the
document? Also RegEx matching becomes not so efficient with large
strings, too.

It would be nice if a particular XSLT processor implements lazily the
index-of() function, allowing its partial results to be immediately
accessible to the evaluator of the whole XPath expression -- in this
case only, the solution using index-of() will stop immediately after
the successful index value is produced -- otherwise the evaluator will
need to perform on the complete index.

Of course, exactly the same reasoning applies to the implementation of
<xsl:key> and the key() function, if a smart enough XSLT processor
allows the key() function to operate on a partially-built index, and
when a solution is found the building of the index is halted.

So, **depending on implementation details**, the other solutions
**may** have a better average performance for a single search. With
multiple searches though, using keys will be more performant. Also,
depending on where the sought substring is located for the first time
in the document, if it isn't found at all or is at the end of the
sequence of Byte elements, then exhaustive string scanning will be
performed with these other solutions too.

If we accept that an "average" search will involve scanning half of
the string, then if 3 or more searches are performed, the key-based
solution will have incurred the cost of processing the document just
once, while the accumulated string scanning of the other solutions
will be over greater (accumulated) length of sequences of Byte

Maybe we need a performance comparison of the different solutions,
over different source XML documents that cover these cases?

> Michael Kay
> Saxonica
> On 12 Mar 2019, at 20:20, Dimitre Novatchev dnovatchev@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Here is an XSLT 2.0 solution using keys. In its current form it
> returns the wanted offset (length) as a hexadecimal number. To convert
> this to a decimal number, have a look at the archives where AFAIR
> there is such code -- could be even posted by me ... This conversion
> to decimal can certainly be written elegantly as fold-left() in
> XSLT/XPath 3
> XSL-List info and archive
> EasyUnsubscribe (by email)

Dimitre Novatchev
Truly great madness cannot be achieved without significant intelligence.
To invent, you need a good imagination and a pile of junk
Never fight an inanimate object
To avoid situations in which you might make mistakes may be the
biggest mistake of all
Quality means doing it right when no one is looking.
You've achieved success in your field when you don't know whether what
you're doing is work or play
To achieve the impossible dream, try going to sleep.
Facts do not cease to exist because they are ignored.
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
Sanity is madness put to good use.
I finally figured out the only reason to be alive is to enjoy it.

Current Thread