Re: [xsl] improving performance in creating ids

Subject: Re: [xsl] improving performance in creating ids
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 24 Apr 2019 21:29:14 -0000
Pieter,

That is excellent.

However, I haven't given up yet on xsl:number/@from -- not saying I'll
explain it or make it work, but unless I miss something (not
impossible), it *should* work the way we want and if it doesn't, there
must be something about it, or the problem, we aren't seeing. (Or a
bug in the processor?)

After all, a use case such as you have described is what this syntax
is clearly meant to address.
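
To make that concrete, here is a minimal sketch of the usage I have in
mind. It is untested, and it assumes things the thread only hints at:
that the elements to be labeled carry @rid and lack @id, and that the
new id is the nearest ancestor's @id plus a running number. The match
pattern and the id format are assumptions for illustration, not your
actual code.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy everything that is not handled explicitly below. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Assumed rule: elements with @rid but no @id receive a generated id. -->
  <xsl:template match="*[@rid][not(@id)]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="id">
        <!-- Prefix: the id of the nearest ancestor that has one. -->
        <xsl:value-of select="ancestor::*[@id][1]/@id"/>
        <xsl:text>-</xsl:text>
        <!-- Running number of @rid elements since the last *[@id] node. -->
        <xsl:number level="any" from="*[@id]" count="*[@rid]"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

One caveat, as I read the spec: with level="any", the count restarts at
the nearest node matching @from among the ancestors *and* the preceding
nodes, not only among the ancestors. So if elements with @id also occur
earlier in the document as non-ancestors, the numbers will differ from
an ancestor-scoped count, which may or may not be what is biting us
here.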

The news that a counting-based solution is not much better with a key
than without one is interesting, but is possibly due to optimizations
in Saxon (if that is your processor?) ... which suggests that some
processors might *really* take their sweet time with a raw XPath
counting-based solution....
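
For comparison, the key-based count might be spelled out as in the
sketch below (again untested). The xsl:key declaration is an assumption,
since none was posted in the thread, and the variable names $scope and
$me are made up for the example.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy everything that is not handled explicitly below. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Assumed declaration: index every element by its local name. -->
  <xsl:key name="elems-by-name" match="*" use="local-name()"/>

  <!-- Label elements that lack an id but sit below an element that has one. -->
  <xsl:template match="*[not(@id)][ancestor::*[@id]]">
    <!-- Nearest ancestor with an id; it limits the key lookup to its subtree. -->
    <xsl:variable name="scope" select="ancestor::*[@id][1]"/>
    <xsl:variable name="me" select="."/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Count same-named elements in scope that precede this one, plus one. -->
      <xsl:attribute name="id"
          select="$scope/@id || '-' || local-name() ||
                  count(key('elems-by-name', local-name(), $scope)[$me >> .]) + 1"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The key lookup only narrows the candidates. The [$me >> .] filter still
runs once per labeled element, so a processor that does not optimize it
does work proportional to the size of the scope each time.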

Cheers, Wendell






On Wed, Apr 24, 2019 at 10:23 AM Pieter Lamers
pieter.lamers@xxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
>
> Hi all,
>
> In the end I found the solution for my original numbering plan in this
> xsl:number expression:
>
> <xsl:number level="any" count="*[. &gt;&gt; $ancestor-with-id][@rid]"/>
>
> The '>>' operator performs well enough (total processing time for the
> test book is now 5 seconds) and was brought to my kind attention by Erik
> Siegel. Thanks for all your help.
>
> Best,
> Pieter
>
> On 24/04/2019 07:46, Pieter Lamers pieter.lamers@xxxxxxxxxxxx wrote:
> > Hi Wendell,
> >
> > Had not seen your subsequent replies before I signed off last night.
> > Your solution below involves a count which brings back my original
> > performance problem. I think I will change my requirement for
> > "locally" numbered ids somewhat so I can profit most from xsl:number.
> > Still, sad that 'from' cannot serve my purpose (or so it seems).
> >
> > Hi Liam,
> >
> > You are probably right that indexing + keys should work in the xquery
> > solution. I'd have to dive a little further into that area before I
> > can put it to use; my initial efforts did not make a change.
> >
> > Thanks and all the best,
> > Pieter
> >
> > On 23/04/2019 23:47, Wendell Piez wapiez@xxxxxxxxxxxxxxx wrote:
> >> Okay this is my next shot --
> >>
> >> <xsl:value-of select="ancestor::*[exists(@id)][1]/@id || '-' ||
> >>   local-name() ||
> >>   count(key('elems-by-name', local-name(),
> >>             ancestor::*[exists(@id)][1])[current() >> .]) + 1"/>
> >>
> >> but after having done that I'd probably go back to xsl:number.
> >>
> >> Partly since it's probably as fast, but mainly because declarative
> >> syntax rules.
> >>
> >> (Note: still untested. Use at your own risk!)
> >>
> >> Cheers, Wendell
> >>
> >>
> >> On Tue, Apr 23, 2019 at 5:40 PM Wendell Piez wapiez@xxxxxxxxxxxxxxx
> >> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>> Oops, hit button too soon -- you'll see the error there.
> >>>
> >>> I leave scoping the correct count as an exercise, but it's in there
> >>> somewhere! :-)
> >>>
> >>> Cheers, Wendell
> >>>
> >>> On Tue, Apr 23, 2019 at 5:39 PM Wendell Piez
> >>> <wapiez@xxxxxxxxxxxxxxx> wrote:
> >>>> Hi again,
> >>>>
> >>>> Also note if we had a key we would need no variable --
> >>>>
> >>>> <xsl:value-of select="local-name() || '-'"/>
> >>>> <xsl:number level="any" from="*[@id]"
> >>>>   count="key('elems-by-name',local-name())"/>
> >>>>
> >>>> which suggests we could also use the third argument of key() ...
> >>>>
> >>>> <xsl:value-of select="local-name() || '-' ||
> >>>> count(key('elems-by-name',local-name(),ancestor::*[exists(@id)][1]))"/>
> >>>>
> >>>>
> >>>> still not tested -- but ought to work, syntax errors aside --
> >>>>
> >>>> Cheers, Wendell
> >>>>
> >>>> On Tue, Apr 23, 2019 at 5:31 PM Wendell Piez
> >>>> <wapiez@xxxxxxxxxxxxxxx> wrote:
> >>>>> Hey Pieter,
> >>>>>
> >>>>> If performance were the issue, I might try factoring out the ID
> >>>>> labeling into a completely separate pass, in order (for example) to
> >>>>> implement it as a sibling traversal, passing parameters forward to
> >>>>> increment the ID values. (If your numbering is fancy, for example
> >>>>> scoping the increment to the element type as well as the ancestor,
> >>>>> you might have to pass a map forward.) I think that ought to be
> >>>>> pretty fast, plus it separates this logic from the other logic of
> >>>>> the XSLT. It's essentially like treating the XSLT engine like an
> >>>>> overpowered SAX parser. (Not that I would know how to make one of
> >>>>> those.)
> >>>>>
> >>>>> But this is only if xsl:number wasn't doing it, after I tried
> >>>>> something like what Martin H shows with plain old templates.
> >>>>>
> >>>>> <xsl:variable name="ilk" select="local-name()"/>
> >>>>> <xsl:value-of select="$ilk || '-'"/>
> >>>>> <xsl:number level="any" from="*[@id]"
> >>>>>   count="*[local-name() eq $ilk]"/>
> >>>>>
> >>>>> -- untested --
> >>>>>
> >>>>> Cheers, Wendell
> >>>>>
> >>>>> On Tue, Apr 23, 2019 at 10:57 AM Martin Honnen martin.honnen@xxxxxx
> >>>>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>>>> On 23.04.2019 16:28, Pieter Lamers pieter.lamers@xxxxxxxxxxxx wrote:
> >>>>>>
> >>>>>>> Thanks for your quick reply. The node identity comparison helped
> >>>>>>> quite a bit, although I am still around a minute for a full book
> >>>>>>> of ids. I am not sure how xsl:number would help here, and what
> >>>>>>> kind of performance win it would give over count(). I tried
> >>>>>>> something with a nested transformation, but what should I feed it?
> >>>>>>>
> >>>>>>>       <xsl:number select="*[last()]"/>
> >>>>>>> works (given a set of preceding nodes) but it is slightly slower
> >>>>>>> than a count() in the xquery. Maybe I should be using xsl:number
> >>>>>>> differently?
> >>>>>>
> >>>>>> It is difficult for me to suggest that without knowing the XML input
> >>>>>> structure and whether you want to generate that id based on a count
> >>>>>> or numbering only for certain nodes or some particular element type.
> >>>>>> In general, if I wanted to delegate counting to xsl:number similar
> >>>>>> to your function, I would define a template in a mode for that, e.g.
> >>>>>>
> >>>>>>      <xsl:template match="*" mode="number">
> >>>>>>         <xsl:number level="any" from="*[@id]"/>
> >>>>>>      </xsl:template>
> >>>>>>
> >>>>>> and then, where you need that number, you would use e.g.
> >>>>>>
> >>>>>>      <xsl:apply-templates select="." mode="number"/>
> >>>>>>
> >>>>>> Both the template and the select of the apply-templates can of
> >>>>>> course be adapted to more particular needs.
> >>>>>>
> >>>>>> As for being more efficient than using count, that then depends on
> >>>>>> the implementation, but I would think there is some optimization to
> >>>>>> be expected in an XSLT processor for xsl:number.
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> ...Wendell Piez... ...wendell -at- nist -dot- gov...
> >>>>> ...wendellpiez.com... ...pellucidliterature.org...
> >>>>> ...pausepress.org...
> >>>>> ...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
> --
> Pieter Lamers
> John Benjamins Publishing Company
> Postal Address: P.O. Box 36224, 1020 ME AMSTERDAM, The Netherlands
> Visiting Address: Klaprozenweg 75G, 1033 NN AMSTERDAM, The Netherlands
> Warehouse: Kelvinstraat 11-13, 1446 TK PURMEREND, The Netherlands
> tel: +31 20 630 4747
> web: www.benjamins.com
> 



-- 
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
