Re: [xsl] Efficient XSLT range lookups / questions

Subject: Re: [xsl] Efficient XSLT range lookups / questions
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Thu, 7 Oct 2010 19:55:10 -0700
Hermann,

The only pre-processing needed (once only) is to sort the regions on
their @min values.

Then I use this transformation:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
 xmlns:xs="http://www.w3.org/2001/XMLSchema";
 xmlns:f="http://fxsl.sf.net/";
 exclude-result-prefixes="f xs"
 >
 <xsl:import href="../f/func-binLastLower.xsl"/>

 <xsl:key name="kRegByStart"  match="e" use="@min"/>

 <xsl:output omit-xml-declaration="yes"/>

 <xsl:template match="/" >
   <xsl:variable name="vNotFound" select="'NOT WITHIN ANY REGION'"/>

   <xsl:variable name="vSeq" select="/*/*/@min/xs:integer(.)" as="item()+"/>

     <xsl:sequence select=
      "for $val in 1630,
           $vMin in f:binLastLower($vSeq, $val, 1, count($vSeq))
        return
           if($vMin)
             then (for $vReg in key('kRegByStart', string($vMin))
                     return
                       if($val le xs:integer($vReg/@max))
                         then string($vReg/@value)
                         else $vNotFound
                  )
             else $vNotFound

      "/>

  </xsl:template>
</xsl:stylesheet>

and the imported file contains:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
 xmlns:xs="http://www.w3.org/2001/XMLSchema";
 xmlns:f="http://fxsl.sf.net/";
 exclude-result-prefixes="f xs"
 >

 <xsl:function name="f:binLastLower" as="item()?">
   <xsl:param name="pSeq" as="item()+"/>
   <xsl:param name="pVal" as="item()"/>
   <xsl:param name="pStart" as="xs:integer"/>
   <xsl:param name="pEnd" as="xs:integer"/>

   <xsl:sequence select=
    "if($pStart gt $pEnd) then ($pSeq[$pEnd])
       else
        for $mid in ($pStart + $pEnd) idiv 2,
             $sVal in $pSeq[$mid]
           return
             if($sVal eq $pVal) then $pSeq[$mid]
             else
                if($sVal lt $pVal)
                   then f:binLastLower($pSeq, $pVal, $mid+1, $pEnd)
                else
                  f:binLastLower($pSeq, $pVal, $pStart, $mid - 1)
       "/>
 </xsl:function>

</xsl:stylesheet>

The XML file I tested this on is:

<es>
  <e min="1001" max="1010" value="A" />
  <e min="1021" max="1115" value="B" />
  <e min="1131" max="1235" value="C" />
  <e min="1246" max="1248" value="D" />
  <e min="1521" max="1825" value="E" />
  <e min="1901" max="1905" value="F" />
  <e min="1926" max="1990" value="G" />
</es>

and, in this case, the correct answer: "E" is produced.


So, how this works?

The function f:binLastLower() tries to find the provided $pVal
argument in the provided $pSeq (sequence) argument. If $pVal is not
one of the values of the values in $pSeq , then the maximum value in
$pSeq that is not larger than $pVal is returned.

In this way, if  vSeq is defined as /*/*/@min/xs:integer(.)

then

    f:binLastLower($vSeq, $val, 1, count($vSeq))

returns @min attribute with the maximum value that is still not larger
than $val.

The final little step to decide whether $val belongs to the interval
of this @min is to compare $val with the @min attribute of the same
interval.

This is a modification of the traditional binary-search algorithm. The
advantages of using this solution is that the only preprocessing
necessary is sorting the intervals. After that the code can be used as
is. No generation of XSLT code or special XML "binary trees" is
necessary.

This is an O(log(N)) solution. It can be improved even more by using
sentinel values, such as -999999 for @min and 99999999 for @max, so
that f:binLastLower() always returns non-empty result and then the
check for non-empty result can be eliminated.

It would be interesting to know how the performance of this solution
-- out of the box -- compares to the other two solutions.


--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play





On Thu, Oct 7, 2010 at 1:09 PM, Hermann Stamm-Wilbrandt
<STAMMW@xxxxxxxxxx> wrote:
> Dimitre,
>
>> Are the regions non overlapping (as in the examples in [1]) ?
> from what I heard about the customer scenario, yes.
>
> The implementation in [1] returns the @value of "a" range it lies
> in (in case of overlapping ranges).
>
> So for performance comparison non-overlapping can be assumed.
>
>
> Mit besten Gruessen / Best wishes,
>
> Hermann Stamm-Wilbrandt
> Developer, XML Compiler, L3
> Fixpack team lead
> WebSphere DataPower SOA Appliances
> ----------------------------------------------------------------------
> IBM Deutschland Research & Development GmbH
> Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschaeftsfuehrung: Dirk Wittkopp
> Sitz der Gesellschaft: Boeblingen
> Registergericht: Amtsgericht Stuttgart, HRB 243294
>
>
> From: B  B  B  Dimitre Novatchev <dnovatchev@xxxxxxxxx>
> To: B  B  B  B  xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Date: B  B  B  10/07/2010 07:25 PM
> Subject: B  B Re: [xsl] Efficient XSLT range lookups / questions
>
>
>
> Hermann,
>
> Are the regions non overlapping (as in the examples in [1]) ?
>
>
>
> On Thu, Oct 7, 2010 at 4:33 AM, Hermann Stamm-Wilbrandt
> <STAMMW@xxxxxxxxxx> wrote:
>> Dimitre,
>>
>>> Have you compared these to using just a single binary search function,
>>> as the one offered by FXSL for at least the last six years?
>>
>> because I wanted to compare optimal performance I did the minimal
>> binary search implementation for the problem myself.
>>
>> In [1] I describe a simple input format and provide a stylesheet for
>> generating the optimal binary decision stylesheet from that (by
>> the measurements done that outperforms binary search on a XML tree).
>>
>> Remember that the investigation was for XSLT 1 and XSLT 2 processors.
>>
>> May you please point out on how to setup a test with f:binSearch()
>> on the simple range lookup input format in [1]?
>>
>> Since it works on XSLT 2.0 sequences probably that will get better
>> results than my binary search based on nodesets.
>>
>>
>> [1]
>>
>
https://www.ibm.com/developerworks/forums/thread.jspa?threadID=348576#1453965
6
>
>>
>>
>> Mit besten Gruessen / Best wishes,
>>
>> Hermann Stamm-Wilbrandt
>> Developer, XML Compiler, L3
>> Fixpack team lead
>> WebSphere DataPower SOA Appliances
>> ----------------------------------------------------------------------
>> IBM Deutschland Research & Development GmbH
>> Vorsitzender des Aufsichtsrats: Martin Jetter
>> Geschaeftsfuehrung: Dirk Wittkopp
>> Sitz der Gesellschaft: Boeblingen
>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>>
>>
>>
>> From: B  B  B  Dimitre Novatchev <dnovatchev@xxxxxxxxx>
>> To: B  B  B  B  xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>> Date: B  B  B  10/05/2010 09:58 PM
>> Subject: B  B Re: [xsl] Efficient XSLT range lookups / questions
>>
>>
>>
>> Have you compared these to using just a single binary search function,
>> as the one offered by FXSL for at least the last six years?
>>
>>
>
http://fxsl.cvs.sourceforge.net/viewvc/fxsl/fxsl-xslt2/f/func-binSearch.xsl?r
evision=1.4&view=markup
>
>>
>>
>>
>> B <xsl:function name="f:binSearch" as="xs:boolean">
>> B  <xsl:param name="pSeq" as="item()+"/>
>> B  <xsl:param name="pVal" as="item()"/>
>> B  <xsl:param name="pStart" as="item()"/>
>> B  <xsl:param name="pEnd" as="item()"/>
>>
>> B  <xsl:sequence select=
>> B  B "if($pStart gt $pEnd) then false()
>> B  B  B  else
>> B  B  B  B for $mid in ($pStart + $pEnd) idiv 2,
>> B  B  B  B  B  B  $sVal in $pSeq[$mid]
>> B  B  B  B  B  return
>> B  B  B  B  B  B  if($sVal eq $pVal) then true()
>> B  B  B  B  B  B  else
>> B  B  B  B  B  B  B  B if($sVal lt $pVal)
>> B  B  B  B  B  B  B  B  B  then f:binSearch($pSeq, $pVal, $mid+1, $pEnd)
>> B  B  B  B  B  B  B  B else
>> B  B  B  B  B  B  B  B  B f:binSearch($pSeq, $pVal, $pStart, $mid - 1)
>> B  B  B  "/>
>> B </xsl:function>
>>
>>
>> --
>> Cheers,
>> Dimitre Novatchev
>> ---------------------------------------
>> Truly great madness cannot be achieved without significant intelligence.
>> ---------------------------------------
>> To invent, you need a good imagination and a pile of junk
>> -------------------------------------
>> Never fight an inanimate object
>> -------------------------------------
>> You've achieved success in your field when you don't know whether what
>> you're doing is work or play
>>
>>
>>
>>
>> On Tue, Oct 5, 2010 at 10:58 AM, Hermann Stamm-Wilbrandt
>> <STAMMW@xxxxxxxxxx> wrote:
>>>
>>> Hello,
>>>
>>> I got notice of an interesting scenario needing a huge amount of range
>>> lookups in XSLT (billions per year with more than 20000 different
>> ranges).
>>>
>>>
>>> My web searches prior to this only showed range lookups of complexity
>>> linear in the number of ranges to be searched. I am sure that my
> searches
>>> are not perfect and I just missed relevant postings.
>>> Are there any relevant postings?
>>>
>>>
>>> Since the ranges change rarely precomputing was a good option.
>>>
>>> I compared binary search trees against stylesheets with a binary search
>>> structure.
>>>
>>> Findings based on experiments with saxon9he and DataPower XSLT processor
>>> [1]:
>>> - binary outperforms linear
>>> - binary stylesheets outperform binary XML searchtrees
>>> - in case the XSLT processor supports document and/or stylesheet caching
>>> B the lookup performance remains good even for single lookups
> (logarithmic
>>> B in depth of search tree; DataPower supports both, and web searches
>>> indicate
>>> B that .net framework also supports document/stylesheet caching)
>>>
>>> Are there even better alternatives for doing quick range lookups?
>>>
>>>
>>> [1] "Efficient XSLT range lookups"
>>>
>>
>
https://www.ibm.com/developerworks/forums/thread.jspa?threadID=348576&tstart=
0
>
>>
>>>
>>>
>>> Mit besten Gruessen / Best wishes,
>>>
>>> Hermann Stamm-Wilbrandt
>>> Developer, XML Compiler, L3
>>> Fixpack team lead
>>> WebSphere DataPower SOA Appliances
>>> ----------------------------------------------------------------------
>>> IBM Deutschland Research & Development GmbH
>>> Vorsitzender des Aufsichtsrats: Martin Jetter
>>> Geschaeftsfuehrung: Dirk Wittkopp
>>> Sitz der Gesellschaft: Boeblingen
>>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>>
>>
>
>
>
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> Truly great madness cannot be achieved without significant intelligence.
> ---------------------------------------
> To invent, you need a good imagination and a pile of junk
> -------------------------------------
> Never fight an inanimate object
> -------------------------------------
> You've achieved success in your field when you don't know whether what
> you're doing is work or play
> -------------------------------------
> I enjoy the massacre of ads. This sentence will slaughter ads without
> a messy bloodbath.

Current Thread