Re: [xsl] Beware the count method with Muenchian grouping (was: Testing by counting or positional predicate)

From: "Daniel Bowen" <dbowen2@xxxxxxxxx>
Date: Tue, 16 Jan 2001 11:16:22 -0700
Jeni Tennison wrote:

> //Texture
>   [count(key('key-texture', concat(@texture, ':', @u, ':', @v))[1]) =
>    count(.|key('key-texture', concat(@texture, ':', @u, ':', @v))[1])]
>
> This is more strictly the equivalent of:
>
> //Texture
>   [generate-id() =
>    generate-id(key('key-texture', concat(@texture, ':', @u, ':', @v))[1])]
>
> It's just that in most cases, you're assured that the key will return
> a node.
>
> Of course the two concat()s, two count()s and the two key()s might
> well mean that your assertion still stands: that the generate-id()
> method is better in your situation. If you test it, let us know what
> you find.
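
To make the difference between these tests concrete, here is a rough Python model (not XSLT). Node-sets are modelled as lists of made-up node ids, and every helper name is invented for illustration. It is only a sketch of the XPath 1.0 semantics as I understand them: the one-sided count test accepts a node when the key lookup returns nothing, while the two-sided count test and the generate-id() test both reject it.

```python
# Rough model of XPath 1.0 node-set semantics using Python lists of node ids.
# All names here are illustrative; nothing is taken from a real XSLT API.

def first(nodes):
    """Model of nodeset[1]: a node-set holding at most the first node."""
    return nodes[:1]

def union_count(node, nodes):
    """Model of count(. | nodes)."""
    return len({node} | set(nodes))

def generate_id(nodes):
    """Model of generate-id(nodeset): id of the first node, '' if empty."""
    return f"id{nodes[0]}" if nodes else ""

def one_sided_count(node, key_result):
    # count(. | key(...)[1]) = 1
    return union_count(node, first(key_result)) == 1

def two_sided_count(node, key_result):
    # count(key(...)[1]) = count(. | key(...)[1])
    return len(first(key_result)) == union_count(node, first(key_result))

def id_test(node, key_result):
    # generate-id(.) = generate-id(key(...)[1])
    return generate_id([node]) == generate_id(first(key_result))

# Node 7 is not indexed by the key at all, so the lookup returns nothing.
print(one_sided_count(7, []), two_sided_count(7, []), id_test(7, []))
```

When the key is guaranteed to return at least one node, all three tests agree, which is why the one-sided form is usually safe.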

Francis Norton wrote:

> Having slept on this, it occurs to me that the results for:
>
> <xsl:key
>   name="all-texture"
>   match="Texture"
>   use="concat(@texture, ':', @u, ':', @v)" />
> <xsl:variable
>   name="primary-textures"
>   select="//Texture[count(. | key('all-texture', concat(@texture, ':',
> @u, ':', @v))[1]) = 1] [0=count(preceding-sibling::Texture[1])]" />
>
> would also be interesting.


Warning: This is a long message :-)
Sorry I didn't get back on this sooner.

I did some timing runs with MSXML 3.0 and Saxon 6.0.2 (with Sun JDK 1.3) on
Windows NT 4.0 Workstation with SP5.

The real stylesheet uses a "unique-textures" node set that holds the unique
combinations of "primary" and "secondary" textures (the example I used in
previous e-mails was just to simplify things). Here are the different cases
I used:

1.  Predicate in the key. Generate node set using "generate-id" method.

  <xsl:key
   name="key-texture"
   match="Texture[0=count(preceding-sibling::Texture[1] |
            parent::EdgeTexture/preceding-sibling::EdgeTexture[1])]"
   use="concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        )" />

  <xsl:variable
   name="unique-textures"
   select="//Texture[generate-id(.) = generate-id(key('key-texture',
        concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        ))[1])]" />


2. Predicate in the key. Generate node set using the "1=count(.|key...) plus
same predicate from key match" method.


  <xsl:key
   name="key-texture"
   match="Texture[0=count(preceding-sibling::Texture[1] |
            parent::EdgeTexture/preceding-sibling::EdgeTexture[1])]"
   use="concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        )" />

  <xsl:variable
   name="unique-textures"
   select="//Texture
    [0=count(preceding-sibling::Texture[1] |
             parent::EdgeTexture/preceding-sibling::EdgeTexture[1])]
    [1=count(.|key('key-texture',
        concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        ))[1])]" />



3. Predicate in the key. Generate node set using the "count(key..) =
count(.|key..)" method:

  <xsl:key
   name="key-texture"
   match="Texture[0=count(preceding-sibling::Texture[1] |
            parent::EdgeTexture/preceding-sibling::EdgeTexture[1])]"
   use="concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        )" />

  <xsl:variable
   name="unique-textures"
   select="//Texture[count(key('key-texture',
        concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        ))[1]) = count(.|key('key-texture',
        concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        ))[1])]" />

4. No predicate in key "match".  Generate node set using the
"1=count(.|key...) plus predicate" method.  Have the predicate first.

  <xsl:key
   name="key-texture"
   match="Texture"
   use="concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        )" />

  <xsl:variable
   name="unique-textures"
   select="//Texture
     [0=count(preceding-sibling::Texture[1] |
              parent::EdgeTexture/preceding-sibling::EdgeTexture[1])]
     [1=count(.|key('key-texture',
        concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        ))[1])]" />


5. No predicate in key "match".  Generate node set using the
"1=count(.|key...) plus predicate" method.  Have the predicate last.

  <xsl:key
   name="key-texture"
   match="Texture"
   use="concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        )" />

  <xsl:variable
   name="unique-textures"
   select="//Texture
     [1=count(.|key('key-texture',
        concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        ))[1])]
     [0=count(preceding-sibling::Texture[1] |
              parent::EdgeTexture/preceding-sibling::EdgeTexture[1])]
   " />

6. No predicate in key "match".  Generate node set using the "generate-id()
plus predicate" method.  Have the predicate first.

  <xsl:key
   name="key-texture"
   match="Texture"
   use="concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        )" />

  <xsl:variable
   name="unique-textures"
   select="//Texture
     [0=count(preceding-sibling::Texture[1] |
              parent::EdgeTexture/preceding-sibling::EdgeTexture[1])]
     [generate-id(.) = generate-id(key('key-texture',
        concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        ))[1])]" />

7. No predicate in key "match".  Generate node set using the "generate-id()
plus predicate" method.  Have the predicate last.

  <xsl:key
   name="key-texture"
   match="Texture"
   use="concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        )" />

  <xsl:variable
   name="unique-textures"
   select="//Texture
     [generate-id(.) = generate-id(key('key-texture',
        concat(
          TextureProperties/@texture, '*',
          TextureProperties/MaterialDetailTexture/@detailTextureName, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/@texture, '*',
          (following-sibling::Texture[1] |
           parent::EdgeTexture/following-sibling::EdgeTexture[1]/Texture[1]
          )/TextureProperties/MaterialDetailTexture/@detailTextureName
        ))[1])]
     [0=count(preceding-sibling::Texture[1] |
              parent::EdgeTexture/preceding-sibling::EdgeTexture[1])]
     " />



The first input file:
 Size: 5100 KB.
 Elements: 20108
 "Texture" elements: 1728
   of which:
    1722 were "Primary Textures"
    6 were "Secondary Textures"
 "Unique Textures":  8
  (unique combinations of primary and secondary textures)

The second input file:
 Size: 6400 KB.
 Elements: 24562
 "Texture" elements: 3326
   of which:
    2998 were "Primary Textures"
    328 were "Secondary Textures"
 "Unique Textures":  24
  (unique combinations of primary and secondary textures)


For each case, I did 7 timed runs, threw out the high and low, and took the
average of the remaining 5.  With MSXML, I timed the overall process of
creating the XML objects and writing to an output file, as well as just the
time that it took to call IXSLProcessor::transform.  With Saxon, I just used
"Instant Saxon 6.0.2", and timed how long it took to run the command line
"saxon.exe -o out.html test.xml test.xsl".  A more thorough test would have
been to write a Java app and also time just the transform part, but I
didn't do that.  Here are the results (times are in milliseconds):
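
For anyone who wants to repeat the averaging step, a minimal Python sketch of "drop the high and the low, average the rest" follows. The function name and the sample timings are invented for illustration; they are not the measurements reported here.

```python
def trimmed_average(times_ms):
    """Drop one highest and one lowest sample, then average the rest."""
    if len(times_ms) < 3:
        raise ValueError("need at least three samples")
    kept = sorted(times_ms)[1:-1]
    return sum(kept) / len(kept)

# Hypothetical timings in milliseconds (not the measurements from this post).
samples = [1250, 1190, 1302, 1215, 1980, 1228, 1241]
print(trimmed_average(samples))
```

Dropping the extremes keeps a single slow outlier, such as the 1980 ms sample above, from skewing the average.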

Input File 1:

Case     MSXML Overall     MSXML Transform     SAXON Overall
1        4530.6            1266                29988.6
2        6523.2            3282.6              10791.4
3        8159.6            5001                13373
4        6469.4            3239                10743.4
5        6415              3100.6              10625.4
6        4652.8            1288                29720.8
7        4504.6            1221.4              30291.4

Input File 2:

Case     MSXML Overall     MSXML Transform     SAXON Overall
1        6485.2            2042.8              63559.6
2        8081.4            3601                13802
3        9437.4            5025.4              17581.4
4        7763              3449                13805.8
5        7955.4            3543.2              13710.2
6        6453.4            2089                60533
7        6415.2            1965                64258.6


With MSXML, from fastest to slowest for File 1:
Case 7  (no filter on keys, use "generate-id(.)=generate-id(key...) plus
predicate placed last" for node set)
Case 1  (filter keys, use "generate-id(.)=generate-id(key...)" for node set)
Case 6  (no filter on keys, use "generate-id(.)=generate-id(key...) plus
predicate placed first" for node set)
Case 5  (no filter on keys, filter and count for node set, predicate last)
Case 4  (no filter on keys, filter and count for node set, predicate first)
Case 2  (filter keys, 1 = count(.|key..) plus predicate)
Case 3  (filter keys, count(key...) = count(.|key..))

With MSXML for File 2:
Same as File 1, except Case 4 was faster than Case 5.

With Saxon, from fastest to slowest for File 1:
Case 5  (no filter on keys, filter and count for node set, predicate last)
Case 4  (no filter on keys, filter and count for node set, predicate first)
Case 2  (filter keys, 1 = count(.|key..) plus predicate)
Case 3  (filter keys, count(key...) = count(.|key..))
Case 6  (no filter on keys, use "generate-id(.)=generate-id(key...) plus
predicate placed first" for node set)
Case 1  (filter keys, use "generate-id(.)=generate-id(key...)" for node set)
Case 7  (no filter on keys, use "generate-id(.)=generate-id(key...) plus
predicate placed last" for node set)

With Saxon for File 2:
Same as File 1, except Case 2 was faster than Case 4.
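
The orderings above can be re-derived from the tables with a few lines of Python. This sketch uses the File 1 numbers, and it assumes the MSXML ordering is taken from the "MSXML Transform" column, which is what the File 2 remark appears to match.

```python
# Times in milliseconds for Input File 1, copied from the results table.
msxml_transform = {1: 1266, 2: 3282.6, 3: 5001, 4: 3239,
                   5: 3100.6, 6: 1288, 7: 1221.4}
saxon_overall = {1: 29988.6, 2: 10791.4, 3: 13373, 4: 10743.4,
                 5: 10625.4, 6: 29720.8, 7: 30291.4}

def rank(times):
    """Return the case numbers ordered from fastest to slowest."""
    return sorted(times, key=times.get)

print("MSXML:", rank(msxml_transform))
print("Saxon:", rank(saxon_overall))
```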

So with MSXML, the "generate-id" cases are by far the fastest. The speedup
(computed from the MSXML Transform times) for File 1:
Case 7:
    2.54 x faster than Case 5
    2.65 x faster than Case 4
    2.69 x faster than Case 2
    4.09 x faster than Case 3
Case 1:
    2.45 x faster than Case 5
    2.56 x faster than Case 4
    2.59 x faster than Case 2
    3.95 x faster than Case 3
Case 6:
    2.41 x faster than Case 5
    2.51 x faster than Case 4
    2.55 x faster than Case 2
    3.88 x faster than Case 3

and the speed up for File 2:
Case 7:
    1.80 x faster than Case 5
    1.76 x faster than Case 4
    1.83 x faster than Case 2
    2.56 x faster than Case 3
Case 1:
    1.73 x faster than Case 5
    1.69 x faster than Case 4
    1.76 x faster than Case 2
    2.46 x faster than Case 3
Case 6:
    1.70 x faster than Case 5
    1.65 x faster than Case 4
    1.72 x faster than Case 2
    2.41 x faster than Case 3
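
Each speedup figure is just the slower time divided by the faster time. A short Python sketch using the File 2 MSXML Transform times from the table (the function name is made up):

```python
# MSXML Transform times in milliseconds for Input File 2, from the table.
transform_ms = {1: 2042.8, 2: 3601, 3: 5025.4, 4: 3449,
                5: 3543.2, 6: 2089, 7: 1965}

def speedup(fast_case, slow_case, times=transform_ms):
    """How many times faster fast_case is than slow_case, to 2 decimals."""
    return round(times[slow_case] / times[fast_case], 2)

for fast in (7, 1, 6):
    for slow in (5, 4, 2, 3):
        print(f"Case {fast} vs Case {slow}: {speedup(fast, slow):.2f} x")
```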


I was surprised to see that generating keys for all Texture nodes and
filtering later was about the same in terms of speed, and even a little
faster in some cases (although it did take a little more memory).  It does
depend on how many keys get generated, though.  I have timed other cases
where generating keys for everything and filtering later was slower than
using a predicate in xsl:key "match" (especially when a lot gets filtered
out by the predicate).


It's strange how much slower the "generate-id" methods were with Saxon:
Case 1, File 1:
    2.82 x slower than Case 5
    2.79 x slower than Case 4
    2.78 x slower than Case 2
    2.24 x slower than Case 3


Either Saxon is doing "generate-id" a heck of a lot slower than MSXML, or
it's doing "count" and "predicates" relatively faster.  I'm guessing that
MSXML is doing "generate-id" much faster.

Anyway, take all of this for what it's worth :-)  If you want to see the
input, the test harness scripts, the spreadsheet with the results, etc., let
me know offline and I'll send it to you (be prepared for over 11 MB, though
:-) ).

-Daniel Bowen
Software Engineer


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

