replacing key() with pipe.

Subject: replacing key() with pipe.
From: Paul Tchistopolskii <paul@xxxxxxx>
Date: Sat, 05 Aug 2000 05:30:19 -0700
Dear Sebastian.

In this letter I'm providing a simple variant of your test6.xsl.

But first a long (sorry) explanation.

On my box my variant runs twice as slow (compared to key())
on the 'special' file, which is:

<?xml version="1.0"?><!DOCTYPE cemetery SYSTEM "cem.dtd"
[ 
<!ENTITY data1 SYSTEM "data1.xml">
<!ENTITY data2 SYSTEM "data2.xml">
]>
<cemetery>
&data1;
&data2;
&data2;
</cemetery>

I had to produce such a strange file because with your 'smallest'
file the difference in speed was not easy to see, while with your
'biggest' file I got constant swapping (Windows, 128 MB). So
I produced something 'relatively big, but without swapping'.

saxon + test6 = 1 minute.
saxon + my test6 = 2 minutes.

<realitycheck>
Honestly, I don't care about spending 1 minute or 2 minutes
(or even 3 or 4 minutes) on this *exotic* activity. It should
all be powered by a repository. A text file is not good
storage for this kind of information if you want to query
it every five minutes, and if you want to run that query
once per week or day, it will not hurt to wait 2 minutes
instead of one.
</realitycheck>

I can 'improve' the pipe using a Java extension with side effects
(the biggest weakness is the 'flat -> hierarchical' shift, which is
based on the weak (but standard for XSLT) 'count-based
recursion'). It looks like a Java extension emulating 1 (one)
updateable variable could make it significantly faster.

<realitycheck>
But is it worth trying? Do you really care whether it is
1 minute or 3 minutes? Anyway, it seems that it does not
scale, first of all because of memory. And of course it is
ages behind the scalability provided by any SQL server
(including MySQL).
</realitycheck>

Now what I did. I'm sorry for explaining so many details, but
I think it could be interesting to see what happened with this task.

1. First (and most important), I started thinking about the
task itself, about the functionality I have to provide (not thinking
about 'key()' or other XSLT stuff at all).

What test6.xsl actually does:

2.A Query.

-  It takes all the /cemetery/stone/person elements.
-  It pulls out:
                person/died/date/yr
                second name (name/snm)
                first name (name/fnm)

-  Persons should be sorted by: Year, Second name, First name
  
2.B Rendering.

- It then renders the list of persons, but the Year is displayed only
  once per group.

So here we go.

3.  Query part.

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version="1.0"
  >
<!--JOB: process cemetery file to make a year catalogue, NOT using keys (1) -->

<xsl:template match="/cemetery">

<doc>

 <xsl:for-each select="stone/person">
      <xsl:sort select="died/date/yr"/>
      <xsl:sort select="name/snm"/>
      <xsl:sort select="name/fnm"/>

      <person>
      <year><xsl:value-of select="died/date/yr"/></year>
      <snm><xsl:value-of select="name/snm"/></snm>
      <fnm><xsl:value-of select="name/fnm"/></fnm>
      </person>

 </xsl:for-each>

</doc>

</xsl:template>

</xsl:stylesheet>


I think it is easy to understand what happens here. We are just
blindly translating the requirements for the Query part into XSLT.

So this component produces the stream:

<doc>
<person><year>123</year><snm>NAME</snm><fnm>NAME</fnm></person>
<person><year>123</year><snm>NAME</snm><fnm>NAME</fnm></person>
....
</doc>
    
Now all we need is to render this 'flat' structure into 'groups' (because
we want the year to be displayed only once per group, as required
for the Rendering part). I could write this in XSLScript ;-)
But for the sake of conformance, here comes the ugly XSLT call-template.

<side-effect>
In the next version of XSLScript there will be yet another compiler
'meta-construction', for loops, not only 'else' (because I finally got
tired of this loop --> recursion conversion).
</side-effect>

Whatever. This is, again, *typical* recursive XSLT:

- take the first elements from the list by some criterion
- draw them
- recursively call yourself with the rest of the list.

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version="1.0"
  >
<!--JOB: process cemetery file to make a year catalogue, NOT using keys (2) -->

<xsl:template match="/doc">
 <html> 
 <head>
 <title>Protestant Cemetery Catalogue </title>
 </head>
 <body>

 <xsl:call-template name="draw_year">
    <xsl:with-param name="list" select="/doc/*"/>
 </xsl:call-template>

 </body>
 </html>
</xsl:template>

<xsl:template name="draw_year">
 <xsl:param name="list"/>

<xsl:if test="$list">
  <xsl:variable name="year" select="$list[1]/year"/>
  <xsl:variable name="n_souls"  select="count( $list[year = $year ])"/>
  <xsl:variable name="rest"  select="$list[ (position() &gt; $n_souls) ]"/>

  <h2><xsl:value-of select="$year"/></h2>

  <ol>
  <xsl:for-each select="$list[ not (position() &gt; $n_souls) ]">
         <li><b><xsl:value-of select="snm"/></b>,
                <xsl:value-of select="fnm"/></li>
  </xsl:for-each>
  </ol>

   <xsl:call-template name="draw_year">
         <xsl:with-param name="list" select="$rest"/>
   </xsl:call-template>
 </xsl:if>

</xsl:template>

</xsl:stylesheet>


Design patterns used. 
------------------------------

Ux is about pipes of simple XSLT components. Have you
noticed that there are no HTML tags in the Query
part at all?

Another Ux 'design pattern' is that the Query produces
some kind of 'formatting objects' for the 'renderer'. The renderer
just blindly produces the HTML.

I hope this explains why I'm not using key(). Those
'select from .. dual' constructions could hardly be derived from
the functional specification (the code written above is just a
simple reflection of the functional specification into simple
and general XSLT constructions).
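
For comparison only, a key()-based grouping of the same data could be
sketched roughly like this. It is NOT the original test6.xsl (which is
not reproduced in this letter), just an illustration of the general
technique. The key name 'by_year' is made up, and the element names are
assumed to be the same ones used in the Query part above.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version="1.0">
<!-- Sketch only: key()-based grouping, NOT the original test6.xsl -->

<!-- Index every person by year of death ('by_year' is a hypothetical name).
     string() keeps persons without a year in one group with an empty key. -->
<xsl:key name="by_year" match="person" use="string(died/date/yr)"/>

<xsl:template match="/cemetery">
 <html>
 <body>

 <!-- Visit only the first person (in document order) of each year -->
 <xsl:for-each select="stone/person[generate-id() =
       generate-id(key('by_year', string(died/date/yr))[1])]">
   <xsl:sort select="died/date/yr"/>

   <h2><xsl:value-of select="died/date/yr"/></h2>

   <ol>
   <!-- All persons who share the current year, sorted by name -->
   <xsl:for-each select="key('by_year', string(died/date/yr))">
     <xsl:sort select="name/snm"/>
     <xsl:sort select="name/fnm"/>
     <li><b><xsl:value-of select="name/snm"/></b>,
            <xsl:value-of select="name/fnm"/></li>
   </xsl:for-each>
   </ol>

 </xsl:for-each>

 </body>
 </html>
</xsl:template>

</xsl:stylesheet>
```

Note how the query, the grouping and the HTML all end up in one
template here. That mixing is what the two-step pipe avoids.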

Yes, I have to admit: unless it is polluted with some
'other' ugly constructions, it runs twice as slow (maybe
three times as slow) as the key()-based solution.

Should I start polluting this 'plain XSLT' thing with
ugly Java hacks, or can we wait 2 minutes instead
of 1 minute (but keep the code supportable by
anybody)?

Rgds.Paul.

PS. I encountered *crazy* jumps in speed on different
boxes and different versions of the VM. On some boxes
SAXON is (significantly) faster than XT (on some 'other
stylesheets'), because it seems that Instant SAXON was
compiled with some tool which works nicely with the MS VM.
What is the tool? Ah, there are at least 3 Java boosters
out there and some are specifically Windows oriented.
Benchmarking XSLT is hard, I think.



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

