Practical Suggestion for XSLT Performance Improvement

Subject: Practical Suggestion for XSLT Performance Improvement
From: "Clark C. Evans" <clark.evans@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 6 Oct 1999 22:21:04 -0400 (EDT)
A Practical Suggestion for XSLT Performance Improvement


Summary:

  In this document a method is suggested for dramatically
  improving XSLT processor performance (2-10x or more) 
  for financial information or other 'high volume' 
  information streams. It is shown how the time to 
  produce a summary report for an 100K input stream 
  changes from two and a half minutes to 14 seconds.  
  It is also suggested how this number could be further 
  reduced with an alternative XSLT processor design.

Outline:

  1. Application domain
  2. Design of an appropriate benchmark
  3. A solution using existing XSLT constructs
  4. "Functional Programming" and "Touch It Once"
  5. A possible solution:  Child to parent named pipes?
  6. A practical suggestion: Aggregate hashtables
  7. Conclusion: Perspective is everything
  8. Credits and References

Appendix:

  A. Sample source data set.
  B. Sample report
  C. Generator for source data set.
  D. Stylesheet using existing constructs
  E. Implementation of new construct for jclark's XT.
  F. Stylesheet using new constructs


1. Application domain

  One of the larger 'obvious' application domains for XSL
  is the reporting of business transaction and/or financial 
  information.  Typically these reports summarize the 
  information according to two or more categories to 
  produce a cross-tabulated result.  In lighter weight
  cases, a simple total for an order is required.
  In either case, summary information is generated
  based on the content of the source data.

2. Design of an appropriate benchmark

  To test various alternatives, the time required to generate
  a two dimensional cross-tabulated summary report seems like 
  the best objective.  For this summary, we assume that the
  axis may be known by the stylesheet author, however, the
  actual categories (labels) on the access may not.  Thus, the
  stylesheet should discover these categories in addition
  to generating tables.  It is also assumed that the categories
  (labels) will be sorted.

  For example, information about orders (de-normalized here)
  such as:

   +-------+-----------+---------+--------+
   | Order | DayofWeek | Product | Amount |
   |-------+-----------+---------+--------+
   |   1   |  monday   | t-shirt |  5.50  |
   |   1   |  monday   | jeans   | 25.00  |
   |   2   |  monday   | boxers  |  9.50  |
   |                  ...                 |
   |   99  |  friday   | jeans   | 25.00  |
   +-------+-----------+---------+--------+

  Could produce the following summary:

   +--------+---------+-     -+--------+----------+
   |        | monday  |  ...  | friday |  total   |
   +--------+---------+-     -+--------+----------+       
   |boxers  |   9.50  |       |        |     9.50 |
   |jeans   |  50.00  |       | 100.00 |   550.00 |
     ...       ...       ...     ...      ...
   |t-shirt |  11.00  |       |  27.50 |    79.00 |    
   +--------+---------+- ... -+--------+-------- -+
   | total  | 122.25  |       | 525.75 | 1,650.25 |
   +--------+---------+-     -+--------+----------+

  Of course, day of the week could be known in advance,
  but for the benchmark we do not assume this, since it
  could very well be any other category.

  In addition, due to complexity in the order, there 
  may be several layers deep depending on how the 
  data is structured.  For instance, shift information
  sales person, comments, and other details about 
  weekly store events may also be included in the 
  source document.  Thus, much may be discarded in order
  to extract information relevant. 

  Unfortunately, this benchmark did not put in significant
  'irrelevant' data.  However, a three level hierarchy was 
  used.  Here is a snippet of the input data stream used:

  <upload>
    <day>
      <dow>tue</dow>
      <order>
        <line>
           <product>Hat</product>
           <price>11.43</price>
        </line>
    ...


3. A solution using existing XSLT constructs

  Note:  This is the best solution I could muster,
  if there are other solutions out there that are
  better this is great; let's see them.

  To produce the summary report, the stylesheet
  must first generate a list of the categories
  for both axes.  This is done with the following:
    
  <xsl:variable name="product-list" 
            select="//product[not(.=following::product)]" />

  Here the product list is made of all products which do
  not have a product entry matching it later in the document.
  Thus the first 't-shirt' is picked up, and remaining 't-shirt'
  nodes are discarded.  In XSL, the result is a node list, 
  so, one must be careful to only use information in the
  node that was used to derive uniqueness.

  Once the category lists have been identified, then 
  for-each loops can be recursively nested, with a 
  sum, filtered by the appropriate category.  See the
  appendix for more detail.   Here is a snippet:

   <xsl:for-each select="$product-list">
     <xsl:variable name="product" select="." />
        ...
          <xsl:for-each select="$day-list">
            <xsl:variable name="day" select="." />
              ...
               
              <xsl:value-of 
                select="sum(//price[../product=$product]
                                   [../../../dow=$day])" /> .
  
      
  Although I am no expert with jclark's processor, I believe
  that the entire tree is processed, as a minimum, 
  count(day-list)*count(product-list) times.   

  Here are the timing results for this stylesheet:

  Tiny   (1.6K)    6.01    6.24    6.18    6.22    6.05 [   6.14]
  Small (10.3K)   16.68   16.55   16.51   16.43   16.64 [  16.56]
  Medium (104K)  145.27  145.51  145.11  144.89  145.33 [ 145.22]
  Large  (974K) 1334.99 1335.26 1334.73 1335.03         [1335.01]

     (Linux; PPro 180; 128MB; JClark's XT)

  Although performance may be acceptable for tiny documents, in 
  this domain medium size documents (50-200K) are the norm, and 2"41"
  is unacceptable performance. Five to fifteen seconds would be more
  in-line with user expectations; not minutes.

4. "Functional Programming" and "Touch It Once" 
 
  Structure and Interpretation of Computer Programs, Abelson 
  has some wonderful comments which are relevant to XSL.

  In the first two chapters of the book Abelson teaches functional 
  programming (lack of assignment operator) and the substitution 
  evaluation model.  Then, in chapter three, he introduces assignment 
  and shows how it is "a powerful technique for maintaining a modular 
  design".  A few pages however, he goes on to warn:

   "In contrast to functional programming, programming that 
    makes extensive use of assignment is known as imperative
    programming.  In addition to raising complications about
    computational models, programs written in imperative 
    style are susceptible to bugs that cannot occur in 
    functional programs." (p234)
      
  This quote is important since the designers of XSL have been 
  striving to keep the language functional.  That is, they wish
  to prevent:

   "expressions which will produce different values depending 
    upon the order in which the subexpressions in a combination 
    are evaluated" (p238)

  To follow this goal, a template (function) can create its own 
  local variables.  In addition, the template can read variables 
  and parameters declared globally, but cannot write to them.  
  Furthermore, a template can pass parameters to another 
  template called directly. And finally, it is possible to specify 
  parameters for templates invoked for a child element match.  Thus,
  the designers have achieved their goal of a completely functional 
  language; which is a fantastic quality.  In fact, the designers 
  of XSLT have designated a exquisite recursive template matching
  system for XML stream processing!

  Unfortunately, XSL's communication system is one way; parameters 
  can be passed to templates, but no return is admitted.  This is, 
  of course, exactly the opposite of what a summary report must 
  do -- aggregate information upward.  Due to this current limit,  
  XSL's ability to process summary reports is severely hampered.  
  The currently prescribed approach is to by-pass the beautifully 
  designed recursive template mechanism and to iterate through the 
  in-memory document object using the for-each construct.  Hardly a
  divine situation.  

  In fact, the bias towards this approach is now apparent in the 
  popular tutorials.  Seldom seen is a recursive template; often 
  shown is a brute force multi-pass for-each example.  The affects 
  of this broken communication mechanism in the XSL standard cannot  
  be under-stressed; every implementation now uses an in-memory 
  document object model (DOM) or like mechanism for intermediate 
  storage.  This increases memory usage and CPU cycles -- dramatically 
  slowing down processing and putting increasing hardware constraints 
  for its deployment.  This is completely absurd, since there is no
  reason why most summary reports should require multiple passes.  To 
  quote Grandmother Weatherfax:  

       "Touch it once!  Or don't touch it at all."
  
  Realistically, this is a hard and there are few possibilities.  
  We could introduce an assignment operation, but Abelson 
  clearly warns us:

    "this power extracts a price:  the loss of referential 
     transparency, giving rise to a thicket of questions about
     sameness and change, and the need to abandon the substitution
     model of evaluation in favor of the more intricate environment
     model [...] by introducing assignment we are forced to admit
     _time_ [synchronization, mutex, and other devices] into our 
     computational models. (p297)"

  Perhaps the answer can be found, as Abelson suggests, with 
  a stream processing model which "lets us model systems that have 
  state without ever using assignment or mutable data." (p317)  
  Abelson presents a particularly interesting example where a bank 
  account object is re-written as a functional stream (p355).  Here 
  the state is external, making the approach functional.  Clearly 
  this approach is at the heart of XSL; where the goal of each 
  processor is to be stateless.   

  However, all excitement aside, as more elaborate usage of stream 
  processing is put to practice, multiple streams must be introduced, 
  and thus a stream merge mechanism is required.   Abelson concludes
  the chapter saying:

    "Thus, in an attempt to support the functional style, the need
    to merge inputs from different agents reintroduces the same 
    problems that the functional style was meant to eliminate." (p357)

    "We can model the world as a collection of separate, time-bound,
    interacting objects with state, or we can model the world as a 
    single, timeless stateless unity.  Each view has powerful 
    advantages, but neither view alone is completely satisfactory" (p357)

5. A Possible Solution:  Child to Parent named pipes.

  So if we want to "touch it once", Abelson has left us with two
  choices:  (a) Introduce a mutable object or (b) introduce multiple
  data streams and deal with the merge problem.  I chose the former 
  since it is well understood.  But perhaps it would be fun to talk 
  about the multiple-stream case just for kicks.
 
  Suppose each parent could open one or more named pipe(s), which
  in effect is the communication channel from the children back
  to the parent element:

    <xsl:template match="\" >
      <xsl:apply-templates>
        <xsl:open-pipe name="amount-by-category" />
      </xsl:apply-templates>

      ...  I'll come back here later  ...
 
    </xsl:template>
  
  Then, each of the "child" templates could write to this pipe:

    <xsl:template match="order" >
      <xsl:write-pipe name="amount-by-category" >
        <xsl:attribute name="category"  select="product" />
        <xsl:attribute name="amount"    select="price*quantity" />
      </xsl:write-pipe>
      <table>
        <tr><td>Order#</td><td><xsl:value-of select="order-number"
/></td></tr>
        <tr><td>Product</td><td><xsl:value-of select="product"
/></td></tr>
        <tr><td>Color</td><td><xsl:value-of select="color" /></td></td>
           ... pretty print rest of order ...
      </table> 
    </xsl:template>
       
  In essence, the write-pipe directive tells the XSL processor 
  what fragment needs to be stored in memory for later use; 
  allowing all of the remaining information (such as the color 
  and order-number) to be discarded.

  This would, for instance, put into temporary internal storage 
  an internal DOM object or event stream which could then be 
  accessed by the root template.

  <amount-by-category>
    <fragment category="X" amount="2.38" />
    <fragment category="X" amount="2.62" />
    <fragment category="Y" amount="3.43" />
       ...
    <fragment category="X" amount="2.50" />
  </amount-by-category>


  Which could then be accessed by the root template:

    <xsl:template match="\" >
      <xsl:apply-templates>
        <xsl:open-pipe name="amount-by-category" />
      </xsl:apply-templates>

      <xsl:for-each select="$amount-by-category" />
        <xsl:sort select="category" />
        <xsl:if test="category=previous-sibling::category" > , </xsl:test>
        <xsl:if test="not(category=previous-sibling::category">
          <xsl:value-of select="category" /> :
        </xsl:if>
        <xsl:value-of select="amount" />
      <xsl:for-each>
 
    </xsl:template>

  Which would produce:  

    X: 2.38, 2.62, ... , 2.50
    Y: 3.43, ...
    ...

  Most of the time, however, only a summary is required; so keeping
  all of the detail information in memory would be painful.  So, perhaps
  another syntax could be used to convert the "pipe" into an
"accumulator":


    <xsl:template match="\" >
      <xsl:apply-templates>
        <xsl:open-pipe name="amount-by-category" >
          <xsl:accumulate using="sum" select="amount" />
        </xsl:open-pipe>
      </xsl:apply-templates>

      <xsl:for-each select="$amount-by-category" />
        <xsl:sort select="category" />
        sum(
            <xsl:value-of select="category" /> 
        ) equals 
            <xsl:value-of select="amount" />
      <xsl:for-each>
    </xsl:template>  

  Which would produce:
    
     sum(X) = 7.50
     sum(Y) = ... 
     ...
    

  In the multi-dimentional cases, this gets to be very powerful,
  yet lean on memory and processing time.  And, if you allow the
  pipe to be passed as a non-mutable variable; things get even
  more fun:


    <xsl:template match="\" >
      <xsl:apply-templates>
        <xsl:open-pipe name="amount-by-category" >
          <xsl:accumulate using="sum" select="amount" />
        </xsl:open-pipe>
      </xsl:apply-templates>

      <xsl:call-template name="print-amount-by-category >
        <xsl:with-param   name="amount-by-category 
                        select="$amount-by-category" />
        <xsl:open-pipe name="total-amount" >
          <xsl:accumulate using="sum" select="amount" />
        </xsl:open-pipe>
      </xsl:call-template>
      
      Grand Total:  <xsl:value-of select="$total-amount" >
    </xsl:template>

    <xsl:template name="print-amount-by-category" >
      <xsl:param name="amount-by-category"/>
      <xsl:write-pipe name="total-amount">
         <xsl:attribute name="amount"    select="amount" />
      </xsl:write-pipe>
      <xsl:for-each select="$amount-by-category" />
        <xsl:sort select="category" />
        sum(
            <xsl:value-of select="category" /> 
        ) equals 
            <xsl:value-of select="amount" />
      <xsl:for-each>
    </xsl:template>  
   
  Which would produce:
    
     sum(X) = 7.50
     sum(Y) = ... 
     ...
     
     Grand Total: ...

  The obvious advantage here is that summaries of sums are far less
  expensive to calculate when compared to a grand total over every
  amount in the incoming stream.  Thus, this is would be far faster
  than, say:  <value-of select="sum(\\amount)" />   

  Of course, there are also a few "minor" problems, namely how
  to merge multiple streams as pointed out by Abelson.  I'm not
  sure if this is a show stopper.  Thoughts?

6. A practical suggestion: Aggregate hash-tables

  The way I solved the problem, introducing a mutable object
  (namely java.util.Hashtable) is far less elegant than 
  the method described above.  However, with 10K file, generating
  a summary report this way turned out be significantly faster
  than the current approach.

  The solution was put in place by building off of jclark's XT
  processor.  A static Java class was created and is accessed via XSL 
  using the ht name-space.  Most Hashtable member functions have been
  wrapped with a static versions.  Since multiple hash tables are 
  needed to store different types of information, the first argument
  to each function is the name of the hashtable.  Internally, it is
  actually a Hashtable of Hashtables.

  Here is an example of putting items into the hash table:

    <xsl:template match="day" >
      <xsl:variable name="day" select="dow" />
      <xsl:value-of select="ht:put('general','day',string($day))" />
      <xsl:value-of select="ht:put('day',string($day),'')"/>
      <xsl:apply-templates select="order" />
    </xsl:template>

  Here, each day that is found is put into the 'day' table,
  a list of days encountered.

  In this example, I also set the 'day' key in the 'general' table 
  to reflect the current day.  I do this so that a child template
  can know which day it is processing.  Yes, one could use "../../day" 
  to get this information at the child level, but my goal was to prove
  the hashtable for general parameter passing.  One could also use
  the with-param mechanism; however, this only seems to work for
  immediate descendants of the current template.  

    <xsl:template match="line" >
      <xsl:variable name="day" select="ht:get('general','day')" /> 
      <xsl:variable name="product" select="product" />
      <xsl:variable name="price"   select="price"  />
      <xsl:value-of 
         select="ht:put('product',string($product),'')"/>
      <xsl:value-of 
        select="ht:add('price',concat($day,$product),number($price))"/>
    </xsl:template>

   In the above fragment, the ht:get is used to retrieve the current
   day which was "put" by the day template.  Next, the product and price
   from the current node set are moved into variables; this is done
   so that they can be used in expressions later.  Following this,
   the current product is put into the 'product' list which contains
   all product names encountered (the hash table key is unique).
   For efficiency, a ht:add(table,key,value) is implemented, it is
   identical to:  ht:put(table,key,value+ht:get(table,key)).
   The last instruction uses ht:add to store the price using a 
   concatenated key of day and product name.

   To make the hash table inter-operable with XSLT, a to-node-list
   function is provided.   This copies the contents of the Hashtable
   into a node list of keys.  Unfortunately, I was not able to 
   return a node list of key/value pairs.

   Thus, the following fragment:

     <xsl:template match="/" >

        <xsl:apply-templates/>

        <xsl:for-each select="ht:to-node-list('day')" >
          <xsl:value-of select="."/>
        </xsl:for-each>

        ...

   First runs all templates -- filling in the 'day', 'product', 
   and 'price' hash tables.  Then, the list of keys from the
   'day' hash table is retrieved using the to-node-list function.   

     <xsl:template match="/">
        
       ...
 
       <xsl:for-each select="ht:to-node-list('product')" >
         <xsl:variable name="product" select="." />
          <xsl:for-each select="ht:to-node-list('day')" >
             <xsl:variable name="day" select="." />

            <xsl:variable name="price" 
              select="ht:get('price',concat($day,$product))" />

            <xsl:value-of select="$price" /> 

            <xsl:value-of 
              select="ht:add('day',string($day),number($price))"/>

            <xsl:value-of 
              select="ht:add('product',string($product),number($price))"/>

          </xsl:for-each>
        <xsl:value-of select="ht:get('product',string($product))" />
      </xsl:for-each>
  
      ...

   In the above fragment, the 'product' and 'day' lists are used 
   in a nested iteration.  Note how the concatenated day/product
   is used to retrieve the sum(price) for that combination.
   Once retrieved, the sum is printed and while within the loop, a 
   sum for each day and product is generated using ht:add.  

     <xsl:template match="/" >
      
       ...
 
       <xsl:for-each select="ht:to-node-list('day')" >
         <xsl:variable name="day" select="." />
         <xsl:variable name="price" select="ht:get('day',string($day))" />

         <xsl:value-of select="$price" />

         <xsl:value-of select="ht:add('general','total',number($price))"/>
          
       </xsl:for-each>
       
       <xsl:value-of select="ht:get('general','total')" />
    
     </xsl:template>
          
   In this last fragment, the template once again iterates through
   the list of days.  This is used to generate the last line of the
   table (summaries by day).   In this case, once the price is obtained, 
   it is used to generate the sum for the grand total.

   ...

   Performance measurements:
 
   tiny 1.6K :  6% faster

   (traditional)  6.01 6.24 6.18 6.22 6.05 [6.14]
   (/w hash sum)  5.83 5.77 5.71 5.68 5.85 [5.76]

   small 10.3K : 2.44 times faster.

   (traditional) 16.68 16.55 16.51 16.43 16.64 [16.56]
   (/w hash sum)  6.69  6.72  6.79  6.83  6.83 [ 6.77]

   medium 104K = 10.56 times faster
   
   (traditional) 145.27 145.51 145.11 144.89 145.33 [145.22]
   (/w hash sum)  13.79   13.78 13.80  13.73  13.69 [ 13.75]

   large 974K = 17.53 times faster
   (traditional) 1334.99 1335.26 1334.73 1335.03          [1335.01]
   (/w hash sum)   77.9    77.25   77.52   78.50   77.48  [  77.53]

7. CONCLUSION

   By augmenting current functional approach using a mutable object like 
   a hash table,  measurable performance increase is realized.  For a 
   medium size file (100K) of financial information, the processor can 
   generate results in thirteen (13) seconds as opposed to two minutes 
   and forty one seconds.   

   I believe that it is possible to get response time far below five (5) 
   seconds if the processor does not generate an in-memory node tree 
   before moving on to processing.   Thus, the performance increase 
   from moving from an object based to to an event driven based 
   processor is very measurable and well worth researching.

   One might complain that the syntax of the hash-table system is too
   complicated, however, this could be made much better /w integration 
   into the standard.  Although the traditional template approach
   seems easier once xpath is internalized.  It is also possible that 
   an "optimizing" XSLT processor could figure out how to compute partial 
   sums effectively; I would love to see the problem attacked this way.
   
   Overall, I'd love to see child to parent communication.  With
   a technique like this, a hash-table approach seems archaic.
   The goal, of course, is to eliminate the iterative "procedural"
   approach to stylesheet authoring, and instead develop on a
   recursive "event-driven" model.   This could bring several 
   benefits, among them:

    1.  We increase modularity.  With recursive applications of this
        each type of child may need to summarize differently.  Thus,
        we can have many "listeners" to the event stream doing different
        things and combinable in different ways.  The alternative 
        approach (in practice) is to use large nested <for-each loops.

    2.  The XSL processor has greatly reduced memory requirements since
        a document object model of the incoming stream does not have
        to be generated.  Practically, a processor would have to check
        to see if any "wandering xpath" constructs were in use; or
        see if any for-each loops cause folding.  Kay, could SAXON 
        perhaps be an XSLTS, S for "stream" and "speed" ?

  Anyway, this is intended to be a departure point for further 
  discussion, not a 'canned' solution.


8. Credits and References

  A.  Big thanks to the XML and XSL community.  It is not very often
      to find such an open community willing to discuss ideas.  
      Especially members of the xml-dev and xsl-list servers.

  B.  To Harold Abelson and Gerald Sussman, for writing Structure and
      Interpretation of Computer Programs.  Most excellent book packed
      with great ideas.  I have much more to learn from it.  Also in
      this must-read category is the Dragon Book by Aho, Sethi and Ullman.

  C.  Thanks to Dan Palanza, a builder in Cape Cod which has driven
      this book home to me; especially the parallels he draws from
      this book to bookkeeping,  building contract work, and the ICHING.

  D.  To James Clark for his wonderful XT, to David Megginson for SAX,
      to Kay Mitchael for SAXON, and to David Carlisle for his help.



Appendix A. Sample source data set.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<upload>
  <day>
    <dow>tue</dow>
    <order>
      <line>
         <product>T</product>
         <price>11.96</price>
      </line>
      <line>
         <product>Pants</product>
         <price>12.13</price>
      </line>
    </order>
    <order>
      <line>
         <product>Underwear</product>
         <price>13.47</price>
      </line>
    </order>
  </day>
  <day>
    <dow>fri</dow>
    <order>
      <line>
         <product>Tie</product>
         <price>1.91</price>
      </line>
      <line>
         <product>Pants</product>
         <price>14.0</price>
      </line>
    </order>
    <order>
      <line>
         <product>T</product>
         <price>9.50</price>
      </line>
    </order>
    <order>
      <line>
         <product>Coat</product>
         <price>4.65</price>
      </line>
    </order>
    <order>
      <line>
         <product>Shirt</product>
         <price>11.20</price>
      </line>
    </order>
  </day>
  <day>
    <dow>tue</dow>
    <order>
      <line>
         <product>Boots</product>
         <price>4.24</price>
      </line>
    </order>
    <order>
      <line>
         <product>Overalls</product>
         <price>11.89</price>
      </line>
    </order>
  </day>
</upload>

Appendix B. Sample report (has been edited for clarity)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
<html xmlns="http://www.w3.org/TR/REC-html40";>
<body>
<table>
  <tr><td>         </td><th> fri   </th><th> tue   </th>
</tr>
  <tr><td>Boots    </td><td>       </td><td>  4.24 </td><td>  4.24
</td></tr>
  <tr><td>Coat     </td><td>  4.65 </td><td>       </td><td>  4.65
</td></tr>
  <tr><td>Overalls </td><td>       </td><td> 11.89 </td><td> 11.89
</td></tr>
  <tr><td>Pants    </td><td> 14.00 </td><td> 12.13 </td><td> 26.13
</td></tr>
  <tr><td>Shirt    </td><td> 11.20 </td><td>       </td><td> 11.20
</td></tr>
  <tr><td>T        </td><td>  9.50 </td><td> 11.96 </td><td> 21.46
</td></tr>
  <tr><td>Tie      </td><td>  1.91 </td><td>       </td><td>  1.91
</td></tr>
  <tr><td>Underwear</td><td>       </td><td> 13.47 </td><td> 13.47
</td></tr>
  <tr><td>         </td><td> 41.26 </td><td> 53.69 </td><td> 94.95
</td></tr>
</table>
</body>
</html>

Appendix C. Generator for source data set
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

import java.io.*;
import java.util.*;

public class MakeSampleData {
    /*                               TINY  SMALL  LARGE
  private static final int DAYS    = 10     10    50
  private static final int ORDERS  = 5      15    250
  private static final int PRODUCT = 2      15    100
    */

  private static final int DAYS    = 2;
  private static final int ORDERS  = 10;
  private static final int PRODUCT = 15;

  private static Random gen = new Random();

  public static void main (String [] args )
  {
    Vector prod = new Vector();
    prod.addElement("Shirt");    prod.addElement("Hat");
    prod.addElement("Pants");    prod.addElement("Sweater");
    prod.addElement("Overalls"); prod.addElement("Underwear");
    prod.addElement("Scarf");    prod.addElement("Jeans");
    prod.addElement("Belt");     prod.addElement("Cuffs");
    prod.addElement("Tie");      prod.addElement("T");
    prod.addElement("Coat");     prod.addElement("Umbrella");
    prod.addElement("Jacket");   prod.addElement("Boots");

    Vector days  = new Vector();
    days.addElement("mon"); days.addElement("tue");
days.addElement("wed");
    days.addElement("thu"); days.addElement("fri");
days.addElement("sat");

    System.out.println("<upload>");
    for(int day = 1; day < DAYS; day++ )
    {
      System.out.println("  <day>");
      System.out.println("    <dow>" 
                       + days.elementAt( day % 6 ) + "</dow>" );
      for(int order = 1; order < (1+ rand(ORDERS)); order++ )
      {
        System.out.println("    <order>");
        for(int product = 1; product < (1+ rand(PRODUCT)); product++ )
	{
          System.out.println("      <line>");
          System.out.println("         <product>" 
                             + prod.elementAt(rand(prod.size())) 
                             + "</product>");
          System.out.println("         <price>" 
                           + (1+rand(14))+ "." + rand(100) + "</price>");
          System.out.println("      </line>");
        }
        System.out.println("    </order>");
      }
      System.out.println("  </day>");
    }
    System.out.println("</upload>");
  }

  private static int rand(int mod)
  {
    int val =  (gen.nextInt() % mod) ;
    if ( val < 0 )
      return -val;
    return val;
  }
}


Appendix D.  Stylesheet for existing constructs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/XSL/Transform/1.0";
  xmlns="http://www.w3.org/TR/REC-html40";
  result-ns="">

   <xsl:variable name="day-list" 
     select="//dow[not(.=following::dow)]" />

   <xsl:variable name="product-list" 
            select="//product[not(.=following::product)]" />
     
<xsl:template match="/">
  <html>
    <body>
    <table>
      <tr>
        <td><xsl:text> </xsl:text></td>
        <xsl:for-each select="$day-list">
          <xsl:sort order="ascending" select="." />
          <th><xsl:value-of select="."/></th>
        </xsl:for-each>
      </tr>
      <xsl:for-each select="$product-list">
        <xsl:sort    order="ascending" select="." />
        <xsl:variable name="product" select="." />
        <tr>
          <td>
            <xsl:value-of select="$product" />
          </td>
          <xsl:for-each select="$day-list">
            <xsl:sort order="ascending" select="." />
            <xsl:variable name="day" select="." />
            <td>
              <xsl:value-of 
             select="sum(//price[../product=$product][../../../dow=$day])"
/> .
            </td>
          </xsl:for-each>
          <td>
            <xsl:value-of 
              select="sum(//price[../product=$product])" /> .
           </td>
        </tr>
      </xsl:for-each>
      <tr>
        <td><xsl:text> </xsl:text></td>
        <xsl:for-each select="$day-list">
          <xsl:sort    order="ascending" select="." />
          <xsl:variable name="day" select="." />
          <td>
            <xsl:value-of 
              select="sum(//price[../../../dow=$day])" />
          </td>
        </xsl:for-each>
        <td>
          <xsl:value-of select="sum(//price)" />
        </td>
      </tr>
    </table>
    </body>
  </html>
</xsl:template>
</xsl:stylesheet>


E. Implementation of new construct for jclark's XT.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
package com.clarkevans.xtk;

import java.io.*;
import java.util.Hashtable;
import java.util.Enumeration;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import com.jclark.xsl.om.*;
import com.jclark.xsl.sax.*;


public class HashtableAdapter {
  
  private static Hashtable table_of_tables = new Hashtable();
  private static Hashtable     cache_table     = null;
  private static String        cache_key       = null;
  private static Exception     last_error      = null;

  private static Hashtable getTable( String name )
  {
    if(name != cache_key)
    { 
      cache_key = name;
      cache_table = (Hashtable) table_of_tables.get(name);
      if( null == cache_table )
      {
	cache_table = new Hashtable();
        table_of_tables.put(name,cache_table);
      }
    }
    return cache_table;
  }
  public static void put( String table, String key, String value )
  {
     getTable(table).put(key,value);        
  }
  public static void add( String table, String key, double value )
  {
    Hashtable hash = getTable(table);
    Object    obj  = hash.get(key);
    double    val  = value;
    if( Double.isNaN(val) ) val = 0;
    try
    {
      if( obj instanceof String )
        val += Double.parseDouble((String)obj); 
      else
        val += ((Double)obj).doubleValue();
    }
    catch (Exception e) { }
    hash.put(key,new Double(val));
  }
  public static void concat(
         String table, String key, String value,
         String sep, boolean unique )
  { 
    String curr = get(table,key);
    if(curr == null)
    {
      put(table,key,value);
    }
    else
    {
      if(unique)
	if( curr.indexOf(value) >  -1 )
          return;
      put(table,key,curr+sep+value);
    }   
  }
  public static String remove( String table, String key)
  {
    return (String) getTable(table).remove(key);
  }
  public static String get( String table, String key )
  {
    return get(getTable(table),key);
  }
  private static String get( Hashtable tab, String key )
  {
    Object obj = tab.get(key);
    if( obj instanceof Double )
      return obj.toString();
    else
      return (String) obj;
  }
  public static int size(String table)
  {
    return getTable(table).size();
  }
  public static String toString(String table)
  {
    String     list = new String();
    Hashtable   tab = getTable(table);
    for (Enumeration e = tab.keys() ; e.hasMoreElements() ;) {
       String key   = (String) e.nextElement(); 
       String value = get(tab,key);
       list += wrap("entry",wrap("key",key)+wrap("value",value));
    }
    return wrap("table",list);
  }
  public static String wrap(String element, String content)
  {
    return "<" + element + ">"  + content + "</" + element + ">";
  }
  public static NodeIterator toNodeList(String table)
  {
    try
    {
      XMLProcessorImpl proc = new XMLProcessorImpl(
                              new HashtableParser(
                                getTable(table)));
      Node root = proc.load( new InputSource(),
                           0, new LoadContextImpl(),
                              new NameTableImpl() );
      return root.getChildren();
    }
    catch (Exception e)
    {
      last_error = e;
      return null;
    }
  }

  public static String getLastError()
  {
    if(null != last_error) {
      StringWriter w = new StringWriter();
      last_error.printStackTrace(new PrintWriter(w));
      last_error = null;
      return w.toString();
    }
    return "";
  }   

  private static class LoadContextImpl implements LoadContext 
  {
      public boolean getStripSource(Name elementTypeName) { return false;
}
      public boolean getIncludeComments() { return true; }
      public boolean getIncludeProcessingInstructions() { return true; }
  }

  private static class HashtableParser implements Parser
  {
    Hashtable       table;
    DocumentHandler handler;
    public void setLocale (java.util.Locale locale) throws SAXException {}
    public void setEntityResolver (EntityResolver resolver) {}
    public void setDTDHandler (DTDHandler handler) {}
    public void setErrorHandler (ErrorHandler handler) {}
    public void setDocumentHandler (DocumentHandler handler) { 
      this.handler = handler; 
    }
    public void parse (InputSource source) throws SAXException {
      parse(); 
    }
    public void parse (String systemId) throws SAXException {
      parse();
    }
    HashtableParser(Hashtable table)
    {
      this.table = table;
    }
    private void parse()
    {
      try
      {
        handler.startDocument();
        for (Enumeration e = table.keys() ; e.hasMoreElements() ;) {
          AttributeListImpl al  = new AttributeListImpl();
          String key   = (String) e.nextElement();
          String value = get(table,key);
          al.addAttribute("key","CDATA",key);
          al.addAttribute("value","CDATA", value);
          handler.startElement("entry",al);
          handler.characters( key.toCharArray(),0,key.length());
          handler.endElement("entry");
        }
        handler.endDocument();
      }
      catch( Exception e )
      {
        table.put("exception",e.toString());
      }
    }
  }
}


F. Stylesheet using new construct
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/XSL/Transform/1.0";
  xmlns="http://www.w3.org/TR/REC-html40";

xmlns:ht="http://www.jclark.com/xt/java/com.clarkevans.xtk.HashtableAdapter";
  result-ns="">
     

<xsl:template match="day" >
  <xsl:variable name="day" select="dow" />
  <xsl:value-of select="ht:put('general','day',string($day))" />
  <xsl:value-of select="ht:put('day',string($day),'')"/>
  <xsl:apply-templates select="order" />
</xsl:template>

<xsl:template match="line" >
  <xsl:variable name="day" select="ht:get('general','day')" /> 
  <xsl:variable name="product" select="product" />
  <xsl:variable name="price"   select="price"  />
  <xsl:value-of
select="ht:add('price',concat($day,$product),number($price))"/>
  <xsl:value-of select="ht:put('product',string($product),'')"/>
</xsl:template>

<xsl:template match="/">
  <html>
    <body>
    <xsl:apply-templates />
    <table>
      <tr>
        <td><xsl:text> </xsl:text></td>
        <xsl:for-each select="ht:to-node-list('day')" >
          <xsl:sort    order="ascending" select="." />
          <th><xsl:value-of select="."/></th>
        </xsl:for-each>
      </tr>
      <xsl:for-each select="ht:to-node-list('product')" >
        <xsl:sort    order="ascending" select="." />
        <xsl:variable name="product" select="." />
        <tr>
          <td>
            <xsl:value-of select="$product" />
          </td>
          <xsl:for-each select="ht:to-node-list('day')" >
            <xsl:sort    order="ascending" select="." />
            <xsl:variable name="day" select="." />
            <xsl:variable name="price" 
              select="ht:get('price',concat($day,$product))" />
            <xsl:value-of 
              select="ht:add('day',string($day),number($price))"/>
            <xsl:value-of 
              select="ht:add('product',string($product),number($price))"/>
            <td>
                <xsl:value-of select="$price" /> .
            </td>
          </xsl:for-each>
          <td>
            <xsl:value-of select="ht:get('product',string($product))" />
           </td>
        </tr>
      </xsl:for-each>
      <tr>
        <td><xsl:text> </xsl:text></td>
        <xsl:for-each select="ht:to-node-list('day')" >
          <xsl:sort    order="ascending" select="." />
          <xsl:variable name="day" select="." />
          <xsl:variable name="price" select="ht:get('day',string($day))"
/>
          <xsl:value-of 
            select="ht:add('general','total',number($price))"/>
          <td>
            <xsl:value-of select="$price" /> .
          </td>
        </xsl:for-each>
        <td>
          <xsl:value-of select="ht:get('general','total')" />
        </td>
      </tr>
    </table>
    </body>
  </html>
</xsl:template>

</xsl:stylesheet>



















 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread