RE: [xsl] RE: (Keys on multiple element types)

Subject: RE: [xsl] RE: (Keys on multiple element types)
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Tue, 5 Feb 2002 11:46:51 -0000
> You are correct in that I am trying to get to grips with keys,
> but I didn't appreciate that they automatically removed duplicates
> based on certain conditions, i.e. if two, say, <project> nodes were
> the same.

Keys by themselves don't remove duplicates, but the code you were using,
which selects an item only if it is the first one with that key, does remove
duplicates.
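
To make the distinction concrete, here is a minimal sketch of the usual "first node with that key" idiom, run through the standard JAXP API. The key name by-name, the name attribute, the sample input and the class and method names are all made up for illustration; they are not taken from your stylesheet. The xsl:key declaration only builds an index; it is the generate-id() comparison in the select expression that discards the second and later nodes with the same key value.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class KeyDedupSketch {
    // Hypothetical stylesheet: the key indexes <project> elements by @name.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:key name='by-name' match='project' use='@name'/>"
      + "<xsl:template match='/'>"
      // Keep a project only if it is the first node the key returns for its name.
      + "<xsl:for-each select=\"//project[generate-id() = generate-id(key('by-name', @name)[1])]\">"
      + "<xsl:value-of select='@name'/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet above over the given XML and returns the text output.
    static String distinctProjectNames(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<projects><project name='a'/><project name='b'/>"
                   + "<project name='a'/></projects>";
        System.out.println(distinctProjectNames(xml)); // prints "a;b;"
    }
}
```

If the predicate is dropped and the select is just //project, the same key declaration yields all three projects, duplicates included.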
>
> I am going to be processing gigabytes of XML and will obviously need to
> split the files into smaller, say 10-15 MB, chunks. I was using Xalan,
> which was taking about 10 hours to process a 20 MB file on my machine
> (900 MHz, 256 MB RAM). When I switched to Instant Saxon (the easiest
> install in the world!) it finished in 1 hour 20 minutes flat, presumably
> because Saxon streams the data.

Xalan was probably thrashing. Saxon is a bit more economical with memory,
and it looks as if in your case this made a big difference by reducing
paging traffic.
>
> We are moving to a new server, with twin processors and a gig of RAM. For
> sheer processing speed for the amount of data we have, which version of
> Saxon and VM would you recommend, under which platform?
>
Use full Saxon, version 6.5, with JDK 1.3 (I use Sun's Java VM but IBM's is
also highly regarded).

>
> P.S I would appreciate your views on the site
> www.datapower.com/XSLTMark/

Well, it hasn't been updated for a while, and some processors may have
improved a lot in the meantime, but it's still the best comparison
available. One criticism I have made of it is that it doesn't separate
tree-building time from transformation time, and some of its measurements
include tree-building in the cost and others don't. For simple
transformations tree-building can take longer than the actual
transformation. Another criticism is that it doesn't measure memory usage,
which is particularly important for large documents like yours: in fact Saxon
makes some space/time trade-offs which help to give the kind of result you
quoted above, but have no impact on this benchmark because the files are too
small.
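
As an illustration of the first criticism, the two phases can be timed separately through JAXP. This is only a rough sketch, not how XSLTMark works: the class and method names are invented, it uses an identity transformation rather than a real stylesheet, and a single run like this also includes JIT warm-up. Note too that a processor handed a DOM may still build or wrap its own internal tree, so the split is approximate.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SplitTimingSketch {
    // Returns { treeBuildNanos, transformNanos } for an identity transformation.
    static long[] time(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();

        // Phase 1: parse the source and build the tree.
        long t0 = System.nanoTime();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        long t1 = System.nanoTime();

        // Phase 2: transform the already-built tree (identity transformer here).
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        long t2 = System.nanoTime();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        long t3 = System.nanoTime();

        return new long[] { t1 - t0, t3 - t2 };
    }

    public static void main(String[] args) throws Exception {
        long[] nanos = time("<projects><project name='a'/></projects>");
        System.out.println("tree building:  " + nanos[0] + " ns");
        System.out.println("transformation: " + nanos[1] + " ns");
    }
}
```

A benchmark that reports only the sum of the two figures, or only the second, cannot be compared fairly with one that reports the other.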
>
> What is XT19991105? Is it really 'better' than Saxon, or does all of this
> depend on exactly what processing you're doing?

This is the xt processor from James Clark. It is certainly fast. In my own
tests, I found Saxon and xt to be about the same speed; but it's notorious
that developers can always get better performance out of their own code than
they can from anyone else's. The main drawback of xt is that it is
unfinished: for example, it doesn't support keys (which are the most
important tool for boosting performance) or the JAXP API, among other
things, and it is no longer being developed.

There will always be some things that one processor does better than another
(for example I found Saxon was vastly faster than xt doing format-number),
and this can of course affect the results of benchmarks.

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

