Re: [xsl] stylesheet vs egrep

Subject: Re: [xsl] stylesheet vs egrep
From: Trevor Nash <tcn@xxxxxxxxxxxxx>
Date: Fri, 25 Jan 2002 18:37:17 +0000
On Fri, 25 Jan 2002 14:07:05 +0000, you wrote:

>Hi Trevor,
>
>First many thanks for your reply. The files I am processing
>are 20megs each by the way.
>
>I tried the stylesheet and it gave me 28,792 unsorted and
>163 sorted, which was the same as my last stylesheet and
>still not the 254 given to me by egrep. My egrep command
>
>egrep "<CHARACTER_ID> [0-9]{3,6} </CHARACTER_ID>" 1.xml |sort -u | wc -l
>
>is maybe doing something strange? Here's the first 20...
>
Obvious question: does the input contain the 'missing' numbers or not
- i.e. can you find 10010 etc?
I bet you will find that there is some white space or something which
is confusing the egrep ... though I cannot explain why the unsorted
totals should be the same.  Hang on though: if your file had
          <CHARACTER_ID> 10946 </CHARACTER_ID>
and
   <CHARACTER_ID> 10946 </CHARACTER_ID>
wouldn't the egrep version count that as 2 but the XSLT version as 1?
(In the XSLT version you get only the numbers, not the other junk on
the same line.)
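
To check the indentation theory on something small, here is a rough
sketch. The two-line sample file is made up for illustration (it is not
taken from your data), and the sed step is just one possible way to keep
only the digits before de-duplicating:

```shell
# Hypothetical sample: the same ID at two different indentation levels.
sample=$(mktemp)
printf '          <CHARACTER_ID> 10946 </CHARACTER_ID>\n   <CHARACTER_ID> 10946 </CHARACTER_ID>\n' > "$sample"

pattern='<CHARACTER_ID> [0-9]{3,6} </CHARACTER_ID>'

# sort -u compares whole lines, so the differing indentation keeps both.
whole_lines=$(grep -E "$pattern" "$sample" | sort -u | wc -l | tr -d ' ')

# Reduce each matching line to the number alone, then de-duplicate.
ids_only=$(grep -E "$pattern" "$sample" \
  | sed -E 's/.*<CHARACTER_ID> ([0-9]+) <\/CHARACTER_ID>.*/\1/' \
  | sort -u | wc -l | tr -d ' ')

echo "whole lines: $whole_lines, ids only: $ids_only"
rm -f "$sample"
```

If the two counts differ on your real file as well, the extra egrep
results are probably the same IDs repeated with different surrounding
text.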

So the 10010 isn't missing from the grep version, it's getting sorted
much later.  What does 'sort' use for a key?  Isn't it the full text of
the line?
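
A small sketch of that ordering effect, assuming a made-up three-line
sample in which one ID has extra markup in front of it (the <ROW> tag is
purely hypothetical).  It relies on grep -o, which GNU and BSD grep
provide but which is not in POSIX:

```shell
# Hypothetical sample: one line carries extra text before the element.
sample=$(mktemp)
printf '%s\n' \
  '<CHARACTER_ID> 10946 </CHARACTER_ID>' \
  '<ROW> <CHARACTER_ID> 10010 </CHARACTER_ID>' \
  '<CHARACTER_ID> 10001 </CHARACTER_ID>' > "$sample"

# Default key is the full line; LC_ALL=C makes the byte-wise order explicit.
# The numbers are extracted afterwards only to show the resulting order.
by_line=$(LC_ALL=C sort -u "$sample" | grep -oE '[0-9]{3,6}' | paste -sd ' ' -)

# Extract the numbers first, then sort them numerically.
by_number=$(grep -oE '[0-9]{3,6}' "$sample" | sort -n -u | paste -sd ' ' -)

echo "by line:   $by_line"
echo "by number: $by_number"
rm -f "$sample"
```

With this sample the full-line sort puts 10010 last, because "<ROW>"
compares after "<CHARACTER_ID>", while the numeric sort puts it between
10001 and 10946.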

As to the size of file: if you need to tune for performance, you will
need to do it by experiment.  Adding templates to skip nodes sounds
like an obvious improvement, but the trouble is the more templates you
have the higher the cost of processing each node - which one wins
depends on the structure of the file and what processor you use.  If
you are not doing anything else in the transform you might find 
<xsl:for-each select="//CHARACTER_ID" > works best.

Regards,
Trevor Nash
--
Traditional training & distance learning,
Consultancy by email

Melvaig Software Engineering Limited
voice:     +44 (0) 1445 771 271 
email:     tcn@xxxxxxxxxxxxx

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

