Re: [xsl] RE: (Keys on multiple element types)

Subject: Re: [xsl] RE: (Keys on multiple element types)
From: "Ahmad J. Reeves" <ahmad@xxxxxxxxxxxxxx>
Date: Tue, 05 Feb 2002 10:57:48 +0000
Hi Mike,

Many thanks for your reply. I got a similar answer from Jeni 
as well.

You are correct that I am trying to get to grips with keys, 
but I didn't appreciate that they automatically removed duplicates
under certain conditions, i.e. if two, say, <project> nodes were
the same.

Jeni's reply would be to use: -

<xsl:key name="rows" match="FILES/*"
         use="concat(project_name, '+', name)" />

I didn't realise that keys removed duplicates automatically. I presume the
nodes have to be absolutely identical for this to be the case, so I'll go
back and have another look.
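
For reference, here is a minimal sketch of how that composite key might be
used for Muenchian grouping. It assumes the <FILES> sample quoted further
down, that the project element is called project_name as in that sample, and
a second key named "by-name" that is made up purely for this illustration. It
is not Jeni's exact stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical helper key: every child of FILES, indexed by customer name. -->
  <xsl:key name="by-name" match="FILES/*" use="name"/>

  <!-- Composite key: records sharing project and name get the same key value. -->
  <xsl:key name="rows" match="FILES/*"
           use="concat(project_name, '+', name)"/>

  <xsl:template match="FILES">
    <!-- Outer loop: the first record (in document order) for each distinct name. -->
    <xsl:for-each select="*[generate-id(.) =
                            generate-id(key('by-name', name)[1])]">
      <xsl:value-of select="name"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Inner loop: among records with this name, keep only the first
           record for each distinct project/name combination. -->
      <xsl:for-each select="key('by-name', name)
                            [generate-id(.) =
                             generate-id(key('rows',
                                 concat(project_name, '+', name))[1])]">
        <xsl:text>  </xsl:text>
        <xsl:value-of select="project_name"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against the four-record sample, this should list Fred with Building and
Looking, then Harry with Writing. Record 14 is skipped because it is not the
first node for the key value 'Building+Fred'. The key itself still indexes
both records; it is the generate-id() comparison against the first node that
drops the duplicate.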

BTW, I wanted to ask you regarding processing.

I am going to be processing gigabytes of XML and will obviously need to split
the files into smaller chunks of, say, 10-15 MB. I was using Xalan, which was
taking about 10 hours to process a 20 MB file on my machine (900 MHz, 256 MB
RAM). When I switched to Instant Saxon (the easiest install in the world!) it
finished in 1 hour 20 minutes flat, presumably because Saxon streams the
data.

We are moving to a new server with twin processors and a gig of RAM. For
sheer processing speed on the amount of data we have, which version of
Saxon, which VM, and which platform would you recommend?

Many thanks again Mike

Ahmad.

P.S I would appreciate your views on the site www.datapower.com/XSLTMark/ 

What is XT19991105? Is it really 'better' than Saxon, or does all of this
depend on exactly what processing you're doing?


At 09:27 AM 2/5/02 -0000, you wrote:
>>
>> I'm trying to get to grips with the syntax of keys across
>> children with different names....
>>
>
>> I'm trying to display the customer name once with a list of projects. I
>> can get this to work if all of the child nodes of <FILES> have the same
>> name, e.g. <RECORD>. If however I change them to three different names, I
>> need to provide alternatives to the key select as below:-
>>
>> <?xml version="1.0"?>
>> <xsl:stylesheet version="1.1"
>> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>>
>> <xsl:output method="text"/>
>>
>> <xsl:key name="rows" match="RECORDA | RECORDB | RECORDC" use="name"/>
>
>Fine so far.
>>
>> <xsl:template match="FILES">
>> <xsl:apply-templates
>> select="RECORDA[generate-id(.)=generate-id(key('rows',
>> name)[1])]"/>
>> </xsl:template>
>
>You're processing every RECORDA that is the first (RECORDA|RECORDB|RECORDC)
>with that name.
>>
>> This outputs:-
>>
>> Fred
>> Project 1
>> Project 2
>>
>> But no Harry. So it's as if it doesn't include RECORDC in the key list.
>>
>> q1. What's wrong with the stylesheet that it doesn't do this? I
>> think it's:-
>> select="RECORDA[generate-id(.
>>
>> but every other path I try ends up with no output.
>
>If you only select RECORDA elements then it's only going to output RECORDA
>elements. If you want RECORDB and RECORDC as well, then use:
>
>select="(RECORDA|RECORDB|RECORDC)
>          [generate-id(.)=generate-id(key('rows',name)[1])]"
>
>or just:
>
>select="*[generate-id(.)=generate-id(key('rows',name)[1])]"
>>
>> q2. Does the vertical bar | mean 'or', and wouldn't it be better to use
>> ',' which I think means 'and'? When I tried ',' it threw up loads of
>> exceptions.
>
>No it doesn't mean "or", and "," doesn't mean "and". "|" means union. (Where
>are you getting your misinformation from?)
>>
>> q.3 If the xml looked like this instead...
>>
>> <FILES>
>>         <RECORDA>
>>                 <id>13</id><name>Fred</name><project_name>Building</project_name>
>>         </RECORDA>
>>         <RECORDA>
>>                 <id>14</id><name>Fred</name><project_name>Building</project_name>
>>         </RECORDA>
>>         <RECORDB>
>>                 <id>15</id><name>Fred</name><project_name>Looking</project_name>
>>         </RECORDB>
>>         <RECORDC>
>>                 <id>16</id><name>Harry</name><project_name>Writing</project_name>
>>         </RECORDC>
>> </FILES>
>>
>>
>> How could I get the result to remove multiple copies e.g. to output
>> <id>13</id> but not <id>14</id>
>>
>
>The code you are using is specifically designed to remove duplicates. It looks
>to me as if you are using the Muenchian grouping technique "by rote", having
>copied it from a cookbook, but without really understanding it. Go back to
>the textbook and work through it to understand how it works.
>
>Michael Kay
>Software AG
>home: Michael.H.Kay@xxxxxxxxxxxx
>work: Michael.Kay@xxxxxxxxxxxxxx
>
>
> XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>
-------------------------------------------------
Ahmad J Reeves BSc (Hons) MSc (Dist) PhD Student
Information, Media & Communication Research Group
Department of Computer Science
Queen Mary, University of London
E1 4NS
Tel +44(0) 207 882 5257
Fax +44(0) 208 980 6533
http://www.dcs.qmw.ac.uk/imc

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

