Subject: Re: [xsl] Pattern Matching in XSl - find groups defined in one Xml in another Xml.
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 23 Aug 2012 12:56:23 -0400
Dear Richard,

On 8/23/2012 5:56 AM, Kerry, Richard wrote:
> So it seems that the use of the key statement and function gives
> functionality the same as can be got by using a predicate in a
> certain way. It is presumably advised in some cases as an opportunity
> for the processor to optimize the search. In my case the data is
> probably not large enough for this optimization to make much
> difference to the processing time.

> <xsl:apply-templates select="key('p-by-id','p2')"/>
>
> applies a template to the p2 'p' element,
>
> So the same as <xsl:apply-templates select="p[ @id = 'p2' ]"/>

Almost. It is the same as //p[@id='p2']


(Your expression is short for child::p[attribute::id = 'p2'] which gets only children of the context node, not all p elements.)
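As a rough illustration of that difference, here is a small Python model using the standard library's xml.etree.ElementTree. It is only a sketch of the semantics, not how any XSLT processor is implemented: the hypothetical build_key function stands in for <xsl:key name="p-by-id" match="p" use="@id"/> by indexing every p in the document once, so a lookup behaves like //p[@id='p2'], while the child-only p[@id='p2'] finds nothing when the p sits further down.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample document: one p at the top level, one nested in a sec.
DOC = """<doc>
  <p id="p1"/>
  <sec>
    <p id="p2"/>
  </sec>
</doc>"""

def build_key(root, match, use):
    """Model of <xsl:key match=... use="@..."/>: map each value to its elements, in document order."""
    index = defaultdict(list)
    for el in root.iter(match):  # iter() walks the whole tree in document order
        value = el.get(use)
        if value is not None:
            index[value].append(el)
    return index

def descendant_scan(root, tag, attr, value):
    """Model of //p[@id=value]: brute-force search of the whole document."""
    return [el for el in root.iter(tag) if el.get(attr) == value]

def child_scan(context, tag, attr, value):
    """Model of p[@id=value], i.e. child::p[...]: only children of the context node."""
    return [el for el in context if el.tag == tag and el.get(attr) == value]

root = ET.fromstring(DOC)
p_by_id = build_key(root, "p", "id")  # built once, reused for every lookup

print([el.get("id") for el in p_by_id.get("p2", [])])                   # ['p2']
print([el.get("id") for el in descendant_scan(root, "p", "id", "p2")])  # ['p2']
print([el.get("id") for el in child_scan(root, "p", "id", "p2")])       # []
```

Building the index once and answering every lookup from it is the optimization the key invites; a brute-force scan like descendant_scan walks the whole tree on every call.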

You are correct that its main advantage is that it provides the processor a chance to optimize retrieval.

However, it's worth observing that some processors (notably Saxon) will also optimize the brute-force top-down traversal: Saxon will notice the pattern //node[expr] and index for it, so you may see no difference in processing time at all.

At that point, the key's remaining advantages are that it gives you another opportunity to expose the logic in the code (through good naming), and that it still performs better on engines that don't do this optimization for you.

Also, as you say, if the data size is not large, then it may be unnecessary.

> <xsl:apply-templates select="key('p-by-id','np1')"/>
>
> applies a template to the np1 'p' element.
>
> So the same as <xsl:apply-templates select="p[ @id = 'np1' ]"/>

//p[@id='np1']


Note that the key retrieval is independent of its processing context.

> key('element-by-id',@rid)
>
> retrieves nodes using the 'element-by-id' key by the value of the
> @rid attribute of the context node.
>
> something like "element[ @id = ./@rid ]" (assuming <xsl:key name="element-by-id" match="element" use="@id"/> and that "element" is not a reserved word. )

//element[@id=current()/@rid]


assuming a declaration like this:

<xsl:key name="element-by-id" match="element" use="@id"/>

but if the key declaration has this

<xsl:key name="element-by-id" match="*" use="../identifier"/>

then you would get

//*[../identifier=current()/@rid]
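A small Python model (standard-library ElementTree; the document and variable names are invented for illustration) can check both equivalences. It also shows why current() matters: inside a predicate, "." is the candidate node, so //element[@id = ./@rid] compares each element's @id with its own @rid, not with the @rid of the node you started from.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<doc>
  <element id="a" rid="b"/>
  <element id="b" rid="b"/>
  <group>
    <identifier>a</identifier>
    <item/>
  </group>
  <ref rid="a"/>
</doc>"""

root = ET.fromstring(DOC)
parent_of = {child: parent for parent in root.iter() for child in parent}  # ElementTree has no ".." axis

def parent_identifiers(el):
    """Model of ../identifier: string values of the parent's <identifier> children."""
    parent = parent_of.get(el)
    return [] if parent is None else ["".join(c.itertext()) for c in parent.findall("identifier")]

# Declaration 1: <xsl:key name="element-by-id" match="element" use="@id"/>
by_id = defaultdict(list)
for el in root.iter("element"):
    if el.get("id") is not None:
        by_id[el.get("id")].append(el)

# Declaration 2: <xsl:key name="element-by-id" match="*" use="../identifier"/>
by_parent_identifier = defaultdict(list)
for el in root.iter():  # match="*", in document order
    for value in dict.fromkeys(parent_identifiers(el)):  # drop repeated values, keep order
        by_parent_identifier[value].append(el)

context = root.find("ref")  # the current node: <ref rid="a"/>
rid = context.get("rid")    # current()/@rid

# Declaration 1: key lookup vs //element[@id = current()/@rid]
key1 = by_id.get(rid, [])
scan1 = [el for el in root.iter("element") if el.get("id") == rid]
# //element[@id = ./@rid]: "." is each candidate, so this tests the element's own @rid
dot1 = [el for el in root.iter("element") if el.get("id") == el.get("rid")]

# Declaration 2: key lookup vs //*[../identifier = current()/@rid]
key2 = by_parent_identifier.get(rid, [])
scan2 = [el for el in root.iter() if rid in parent_identifiers(el)]

print([el.get("id") for el in key1], [el.get("id") for el in dot1])  # ['a'] ['b']
print([el.tag for el in key2])                                       # ['identifier', 'item']
```

Note that with match="*" the identifier element itself is indexed too, since its parent has an identifier child.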

> It does sound like keys implicitly work using "=", though.  So
> there's no key based equivalent to "element[ @id > ./@rid ]" ?

Not exactly, although given certain specific problems one might be able to contrive something:


<xsl:key name="element-gt-id" match="*" use="(1 to 100)[. lt xs:integer(current()/@id)]"/>

This indexes every element to a sequence of integers up to 100, all less than the element's @id. Consequently, when you give an integer to the key, as in

key('element-gt-id',5)

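you'd get back all the elements (anything matching "*") whose @id is greater than 5, since only those elements have 5 among their key values. (This assumes every @id casts to an integer, or you'll get a runtime error. And because the range stops at 100, asking for any value above 100 returns nothing.)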
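A quick Python model (standard-library ElementTree over an invented document; a sketch of the semantics, not a processor implementation) shows which way the comparison runs: each element is filed under every integer from 1 to 100 below its @id, so looking up 5 finds the elements whose @id is greater than 5. It also shows the cost of the hard-coded range: an element with @id 150 is never filed under 120.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<doc>
  <e id="3"/>
  <e id="5"/>
  <e id="8"/>
  <e id="150"/>
</doc>"""

root = ET.fromstring(DOC)

# Model of <xsl:key name="element-gt-id" match="*"
#   use="(1 to 100)[. lt xs:integer(current()/@id)]"/>
element_gt_id = defaultdict(list)
for el in root.iter():
    if el.get("id") is None:
        continue                  # xs:integer(()) is empty, so no key values
    own_id = int(el.get("id"))    # like the cast, int() raises on non-integers
    for n in range(1, 101):       # (1 to 100)
        if n < own_id:            # [. lt xs:integer(current()/@id)]
            element_gt_id[n].append(el)

print([el.get("id") for el in element_gt_id.get(5, [])])    # ['8', '150']
print([el.get("id") for el in element_gt_id.get(120, [])])  # []
```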

This is a fairly academic case, however, and I'm not sure why one would do it.

Except that one can, maybe:

<xsl:key name="greater-element" match="*"
  use="(min(//@id/xs:integer(.)) to max(//@id/xs:integer(.)))
       [. lt xs:integer(current()/@id)]"/>

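This indexes each element to every integer from the least to the greatest @id value in the document that is also less than the element's own @id (again, as long as every @id is castable to an integer). So key('greater-element', 5) returns the elements whose @id is greater than 5, with no hard-coded ceiling.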
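Modeled the same way in Python (again a sketch over an invented document, not a processor implementation), the min/max version files elements under the document's actual @id range, so a lookup above 100 now works. The model computes the bounds once; the use expression as written evaluates //@id again for each indexed element.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<doc>
  <e id="3"/>
  <e id="5"/>
  <e id="8"/>
  <e id="150"/>
</doc>"""

root = ET.fromstring(DOC)
all_ids = [int(el.get("id")) for el in root.iter() if el.get("id") is not None]  # //@id/xs:integer(.)
low, high = min(all_ids), max(all_ids)                                           # min(...) to max(...)

# Model of <xsl:key name="greater-element" match="*"
#   use="(min(//@id/xs:integer(.)) to max(//@id/xs:integer(.)))
#        [. lt xs:integer(current()/@id)]"/>
greater_element = defaultdict(list)
for el in root.iter():
    if el.get("id") is None:
        continue
    own_id = int(el.get("id"))
    for n in range(low, high + 1):  # inclusive range, like XPath "to"
        if n < own_id:
            greater_element[n].append(el)

print([el.get("id") for el in greater_element.get(120, [])])  # ['150']
print([el.get("id") for el in greater_element.get(5, [])])    # ['8', '150']
```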

> Having said all that, as you say, my requirement for RE matching does
> seem to rule out use of keys.

Yes, and you'll notice that Ken dropped the key in his second solution.


> I was going to ask (for academic interest really) what would happen
> if a key had duplicate matches ? (error at indexing time ?  return
> all the matching elements ?  return the first ?) You do say:
>> * Keys can (and frequently do) retrieve more than one node at a
>> time;
> So I presume a sequence of all the matches would be returned in that
> case. Likewise if the requested key was a sequence a sequence would
> be returned assuming all distinct keys produced a separate match.

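Quite so, and easy enough to demonstrate in testing. The result comes back in document order with no duplicates, even if a node matches more than one of the values you ask for.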
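For the record, a small Python model of that behaviour (an illustration over an invented document; key_lookup is a hypothetical helper, not a real API): each key value maps to all its matching nodes, and asking for several values returns their union in document order with no duplicates.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<doc>
  <item cat="a" n="1"/>
  <item cat="b" n="2"/>
  <item cat="a" n="3"/>
</doc>"""

root = ET.fromstring(DOC)
order = {el: i for i, el in enumerate(root.iter())}  # document-order position of each node

# Model of <xsl:key name="item-by-cat" match="item" use="@cat"/>
item_by_cat = defaultdict(list)
for el in root.iter("item"):
    item_by_cat[el.get("cat")].append(el)

def key_lookup(index, values):
    """Model of key() with a sequence of values: union of all matches,
    duplicates removed, sorted into document order."""
    found = {el for v in values for el in index.get(v, [])}
    return sorted(found, key=order.__getitem__)

print([el.get("n") for el in key_lookup(item_by_cat, ["a"])])            # ['1', '3']
print([el.get("n") for el in key_lookup(item_by_cat, ["b", "a", "a"])])  # ['1', '2', '3']
```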


The classical use case for keys is in resolving cross-references. Find the figure to which this xref element points; or find all the xref elements pointing to this figure.
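A minimal Python sketch of the cross-reference case, assuming figures carry @id and xrefs carry a single @rid (the key name fig-by-id and the sample document are hypothetical; xref-by-rid is the name used further on): one index answers "which figure does this xref point to?" and the other "which xrefs point to this figure?".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<doc>
  <fig id="f1"/>
  <fig id="f2"/>
  <para><xref rid="f1"/> and <xref rid="f1"/></para>
  <para><xref rid="f2"/></para>
</doc>"""

root = ET.fromstring(DOC)

def build_key(root, match, use):
    """Model of <xsl:key>: map each attribute value to its elements, in document order."""
    index = defaultdict(list)
    for el in root.iter(match):
        if el.get(use) is not None:
            index[el.get(use)].append(el)
    return index

fig_by_id = build_key(root, "fig", "id")      # hypothetical: match="fig" use="@id"
xref_by_rid = build_key(root, "xref", "rid")  # assumed: match="xref" use="@rid"

first_xref = root.find(".//xref")
# From an xref: key('fig-by-id', @rid) finds the figure it points to
print([f.get("id") for f in fig_by_id[first_xref.get("rid")]])  # ['f1']

fig1 = root.find("fig")
# From a figure: key('xref-by-rid', @id) finds every xref pointing at it
print(len(xref_by_rid[fig1.get("id")]))  # 2
```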

Another canonical use is in removing duplicates from a set (as in the Muenchian grouping technique), since key('key',$val)[1] returns only a single node among those that have $val as a key value. So you can iterate $val and get a single member for each value.

But we rarely need this in XSLT 2.0, where xsl:for-each-group (and, for simple cases, distinct-values()) does the job more directly.
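The Muenchian idea can be modeled in Python like this (a sketch over an invented document; the key name item-by-cat is hypothetical). In XSLT 1.0 the test is usually written as item[generate-id() = generate-id(key('item-by-cat', @cat)[1])]; the model keeps an item only if it is the first node the key returns for its value, then prints each group.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<doc>
  <item cat="b" n="1"/>
  <item cat="a" n="2"/>
  <item cat="b" n="3"/>
  <item cat="a" n="4"/>
</doc>"""

root = ET.fromstring(DOC)

# Model of <xsl:key name="item-by-cat" match="item" use="@cat"/>
item_by_cat = defaultdict(list)
for el in root.iter("item"):
    item_by_cat[el.get("cat")].append(el)

# Keep an item only if it is key('item-by-cat', @cat)[1],
# i.e. generate-id(.) = generate-id(key('item-by-cat', @cat)[1])
leaders = [el for el in root.iter("item") if item_by_cat[el.get("cat")][0] is el]

for leader in leaders:
    group = item_by_cat[leader.get("cat")]
    print(leader.get("cat"), [el.get("n") for el in group])
# b ['1', '3']
# a ['2', '4']
```

In XSLT 2.0 the same grouping would be <xsl:for-each-group select="item" group-by="@cat">.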

In 2.0 keys are even more powerful than in 1.0: the rules for declaring them have been relaxed; their retrieval can be scoped when they are called (key() takes an optional third argument that limits the search to a given subtree); and they can be chained.

key('xref-by-rid',@id)/key('fig-for-xref',@rid)

might retrieve all the figure elements referred to by xref elements that referred to the context node by its @id. (Some data sets have one-to-many xrefs, where xref/@rid can reference several elements at once.)
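Here is a Python model of that chain. It is a sketch only: it assumes @rid holds a whitespace-separated list of ids, split the way tokenize(@rid, '\s+') would, and the sample document and the declaration of fig-for-xref as an index of figures by @id are hypothetical. Starting from figure f1, the first lookup finds the xrefs that point to it, the second finds every figure those xrefs reference, and the results are merged in document order without duplicates, as the / operator does.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<doc>
  <fig id="f1"/>
  <fig id="f2"/>
  <fig id="f3"/>
  <xref rid="f1 f2"/>
  <xref rid="f3"/>
</doc>"""

root = ET.fromstring(DOC)
order = {el: i for i, el in enumerate(root.iter())}  # document-order position

def rids(xref):
    # Assumption: @rid is a whitespace-separated list of ids
    return xref.get("rid", "").split()

# xref-by-rid: each xref is filed under every id it references
xref_by_rid = defaultdict(list)
for x in root.iter("xref"):
    for r in rids(x):
        xref_by_rid[r].append(x)

# fig-for-xref (hypothetical declaration): figures by @id, assuming ids are unique
fig_for_xref = {f.get("id"): f for f in root.iter("fig")}

def chained(context_fig):
    """Model of key('xref-by-rid', @id)/key('fig-for-xref', @rid)."""
    found = set()
    for x in xref_by_rid.get(context_fig.get("id"), []):
        for r in rids(x):
            if r in fig_for_xref:
                found.add(fig_for_xref[r])
    return sorted(found, key=order.__getitem__)

print([f.get("id") for f in chained(root.find("fig"))])  # ['f1', 'f2']
```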

Cheers,
Wendell

--
======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
