Re: [xsl] Prior Instance of term in main text, before first glossary markup

Subject: Re: [xsl] Prior Instance of term in main text, before first glossary markup
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 28 Oct 2004 11:47:45 -0400
Hi David,

At 02:20 AM 10/28/2004, you wrote:
I am marking up foreign words so that I can display the Japanese/Chinese
characters in my text, but only for the first usage of a term, and
also to have the stylesheet generate a glossary at the end. Cutting and
pasting and rewriting, it is easy to forget if I have already "glossed" a
term or not, and I cannot figure out a correct xsl test to see if the term
is properly marked up the first time it is used. That is to say, I need to
test to see if the word has been used before (in document order) in the
source tree, before the first glossary markup.

Okay. Be on notice that XSLT 1.0 isn't industrial-strength on string handling, but it should be able to manage this okay, particularly if your input documents aren't huge.


It seems like it should be easy to check with a simple <xsl:if test---> but
I do not seem to be able to get it to work, and I do not see anything
relevant in the archives.

The test may not be so simple, depending on the requirements. (See above.)


If I have this source, to make the smallest example:

<p>The Buddhist way is zazen. When first practicing sitting meditation
<mygloss><gr>zazen</gr><gk>==Japanese Characters==</gk></mygloss> select a
quiet place. </p>

And my stylesheet includes:

<xsl:template match="mygloss">
<xsl:if test="contains(ancestor::text(), gr) or contains(preceding::text(), gr)">
XXX ERROR-- NOT FIRST USAGE XXX
</xsl:if>
<!-- rest of template to put in italic, proper font, check for first
gloss markup of this word etc -->
</xsl:template>


The above test does not find and output the error of the earlier first use,
i.e. when I forgot to gloss it when I first used it at the beginning of the paragraph (or anywhere earlier, to make the general case).

Right. You are falling into a couple of traps here. The second is really subtle.


The first (not so subtle) is that no nodes in the tree ever have ancestor text nodes, since text nodes are always "leaf" nodes -- at the "bottom" of the tree. So the first part of the test will never test true.

The second (the subtle one) is that the contains() function takes two strings as input, and the XSLT 1.0 rule for converting a node-set to a string is to take the string value of the first node in the set, in document order, and ignore the rest. So contains(preceding::text(), gr) only ever looks at the first text node in your document -- which is not the one you want to test.
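
To make the conversion rule concrete, here is a small diagnostic sketch (not part of your stylesheet; the text output and the suppression of ordinary text are just assumptions for the demonstration). For each mygloss it prints the result of the test as written, and then the result of the same test with the implicit conversion spelled out. In XSLT 1.0 the two values should always agree. Note the parentheses: (preceding::text())[1] is the first preceding text node in document order, whereas preceding::text()[1] without them would be the nearest one.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="mygloss">
    <!-- The test as written: the node-set is converted to a string implicitly -->
    <xsl:value-of select="contains(preceding::text(), gr)"/>
    <xsl:text> </xsl:text>
    <!-- The same test with the conversion made explicit: only the first
         node in document order contributes to the string -->
    <xsl:value-of select="contains(string((preceding::text())[1]), string(gr))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress ordinary text so that only the diagnostics are printed -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```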

I tried this test:

contains(ancestor::*, gr) or contains(preceding::*, gr)

But the test is always true, even if there is no prior use.

It is always true because the first element ancestor, in document order, of the matching mygloss element (i.e. the document element) always contains the gr child of the same mygloss (as well as all the others).


This following test works if the word is not further marked up in the
paragraph, but of course it fails to look in end notes and such:

contains(ancestor::p/text(), gr) or contains(preceding::p/text(), gr)

Right. It's also prone to give you false hits sometimes -- if the first text node in the document inside a p happens to contain the string.


So, in sum, how do I do this properly? And of course, a kind explanation to
get me out of my confusion about the xpath and xsl would be very much appreciated.

I think the root of the confusion is in knowing a bit about XPath node tests and axes (hint: terms to look up :-), plus the rule I cited about how contains() works and how a node set is converted to a string.


Really, it's this rule that is causing the trouble. You don't want your contains() to work on a *particular* text node, much less the first such text node in the document. Rather, you want it to work on an aggregation of *all* earlier text nodes.

You can get this -- you need it as a string -- by collecting the text nodes you want into a variable:

<xsl:variable name="preceding-text">
  <xsl:copy-of select="preceding::text()"/>
  <!-- copies all preceding text nodes into a result-tree-fragment, where
       they are concatenated into one since there are no elements to
       keep them separate -->
</xsl:variable>

and then

<xsl:if test="contains($preceding-text, gr)">
  XXX ERROR-- NOT FIRST USAGE XXX
</xsl:if>
<!-- rest of template to put in italic, proper font, check for first
gloss markup of this word etc -->
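
Put together, a minimal sketch of the whole thing might look like the following. The i element is only a placeholder I am assuming for whatever formatting your real template does; the variable and the test are as given above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="mygloss">
    <!-- Gather every text node that precedes this mygloss in document order.
         As a result tree fragment, the variable converts to one long string. -->
    <xsl:variable name="preceding-text">
      <xsl:copy-of select="preceding::text()"/>
    </xsl:variable>

    <!-- Report the error if the term already occurs in the earlier text -->
    <xsl:if test="contains($preceding-text, gr)">
      <xsl:text>XXX ERROR-- NOT FIRST USAGE XXX</xsl:text>
    </xsl:if>

    <!-- Placeholder (an assumption) for the real formatting of the term -->
    <i><xsl:value-of select="gr"/></i>
  </xsl:template>

</xsl:stylesheet>
```

Two things to keep in mind with this sketch: contains() does plain substring matching, so "zazen" would also be found inside a longer word; and the gr of an earlier, properly glossed mygloss is itself preceding text, so a second gloss of the same term is reported too.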

Note that this technique of aggregating the text nodes first allows quite a bit of flexibility. The rule for converting a result tree fragment into a string, unlike that for a node set, is to take the text value of the entire fragment (not the text value of the first node in the fragment, since such fragments aren't transparent in XSLT 1.0). So you can do things inside of that variable declaration, or in the select expression of the copy-of, to refine how that string is made. So, for example,

<xsl:variable name="preceding-text">
  <xsl:copy-of select="preceding::text()[not(ancestor::note)]"/>
</xsl:variable>

... would leave out the text nodes that had a note ancestor from the set of text nodes gathered together and then tested. (So you could mention "zazen" inside a note and not throw the error.)
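
For comparison, a different formulation (not the aggregation technique, so take it as a sketch of an alternative) is to filter the preceding text nodes with a predicate, which applies contains() to each text node in turn rather than to one concatenated string. It needs no variable, but a term that is split across two text nodes, for instance by inline markup in the middle of the word, would go unnoticed.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="mygloss">
    <!-- The last predicate runs contains() once per preceding text node, and
         a non-empty node-set converts to true. current() is needed because
         inside the predicate the context node is the text node being
         examined, not the mygloss. -->
    <xsl:if test="preceding::text()[not(ancestor::note)][contains(., current()/gr)]">
      <xsl:text>XXX ERROR-- NOT FIRST USAGE XXX</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```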

I hope that explains it all sufficiently. Note that the preceding:: axis can be expensive, so on large documents things may get slow. If performance suffers, there's a trick that can be helpful: passing an aggregation of earlier text nodes down through the templates as a parameter so it doesn't have to be regathered all the time. Ask again if you need to see this.

Happy sitting!
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
