[xsl] Generating numbering for cross-references (LONG)

From: Peter Flynn <peter@xxxxxxxxxxx>
Date: Thu, 18 Oct 2001 02:06:57 +0100

This has foxed numerous brains, including three of the co-authors of
XSL and XPath. I hope it's not just my clumsy way of expressing it.

In a typical research document, an author needs to make many
cross-references.  XML provides ID and IDREF attributes for this
purpose, and lots of DTDs are written to make use of them. I'm having
a problem instantiating three classes of numeric cross-reference in
XSLT, namely:

a. Tables and figures, as in `See <ref to="foo"/>' where
   you have <table id="foo"> or <figure id="foo">

b. Chapters/sections/subsections/etc, where the reference
   looks the same as above but points at (eg) <section id="foo">
   and needs to generate a decimal-style reference (eg 1.1.1)

c. Reading lists (bibliographies) where the citation says
   <cite work="foo"/> and the entry says <bibentry id="foo">

In normal processing of the document text, you use <xsl:number> to
assign counters to tables, figures, chapters, sections, bib entries,
etc. At serialization, these numbers get values in the normal way,
but those values are not available or accessible to other places
in the stylesheet because the language is side-effect free. You
cannot reach over to the relevant part of the tree (by ID) and
pluck a copy of a number which has not yet been generated.
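As a reminder of what the numbering at the target typically looks like, here is a minimal sketch. The element names chapter, title and caption are assumptions, not taken from any particular DTD.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Tables are numbered in one sequence through the whole
       document, wherever they occur (level="any"). -->
  <xsl:template match="table">
    <xsl:text>Table </xsl:text>
    <xsl:number level="any" count="table" format="1"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="caption"/>
  </xsl:template>

  <!-- Section titles get a decimal-style number built from the
       enclosing chapter and section (level="multiple"). -->
  <xsl:template match="section/title">
    <xsl:number level="multiple" count="chapter|section" format="1.1"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

In both templates the number is computed for the current node, that is, the table or the title being processed at that moment.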

********************************************************************
At the point where a reference is made, what XPath expression
generates the number that will eventually be assigned to the
target?
********************************************************************

References can be scattered (randomly distributed) through the
document, so <xsl:number> cannot be used at the point of reference as
it can be at the target, as it presupposes sequence.

I have therefore been trying to devise something using the function
count(preceding-sibling) of the element with the ID and adding one to
it (so a reference to the 4th table counts 3 before it and adds 1 :-)
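When the element type of the target is known in advance, the count-and-add-one idea can be sketched as follows. This assumes the target is a table, that tables do not nest, and that the ID attribute is called "id". It uses the preceding axis rather than preceding-sibling, because preceding-sibling only sees elements with the same parent.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="ref">
    <xsl:variable name="thisref" select="@to"/>
    <!-- Locate the table with the matching id, step from it to
         every table that precedes it anywhere in the document,
         count those, and add one. -->
    <xsl:value-of
      select="count(//table[@id = $thisref]/preceding::table) + 1"/>
  </xsl:template>

</xsl:stylesheet>
```

The location path inside count() resolves the ID first and then walks backwards from the element it found, so a reference to the 4th table counts 3 and adds 1.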

The problems are several:

  i) how do you combine locating the element with the right ID *and*
     counting its preceding siblings in the same XPath expression,
     when the syntax requires the count( token to precede the nodeset
     on which it is to operate, and that nodeset cannot be identified
     until the ID has been resolved, and...

 ii) ...you cannot use name() to formulate a dynamic expression as in
     count(preceding-sibling::name(id(@to)))

iii) if this is to work on non-validating parsers, we must use an
     expression like //*[@id=$thisref] instead of id(@to), having
     previously set $thisref to the value of the "to" attribute.
     (In the DTDs in question, the ID attribute is *always* called
     "id", fortunately.)

 iv) figures and tables are non-contiguously located, so // is
     presumably needed to mean "count any figure, anywhere". This
     would change if per-chapter figure numbering were needed:
     fortunately it's not, right now.

  v) chapters, sections, etc are usually contiguous, but any
     generation of the number must mimic the feature of <xsl:number>
     in being able to perform decimal-style (1.1.1) numbering.
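Points i) to iv) might be addressed together by holding the target and its element name in variables, so that no dynamic expression is needed. This is a sketch only: the variable names are arbitrary, it assumes the ID attribute is called "id" and that IDs are unique, and it does not attempt the decimal-style numbering in v).

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="ref">
    <!-- iii) resolve the reference without relying on a DTD -->
    <xsl:variable name="target" select="//*[@id = current()/@to]"/>
    <!-- ii) capture the element name as a string ... -->
    <xsl:variable name="type" select="name($target)"/>
    <!-- i), iv) ... then count every earlier element of that name,
         anywhere in the document, and add one -->
    <xsl:value-of
      select="count($target/preceding::*[name() = $type]) + 1"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the comparison is on the element name, figures and tables fall into separate sequences without the stylesheet naming either of them. The preceding axis leaves out ancestors, so this would undercount for element types that nest inside themselves.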

The following fragment represents the (non-working :-) model of what
is needed:

  1. find the element with the right ID;

  2. count how many other elements of the same type
     precede it in the document order (contiguously
     or non-contiguously);

  3. add one.

<xsl:template match="ref">
  <xsl:variable name="thisref">
    <xsl:value-of select="@to"/>
  </xsl:variable>
  <xsl:value-of select="count(preceding-sibling::#)[//#[@id=$thisref]]+1"/>
</xsl:template>

where I have used the # in the XPath expression to stand for
"the name of the element type bearing this ID value". The syntax
is wrong, of course, but that's just because I can't see how to
do it.
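One possible way to avoid needing the # placeholder at all is to move the context to the target element and let <xsl:number> do the counting there. <xsl:number> numbers the current node, and when no count attribute is given it counts nodes with the same name as the current node. The sketch below assumes the ID attribute is called "id" and that numbering runs through the whole document.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="ref">
    <xsl:variable name="thisref" select="@to"/>
    <!-- for-each is used only to change the current node to the
         element carrying the id; it runs once if ids are unique -->
    <xsl:for-each select="//*[@id = $thisref]">
      <!-- numbers the target among all elements of the same name,
           in document order -->
      <xsl:number level="any"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If this holds up, the number produced at the reference is computed by the same instruction as the number produced at the target, so the two should agree provided the attributes on <xsl:number> are the same in both places.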

In the case of chapters, sections, etc, this process would necessarily
have to occur n times, where n is the depth required to generate the
decimal numbering sequence.
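A sketch of how the n levels might be handled in one step: apply templates to the target in a separate mode, and let <xsl:number level="multiple"> assemble the decimal number from the target's ancestors. The element names chapter and subsection and the mode name "number" are assumptions for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- At the reference, hand the target to the "number" mode -->
  <xsl:template match="ref">
    <xsl:apply-templates select="//*[@id = current()/@to]"
                         mode="number"/>
  </xsl:template>

  <!-- Structural targets: one number per ancestor level, eg 1.1.1 -->
  <xsl:template match="chapter|section|subsection" mode="number">
    <xsl:number level="multiple"
                count="chapter|section|subsection"
                format="1.1.1"/>
  </xsl:template>

  <!-- Tables and figures: one sequence each through the document -->
  <xsl:template match="table|figure" mode="number">
    <xsl:number level="any"/>
  </xsl:template>

</xsl:stylesheet>
```

The mode keeps these templates apart from the ones that render the targets in the normal flow of the document, so each target type can have its own reference format.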

It is of course possible to re-perform the same sequential operations
as are done in the normal course of assigning the numbers to the
target text itself...in other words (in the absence of a validating
parser), for each reference, perform a for-each on all relevant
elements, test each one's ID against the IDREF value, and when
found, do a count on the preceding-siblings, then add one. But
repeating this at every level of a deep hierarchy (chapter, section,
subsection, etc), for potentially hundreds of references, would
make processing time unreasonably large.
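On the cost of the lookup itself, <xsl:key> may help: it declares a lookup by attribute value that works without a DTD, and processors commonly build an index for it, though the specification does not require one. In this sketch the key name "by-id" is arbitrary and the square brackets around the citation number are only an illustrative format.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Every element with an id attribute, retrievable by its value -->
  <xsl:key name="by-id" match="*[@id]" use="@id"/>

  <!-- Tables, figures: a single sequence through the document -->
  <xsl:template match="ref">
    <xsl:for-each select="key('by-id', @to)">
      <xsl:number level="any"/>
    </xsl:for-each>
  </xsl:template>

  <!-- Bibliographic citations: position among sibling entries -->
  <xsl:template match="cite">
    <xsl:text>[</xsl:text>
    <xsl:for-each select="key('by-id', @work)">
      <xsl:number/>
    </xsl:for-each>
    <xsl:text>]</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The key only speeds up finding the target. The counting done by <xsl:number> is still performed for each reference, so whether the total time is acceptable for hundreds of references would need measuring on a real document.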

What I find unbelievable is that no-one has yet done this. It must be
one of the most common requirements in document processing, and
without it, XSLT simply cannot be used to reproduce simple numeric
cross-references like "see Figure 16" or <plug>bibliographic
references like Kay[23]</plug> :-)

What have I missed?

///Peter


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


