Re: Complex table of content generation

Subject: Re: Complex table of content generation
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Sat, 01 Jul 2000 11:32:21 +0100
Jean-Claude,

>Anyway, can you give me 2 solutions for the following tree? One when the
>stopping solution is "we find a component name which was already
>processed", and one when the stopping solution is "we find a component name
>which is a component which has subcomponent". I'd like these two solutions
>(if possible, or at least the second one) to understand how you store and
>manage already parsed components.

This has proved a real challenge, although I have definitely made things
more difficult for myself than necessary by going for a single-pass
solution and deliberately avoiding any extension functions/elements.  That
means that I haven't 'stored and managed already parsed components' - I
have used implicit knowledge about what gets processed to determine whether
something has 'already been' processed or not.  (Of course it may not
actually have 'already been' processed by the XSLT processor, which may
process things in parallel, but this is an easy way of guaranteeing we only
go down the logical order once for each compnent.)

The second solution is the tricky one because it's under-specified.  Say we
have the component 'C4'.  It has subcomponents, so (by the second logic),
we should *never* output those subcomponents.  Of course, there are certain
conditions in which you *should* output them, but as far as I can gather
from your examples, they are wrapped up in the first solution: whether the
component has already been looked at.

Another side-issue for me trying to give you the output that you've
specified is that you use the dashes inconsistently in your examples: it
looks as though you use them when all the siblings at the particular level
are at the bottom of the hierarchy, but that isn't the case for the four
constructions:
>C, C4, C1.1
>C, C4, C2.1
>C, C4, C
>C, C4, C4

Why not C, C4, C1.1 - C2.1 - C - C4 ?

I'm going to assume that you were using dashes as a short-hand as this
makes things a lot easier to handle! :)

There is still nothing outwardly special about 'C' - I'm assuming that
you're using the numbering system to help identify components in this
example and that in the real case you have variously named components and
can't use things like the length of the name to identify the top component.
 So I still define a parameter giving the name of the top component:

<xsl:param name="top-component" select="'C'" />

Then two keys.  One, as before, is used to quickly get the subcomponents of
a component given its name.  Note that I've predicated the match to allow
for the fact that you have <SubComponent>none</SubComponent> in the example
you've given, rather than just not having any SubComponent children of
Components without subcomponents:

<xsl:key name="super-sub" match="SubComponent[. != 'none']" use="../@Name" />

The second key is used to group the defined subcomponents by their name.
This is used to help identify whether a particular SubComponent is the
first mention within the file, taking advantage of the fact that the key
groups the SubComponent nodes in document order:

<xsl:key name="subcomponents" match="SubComponent" use="." />

Now the templates.  It's inefficient with to work top-down with a problem
like this because you don't know at the start how many rows there are going
to be, so I'm working bottom-up - trying to identify which subcomponents
appear as the last item in one of your logical orders.  The ones at the
bottom are those that
- are not called 'none' and are either:
  - called the same as the top component
  - don't have subcomponents
  - not the first mention of the component

The first conditions are easy - testing on the content of the SubComponent
element to see if it's either the literal string 'none' or the value of the
global parameter $top-component.

The second condition involves counting how many SubComponent children there
are of the Component element that bares the same @Name as the content of
the SubComponent we're looking at.  The 'super-sub' key was built to make
this easy - if the length of the nodes returned when you index that key on
the content of the SubComponent is zero, then the Component has no
SubComponent children (of significance), so not(key('super-sub', .)).

The third condition involves comparing the SubComponent we're currently
looking at with the first SubComponent that is retrieved from the group
that we've constructed using the 'subcomponents' key.  If they're not the
same, then the SubComponent isn't the first mention of that SubComponent,
in which case it is at the bottom of the hierarchy as it's 'already been'
processed.

The full template, then is:

<xsl:template match="Components">
  <xsl:apply-templates select="Component/SubComponent[
    . != 'none' and
    (. = $top-component or
     not(key('super-sub', .)) or
     generate-id(.) != generate-id(key('subcomponents', .)[1]))]" />
</xsl:template>
 
The second template is a recursive template operating on the SubComponents
that were generated in the first template, above.  Remember we're starting
at the bottom of the logical order: this template steps up the logical
order, calling itself on the first SubComponent that has the same content
as the @Name of the containing Component:

<xsl:template match="SubComponent">
  <xsl:choose>
    <!-- if the container is not the top component -->
    <xsl:when test="../@Name != $top-component">
      <!-- apply templates to the first SubComponent retrieved using the
           'subcomponents' key that has the same content as the @Name of
           the containing Component -->
      <xsl:apply-templates select="key('subcomponents', ../@Name)[1]"
/><xsl:text>, </xsl:text>
    </xsl:when>
    <!-- if the container *is* the top component, put in a line break,
         the name of the top component, and a comma -->
    <xsl:otherwise><xsl:text>
    </xsl:text><xsl:value-of select="$top-component" />, </xsl:otherwise>
  </xsl:choose>
  <!-- add the value of the current SubComponent -->
  <xsl:value-of select="." />
</xsl:template>

These templates give basically the results you want on the sample file that
you gave when tested in SAXON, but with no groupings of results - the
ordering of the output is determined by the document order of the
SubComponents in the input.  I don't know if this matters or not: it might
be possible to sort them in another way using xsl:sort within the
xsl:apply-templates in the Component-matching template.

>>>2.2 Description of each component (each name is a hyperlink, too)
>
>Excellent solution again. But you may have the same result as me. Sometimes
>(not necessary here) results are like A,B,B,C,C,C,D,D and we want just
>A,B,C,D. Is it possible to delete the identical elements during the
>processing? I suppose that a simpler solution is to write a second
>stylesheet which does this deletion, but how to delete the identical
>elements inside the process of your stylesheet?

I assume you mean in situations where there might be repeating
subcomponents of a component?  In that case, I think you can filter those
out (using the solution I gave in my last email) if you redefine the
$children varaible as:

<xsl:variable
  name="children"
  select="SubComponent[not(preceding-sibling::SubComponent(. = current())]" />

In other words, only look at the child SubComponents that do not have a
preceding sibling that has the same content as they do.  I haven't tested
this.

>>I have used xsl:for-each in these examples both for brevity and because the
>>definition of the output you wanted from the logical order seemed to be in
>>terms of 'depth', which is easier to manage with xsl:for-each than
>>xsl:apply-templates.  However, if you can define the stopping condition
>>(i.e. when do you stop going down a branch?) in terms of things that you
>>know about a particular component (e.g. has it got a longer name than its
>>parent?), then you could (should?) probably use xsl:apply-templates instead.
>
>I didn't understand this part. Can you explain me ? May be your new
>solution will apply this part ?

Sorry that this was confusing.  If you know how many levels you're going
down, then you know that you can use nested xsl:for-eaches - each
xsl:for-each goes down one level.  If you have a fixed depth, then using
xsl:apply-templates is hard because you have to pass down information about
how far down you've gone each time you apply templates.  If you *don't*
know how many levels there are, then you can't use that approach, you have
to use xsl:apply-templates and do it recursively, because you can't
guarantee that you will nest your xsl:for-eaches deep enough for every
possible depth of the hierarchy.

So, if you know how deep you want to go, then use xsl:for-each; if you
don't, use xsl:apply-templates.  In the first example, I knew, and used
xsl:for-each.  Above, I don't know, and use xsl:apply-templates.

When you use xsl:apply-templates recursively (as above), you have to have a
stopping condition at which point you know you've reached the bottom (or
top) of the processing, otherwise it just goes on forever.  At each point,
the only thing you know about is the particular node that you're currently
processing - you can't know anything about what has gone before unless you
pass in information (which can get really tedious).  So stopping conditions
have to involve looking at what you know about a particular node (a
component in your case).

Anyway, thanks for the challenge - I hope this solution works better for you.

Cheers,

Jeni

Dr Jeni Tennison
Epistemics Ltd * Strelley Hall * Nottingham * NG8 6PE
tel: 0115 906 1301 * fax: 0115 906 1304 * email: jeni.tennison@xxxxxxxxxxxxxxxx


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread