Re: [xsl] Transforming milestone tags

Subject: Re: [xsl] Transforming milestone tags
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 14 Jul 2004 12:01:49 -0400

This is the topic of my paper this year at the Extreme conference. And others. Multiple concurrent hierarchies is a hot problem. If you contact me off-list, I can provide you with more info. Extreme is only three weeks away, and putting its Proceedings together is one of the things I'm working on when I'm not writing to this list. :->

Wednesday August 4 will be "Overlap Day" at Extreme this year: see the conference program. You may notice that no fewer than four of Wednesday's abstracts start with the same string (is this a case of overlap?): "Overlap in markup occurs where some markup structures do not nest...".

The short version of the story is that this is most easily done by handling the markup quite differently from the way XSLT expects. It can be done in XSLT fairly simply (that's what my paper is on), but the technique is highly unorthodox. In your case, a simple approach would be to process the input in two passes: the first flattens all the markup into milestones, and the second writes the flat stream out again with the hierarchy you want.
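To make the two passes concrete, here is a rough sketch. It is written in Python with the standard library rather than XSLT, purely to show the shape of the idea, and it is not the technique from the paper. It assumes the sample from the question is wrapped in a <div>, that each <p> is closed before the next one opens, that every <pb> carries an n attribute, and that a <p> start, like an <lb/>, begins a new line. The names flatten, regroup, SAMPLE and the <pages> wrapper element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample in the question.
SAMPLE = """<div>
  <pb n="1"/>
  <p>Line A
  <lb/>Line B
  <pb n="2"/>
  <lb/>Line C</p>
  <p>Line D
  <lb/>Line E
  <lb/>Line F</p>
  <pb n="3"/>
  <p>Line G
  <lb/>Line H
  <lb/>Line I</p>
  <p>Line J
  <lb/>Line K
  <lb/>Line L</p>
</div>"""


def flatten(elem):
    """Pass 1: yield a flat stream of (kind, value) events in document order."""
    if elem.tag == "pb":
        yield ("pb", elem.get("n", ""))
    elif elem.tag in ("lb", "p"):
        # Assumption: a paragraph start breaks the line just as <lb/> does.
        yield ("break", elem.tag)
    if elem.text and elem.text.strip():
        yield ("text", elem.text.strip())
    for child in elem:
        yield from flatten(child)
        # Text following a child belongs to the parent's content.
        if child.tail and child.tail.strip():
            yield ("text", child.tail.strip())


def regroup(events):
    """Pass 2: rebuild the flat stream as <page>/<line> elements."""
    root = ET.Element("pages")  # hypothetical wrapper element
    page = None
    line = None
    count = 0
    for kind, value in events:
        if kind == "pb":
            # A page break closes the current line and restarts numbering.
            page = ET.SubElement(root, "page", n=value)
            line, count = None, 0
        elif kind == "break":
            line = None
        elif kind == "text":
            if page is None:
                raise ValueError("text found before the first <pb>")
            if line is None:
                count += 1
                line = ET.SubElement(page, "line", n=f"{page.get('n')}.{count}")
                line.text = value
            else:
                line.text += " " + value
    return root
```

Calling regroup(flatten(ET.fromstring(SAMPLE))) and serializing the result with ET.tostring gives the page and line nesting asked for. Note that this sketch throws the paragraph boundaries away; to keep them as milestones, the second pass would have to emit an empty marker element wherever a "break" event comes from a <p>.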

But with current tools, no guarantee can be made even that the output is well-formed, which is one reason why this is an interesting research area. We'd like to get to that point, but this will require implementing LMNL or something similar.

Your data looks like near-TEI. The TEI folks (who, like the OSIS project, have to deal with overlap more than a little) are watching this space. :->


At 03:45 AM 7/14/2004, you wrote:
I have a source document which uses hierarchical markup for the structure of the text of a manuscript (<div> for the big divisions and <p> for the paragraphs) and milestone tags for page breaks (<pb>) and line breaks (<lb>), which may occur in virtually any place inside the hierarchy, for example:

  <pb n="1" />
    <p>Line A
    <lb/>Line B
    <pb n="2" />
    <lb/>Line C</p>
    <p>Line D
    <lb/>Line E
    <lb/>Line F</p>
    <pb n="3" />
    <p>Line G
    <lb/>Line H
    <lb/>Line I</p>
    <p>Line J
    <lb/>Line K
    <lb/>Line L</p>

I would like to transform this document into a nested structure of <page> and <line> tags and mark up the textual divisions as milestones:

  <page n="1">
    <line n="1.1">Line A</line>
    <line n="1.2">Line B</line>
  </page>
  <page n="2">
    <line n="2.1">Line C</line>
    <line n="2.2">Line D</line>
    <line n="2.3">Line E</line>
    <line n="2.4">Line F</line>
  </page>
  <page n="3">
    <line n="3.1">Line G</line>
    <line n="3.2">Line H</line>
    <line n="3.3">Line I</line>
    <line n="3.4">Line J</line>
    <line n="3.5">Line K</line>
    <line n="3.6">Line L</line>
  </page>

What is the best strategy to do this? (My main problem is to get a selection of nodes spanning between <pb> tags appearing on different levels in the hierarchy.)

Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.      
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
