Re: [xsl] Converting milestone tags

From: Вячеслав Седов <schematronic@xxxxxxxxx>
Date: Thu, 14 Oct 2010 13:15:52 +0400
This is a good case for grouping (xsl:for-each-group):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
    exclude-result-prefixes="xs xd"
    version="2.0">
    <xd:doc scope="stylesheet">
        <xd:desc>
            <xd:p><xd:b>Created on:</xd:b> Oct 14, 2010</xd:p>
            <xd:p><xd:b>Author:</xd:b> vsedov</xd:p>
            <xd:p></xd:p>
        </xd:desc>
    </xd:doc>
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[span]">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
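            <!-- Key = 1 for a start milestone, plus the number of start/end
                 milestones before this node. Everything from a start milestone
                 through its end milestone shares one (odd) key; the text
                 between ranges shares an (even) key. -->
            <xsl:for-each-group select="node()"
                group-by="count(self::span[@order eq 'start'])
                          + count(preceding-sibling::span[@order = ('start', 'end')])">
                <xsl:choose>
                    <!-- Only a group that contains a start milestone is a range;
                         testing for any span would also wrap ordinary span markup. -->
                    <xsl:when test="current-group()/self::span[@order eq 'start']">
                        <span><xsl:apply-templates select="current-group()"/></span>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>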
    </xsl:template>
    <xsl:template match="span[@order = ('start', 'end')]"/>
</xsl:stylesheet>
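
The group-by key is the part that does the work. As a hedged illustration (Python, not part of the original reply), the sketch below computes the same key over a list of sibling nodes. The hypothetical START/END strings stand in for the two milestone spans and plain strings stand in for text nodes. The key never decreases in document order, so grouping by it yields the ranges in order. Assuming start and end milestones alternate as in the example, a start milestone, its end milestone and everything between them share an odd key, and the text between ranges shares an even key.

```python
from itertools import groupby

# Hypothetical stand-ins for <span order="start"/> and <span order="end"/>.
START, END = "START", "END"

def group_key(siblings, i):
    """Mirror of count(self::span[@order eq 'start'])
    + count(preceding-sibling::span[@order = ('start', 'end')])."""
    is_start = 1 if siblings[i] == START else 0
    preceding_milestones = sum(1 for n in siblings[:i] if n in (START, END))
    return is_start + preceding_milestones

def regroup(siblings):
    """Wrap each start..end range as ('span', content); keep other nodes as-is."""
    keys = [group_key(siblings, i) for i in range(len(siblings))]
    out = []
    # Keys are non-decreasing, so grouping adjacent equal keys matches group-by here.
    for _, members in groupby(zip(keys, siblings), key=lambda kv: kv[0]):
        nodes = [n for _, n in members]
        # The milestones themselves are dropped, like the empty span template.
        content = [n for n in nodes if n not in (START, END)]
        if START in nodes:
            out.append(("span", content))
        else:
            out.extend(content)
    return out
```

For the first sample paragraph, `["text", START, "text", END, "  text"]` gets the keys 0, 1, 1, 1, 2. Key 1 becomes the wrapped range.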

Vyacheslav Sedov
Schematronic

2010/10/14 Michael Kay <mike@xxxxxxxxxxxx>
>
>  This class of problems is quite tricky. The most general approach is to
> flatten the first hierarchy, so everything is reduced to milestones, and
> then use positional grouping to construct the new hierarchy from the flat
> structure.
>
> If you have access to a good library, try looking for Michael Jackson's
> 1970s books on Jackson Structured Programming, where he tackles this class
> of problem under the heading of "boundary conflict". The vocabulary is
> different (it's all about sequential processing of hierarchic files on
> magnetic tape), but the logic is the same, and it's the most systematic
> treatment I've seen. Essentially he shows that if the hierarchic structures
> of the input and output are in some sense congruent, a single tree walk over
> the input can handle the problem. If they aren't, you can devise a new
> intermediate hierarchy (perhaps a very flat one) that is congruent with both
> the input and the output. One tree walk then gets you from the input to the
> intermediate tree, and a second tree walk gets you from the intermediate
> tree to the output. (This assumes, of course, that you don't also have an
> ordering conflict, and in your case you don't.)
>
> Your example doesn't need the full generality of this approach, because the
> start/end milestones are always siblings and are always matched in the same
> paragraph, but your discussion indicates that you might want to tackle
> things that go beyond this example.
>
> Michael Kay
> Saxonica
>
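
Michael Kay's two-pass idea can be sketched outside XSLT. The Python/ElementTree code below is an illustration, not code from the thread. Pass 1 flattens a tree into a flat list of open/close/text events, so every element boundary becomes a milestone. Pass 2 groups that list positionally at page breaks and, on each new page, re-opens the elements that were still open at the break. It assumes page breaks are empty `pb` elements, as in TEI. The `page` wrapper, the event tuples and the function names are hypothetical choices for the example.

```python
import xml.etree.ElementTree as ET

def append_text(parent, text):
    """Add text at the end of parent's content (ElementTree keeps it in .text or the last child's .tail)."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def flatten(elem):
    """Pass 1: reduce the tree to events ('open', tag, attrib), ('text', s), ('close', tag), ('pb',)."""
    events = [("open", elem.tag, dict(elem.attrib))]
    if elem.text:
        events.append(("text", elem.text))
    for child in elem:
        if child.tag == "pb":          # assumed page-break milestone
            events.append(("pb",))
        else:
            events.extend(flatten(child))
        if child.tail:
            events.append(("text", child.tail))
    events.append(("close", elem.tag))
    return events

def paginate(events):
    """Pass 2: start a new <page> at each pb, re-opening the elements open at the break."""
    pages = []
    open_path = []                     # (tag, attrib) of elements currently open
    page = ET.Element("page")
    parents = [page]                   # element stack inside the current page
    for ev in events:
        kind = ev[0]
        if kind == "open":
            parents.append(ET.SubElement(parents[-1], ev[1], dict(ev[2])))
            open_path.append((ev[1], ev[2]))
        elif kind == "close":
            parents.pop()
            open_path.pop()
        elif kind == "text":
            append_text(parents[-1], ev[1])
        else:                          # "pb": finish this page, rebuild the open path on the next
            pages.append(page)
            page = ET.Element("page")
            parents = [page]
            for tag, attrib in open_path:
                parents.append(ET.SubElement(parents[-1], tag, dict(attrib)))
    pages.append(page)
    return pages
```

Consider `<body><div n="1"><p>one <hi>two<pb/>three</hi> four</p></div></body>`. It yields `<page><body><div n="1"><p>one <hi>two</hi></p></div></body></page>` followed by `<page><body><div n="1"><p><hi>three</hi> four</p></div></body></page>`. Elements that straddle a break are split, so their attributes (including any IDs) are copied onto both halves. A real pipeline would have to decide how to handle that.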
> On 14/10/2010 8:05 AM, Josef Schneeberger wrote:
>>
>> Hi everybody,
>>
>> I am new to this list and apologize if my question is an FAQ. I scanned
>> the archives but did not find a solution. The question arises in a TEI
>> project where we have to switch from a chapter hierarchy to a
>> page-oriented form. The XSLT is done in multiple steps (a Cocoon
>> pipeline), and I use Saxon 9.
>>
>> Here is a simplified example of an infile:
>>
>> <root>
>>  <p>text<span order="start"/>text<span order="end"/>  text</p>
>>  <p>text<span order="start"/>text<span order="end"/>  text
>>     text<span order="start"/>text<span order="end"/>  text</p>
>>  <p>text text text<span order="start"/>text<span order="end"/></p>
>>  <p><span order="start"/>text<span order="end"/>  text text text</p>
>> </root>
>>
>> which should result in the following output:
>>
>> <root>
>>  <p>text<span>text</span>  text</p>
>>  <p>text<span>text</span>  text
>>     text<span>text</span>  text</p>
>>  <p>text text text<span>text</span></p>
>>  <p><span>text</span>  text text text</p>
>> </root>
>>
>> There may be an arbitrary number of <span order="start"/> milestones (and
>> corresponding end milestones) in a p element. Furthermore, any "text"
>> node may itself contain markup, which should be preserved in the output.
>> I tried various approaches but failed. Here is one of my attempts, using
>> sibling recursion:
>>
>> <xsl:stylesheet version="2.0"
>>  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>>  <xsl:template match="/">
>>   <xsl:apply-templates/>
>>  </xsl:template>
>>
>>  <xsl:template match="root">
>>   <root><xsl:apply-templates/></root>
>>  </xsl:template>
>>
>>  <xsl:template match="p">
>>   <p>
>>    <xsl:apply-templates select="child::node()" mode="procp"/>
>>   </p>
>>  </xsl:template>
>>
>>  <xsl:template match="span[@order='start']" mode="procp">
>>   <span>
>>    <xsl:apply-templates
>>      select="following-sibling::node()[1][not(self::span)]"
>>      mode="procp"/>
>>   </span>
>>   <xsl:apply-templates select="following-sibling::node()[1]"/>
>>  </xsl:template>
>>
>>  <xsl:template match="node()" mode="procp">
>>   <xsl:copy-of select="."/>
>>    <xsl:apply-templates
>>       select="following-sibling::node()[1][not(self::span)]"
>>       mode="procp"/>
>>  </xsl:template>
>> </xsl:stylesheet>
>>
>> Any help would be greatly appreciated. Josef
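
The sibling-recursion attempt above outputs text more than once, for three reasons. The `p` template applies `procp` to every child instead of only the first. Each `procp` template then also recurses into the following sibling. The start-milestone template also re-applies templates, in the default mode, to the node right after the milestone. The walk below is a Python/ElementTree sketch of the same left-to-right idea, not XSLT and not taken from the thread. It visits each child of a `p` exactly once and keeps a single piece of state: the element that content currently goes into. It assumes, as Kay notes, that the milestones are children of the `p` and properly paired. The names `convert_p`, `add_text` and `is_milestone` are made up for the example.

```python
import copy
import xml.etree.ElementTree as ET

def is_milestone(node, order):
    return node.tag == "span" and node.get("order") == order

def add_text(parent, text):
    """Add text at the end of parent's content (ElementTree keeps it in .text or the last child's .tail)."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def convert_p(p):
    """Rebuild one <p>, wrapping each start..end milestone range in a plain <span>."""
    new_p = ET.Element(p.tag, dict(p.attrib))
    add_text(new_p, p.text)
    target = new_p                            # where content currently goes
    for child in p:
        if is_milestone(child, "start"):
            target = ET.SubElement(new_p, "span")   # open a range
        elif is_milestone(child, "end"):
            target = new_p                          # close the range
        else:
            kept = copy.deepcopy(child)             # inline markup is kept unchanged
            kept.tail = None                        # its tail is re-added below
            target.append(kept)
        add_text(target, child.tail)                # text after the child joins the current target
    return new_p
```

Take each sample paragraph, parse it with `ET.fromstring` and serialise `convert_p(p)` with `ET.tostring(..., encoding="unicode")`. This should give the matching expected line, for example `<p>text<span>text</span>  text</p>` for the first one. Markup such as `<hi>` inside a range is copied through.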
