RE: [xsl] positional grouping xslt2

Subject: RE: [xsl] positional grouping xslt2
From: David.Pawson@xxxxxxxxxxx
Date: Wed, 10 Mar 2004 09:15:11 -0000
Long post.

David Carlisle said,
    This should get you started

and it did! Thanks David. Much appreciated.

It's far messier than I thought, and I'm still not sure it's wholly right.
A couple of questions, and the solution I have to date.

The initial source is a twiki text,
as per http://twiki.org/

It looks roughly like the sample in [0].

SGML tools (James Clark's sp) convert it to XML, as shown in [1];
then David C's stylesheet [2] adds enough structure
to make it valid DocBook [3].

Questions.

1. The example (MK I guess?) shows 

 <xsl:for-each-group select="*" group-starting-with="head1"> 
  <xsl:choose> 
  <xsl:when test="self::head1"> 
  <section> 
  <title><xsl:value-of select="self::head1"/></title>

You've used
 <xsl:for-each-group select="*" group-starting-with="head1"> 
  <xsl:choose> 
  <xsl:when test="self::head1"> 
  <section> 
  <title><xsl:copy-of select="self::head1/node()"/></title>
Why the difference, please, David?
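As far as I understand it, the practical difference is that xsl:value-of gives the string value of the head1, so any inline markup such as <b> is flattened to plain text, whereas xsl:copy-of select="self::head1/node()" copies the child nodes themselves, so the <b> element survives inside <title>. The Python sketch below is only an analogy built on the standard library's xml.etree.ElementTree, not XSLT; the heading text and the helper names value_of and copy_of_children are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Illustrative heading with inline markup (not taken from the sample input).
head = ET.fromstring("<head1>An <b>Example</b> Document</head1>")


def value_of(elem):
    """Rough model of xsl:value-of: the string value, with all markup dropped."""
    return "".join(elem.itertext())


def copy_of_children(elem, new_tag):
    """Rough model of <new_tag><xsl:copy-of select="node()"/></new_tag>:
    the text and child elements (with their trailing text) are copied as-is."""
    out = ET.Element(new_tag)
    out.text = elem.text
    for child in elem:
        out.append(copy.deepcopy(child))
    return out


print(value_of(head))
# An Example Document
print(ET.tostring(copy_of_children(head, "title"), encoding="unicode"))
# <title>An <b>Example</b> Document</title>
```

With headings that carry no inline markup, as in [1], both forms give the same title text.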

2. Just a note which I missed when I tried it.
 <xsl:for-each-group select="current-group()[position()&gt;1]"
    group-starting-with="head2">
is used to group the rest of the current group further, starting a new
group at each head2. Note how the first item in the current group is
skipped, since it's the heading that started the group (already used for
the title).
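The effect of group-starting-with, and of skipping the first item with position()>1, can be modelled in a few lines of Python. This is a sketch of the XSLT 2.0 semantics as I understand them (a new group begins at every item matching the pattern, and the first item of the population always begins a group); element names stand in for the nodes, and the list is a shortened version of the article's children in [1]. The function name group_starting_with is hypothetical.

```python
def group_starting_with(items, starts_group):
    """Model of XSLT 2.0 group-starting-with: a new group begins at each item
    that matches; the first item always begins a group, match or not."""
    groups = []
    for item in items:
        if not groups or starts_group(item):
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


# Element names standing in for the article's children (shortened from [1]).
children = ["head1", "hr", "p", "head1", "p", "head2", "p", "head3", "p",
            "head2", "p", "bull1", "bull1"]

for group in group_starting_with(children, lambda name: name == "head1"):
    # In this sample every outer group starts with a head1, used for the title;
    # group[1:] plays the role of current-group()[position()>1].
    inner = group_starting_with(group[1:], lambda name: name == "head2")
    print(group[0], inner)
# head1 [['hr', 'p']]
# head1 [['p'], ['head2', 'p', 'head3', 'p'], ['head2', 'p', 'bull1', 'bull1']]
```

The leading ['p'] group does not start with a head2, which is why the inner xsl:choose needs its xsl:otherwise branch; without the slice, the head1 itself would also land in that leading group.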

3. In order to process block-level content which is not flat, I've added

<xsl:apply-templates select="current-group()[self::p]" mode="struct"/> 

since p is the only block-level element I'm handling apart from the two
lists. The DTD [0a] after the wiki instance also allows head3, verbatim,
dl and others, and at present those are dropped, as comparing [1] with
[3] shows.
I'm not sure this is right, but I can't figure out how else to process
such content, e.g. p or verbatim elements.
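To see exactly what the [self::p] filter loses, here is a Python model of the innermost grouping (group-adjacent="name()" plus the xsl:choose around it). It is a sketch, not the stylesheet: itertools.groupby stands in for group-adjacent because both gather runs of consecutive items with equal keys, and the item list is the content of the "Escaping special markup" head2 group in [1] with the head2 removed. The p_only=False branch is an assumption, modelling the alternative of applying templates in mode="struct" to the whole group so that each element's own template (such as the existing verbatim one) decides; the function name innermost is hypothetical.

```python
from itertools import groupby

# Children of the "Escaping special markup" head2 group in [1], after the
# head2 itself has been skipped by position()>1.
items = ["p", "bull1", "bull1", "bull1", "p", "p", "p", "verbatim", "p"]

# Output element of each mode="struct" template in stylesheet [2].
STRUCT = {"p": "para", "bull1": "listitem", "litem": "listitem",
          "verbatim": "programlisting"}


def innermost(items, p_only=True):
    """Model of group-adjacent="name()" plus the xsl:choose around it.
    Returns the names of the DocBook elements that would be produced."""
    out = []
    for name, run in groupby(items):  # runs of adjacent equal names
        run = list(run)
        if name == "bull1":
            out.append("itemizedlist")
        elif name == "litem":
            out.append("orderedlist")
        elif p_only:
            # current-group()[self::p]: every non-p element is dropped
            out.extend(STRUCT[n] for n in run if n == "p")
        else:
            # hypothetical alternative: apply templates to the whole group;
            # with no struct template, XSLT's built-in one outputs just the text
            out.extend(STRUCT.get(n, "text of " + n) for n in run)
    return out


print(innermost(items))
# ['para', 'itemizedlist', 'para', 'para', 'para', 'para']
print(innermost(items, p_only=False))
# ['para', 'itemizedlist', 'para', 'para', 'para', 'programlisting', 'para']
```

The first result lines up with the end of [3], where the verbatim element from [1] never appears.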


Anyway, I've more than a good start.

Again, thanks David.


[0] Input.
<!DOCTYPE article SYSTEM "twikisgm.dtd">
<article>
---+ An Example Twiki Document
----
a para to show content in this level 1

---+ This is a level 1 heading
A filler paragraph

---++ This is a level 2 heading
A filler paragraph
---+++ And a level 3 heading. Never more than two linefeeds between content. Othewise it shows up as a paragraph!

A straightforward paragraph requires no special formatting.  It can
include *bold* text, _italic_ text and =monotype= text.  Note that
bold and italic will be converted to docbook emphasis markup.

Note that the star and underscore must be tight up to the marked up text,
as per the wiki requirment

---++ Other markup.

For bulleted list, which are turned into itemized lists, use 

   * Three spaces (not a tab) then the *star*
   * Other List items then follow
   * Third list item

For ordered (itemised) lists,

   1 ITem one
   1 item two
   1 and so on
   1Note also that you only need add the number 1, not increment it each time.
   1And that if you insert spaces after the number, they will be copied over.



[0a] It is valid against the following DTD

<!ENTITY % doctype "article">

<!ENTITY % blocks 	" p | head1 | head2 | head3 | bull1  | litem |  bq
|verbatim " >
<!ENTITY % inlines 	" emphasis |i|b  | tt | literal|nop " >

<!ELEMENT article O O ( keyword |%blocks; | pre | dl | hr )* >

<!ELEMENT p       O O  ( #PCDATA | %inlines; | link  )* >
<!ELEMENT head1   O O  ( #PCDATA | %inlines; | link  )* >
<!ELEMENT head2   O O  ( #PCDATA | %inlines; | link  )* >
<!ELEMENT head3   O O  ( #PCDATA | %inlines; | link  )* >
<!ELEMENT bull1   O O  ( #PCDATA | %inlines; | link  )* >
<!ELEMENT litem   O O  ( #PCDATA | %inlines; | link  )* >


<!ELEMENT text    o o (#PCDATA)>
<!ELEMENT ref     o o (#PCDATA)>
<!ELEMENT emphasis o o (#PCDATA)>
<!ELEMENT b       o o (#PCDATA)>
<!ELEMENT i       o o (#PCDATA)>

<!ELEMENT verbatim  - - (#PCDATA | p)*>

<!ELEMENT bi      o o (#PCDATA)>
<!ELEMENT tt      o o (#PCDATA)>
<!ELEMENT literal o o (#PCDATA)>
<!ELEMENT nop - o (EMPTY)>

<!ELEMENT dl 	O O ( dt, dd? )+>
<!ELEMENT dt      O O  ( #PCDATA | %inlines; | link  )* >
<!ELEMENT dd      O O  ( #PCDATA | %inlines; | link  )* >


<!ELEMENT hr 	O O EMPTY >
<!ELEMENT link 	O O (ref, text ) >
<!ELEMENT keyword O O (#PCDATA) >











[1] XML produced by sp
<?xml version="1.0" encoding="utf-8"?>
<article>
   <head1> An Example Twiki Document</head1>
   <hr/>
   <p>a para to show content in this level 1</p>
   <head1> This is a level 1 heading</head1>
   <p>A filler paragraph</p>
   <head2> This is a level 2 heading</head2>
   <p>A filler paragraph</p>
   <head3> And a level 3 heading. Never more than two linefeeds between
content. Othewise it shows up as a paragraph!</head3>
   <p>A straightforward paragraph requires no special formatting.  It
can</p>
   <p>include <b>bold</b> text, <i>italic</i> text and
<literal>monotype</literal> text.  Note that</p>
   <p>bold and italic will be converted to docbook emphasis markup.</p>
   <p>Note that the star and underscore must be tight up to the marked up
text,</p>
   <p>as per the wiki requirment</p>
   <head2> Other markup.</head2>
   <p>For bulleted list, which are turned into itemized lists, use </p>
   <bull1> Three spaces (not a tab) then the <b>star</b>
   </bull1>
   <bull1> Other List items then follow</bull1>
   <bull1> Third list item</bull1>
   <p>For ordered (itemised) lists,</p>
   <litem> ITem one</litem>
   <litem> item two</litem>
   <litem> and so on</litem>
   <litem>Note also that you only need add the number 1, not increment it
each time.</litem>
   <litem>And that if you insert spaces after the number, they will be
copied over.</litem>
   <p>Display lists, or definition lists, contain a term and its</p>
   <p>explanation.  Three spaces followed by the dollar symbol prefix
the</p>
   <p>term, the colon terminates the term and starts the definition. The</p>
   <p>newline terminates the definition</p>
   <dl>
      <dt>term</dt>
      <dd>definition</dd>
   </dl>
   <p>Unfortunately, I can't find a way of making the dollar symbol a
part</p>
   <p>of the markup. Yet.  Meanwhile, use TABterm:definition and it</p>
   <p>works. Quick sed should sort that out until I find a way to hack</p>
   <p>it. Trouble is, there aren't many people left who grok all this
stuff.</p>
   <p>Links work in a rather strange way. <link>
         <ref>http://example.com</ref>
         <text>hottext</text>
      </link>
   </p>
   <p>The first term is the link target, the second is the hot text to go
inside</p>
   <p>the link. Forced links and internal links are not fully supported.
<link>
         <ref>http://www.example.com</ref>
      </link>
   </p>
   <p>may resolve to a link without hot text content. A syntax error is
normally reported.</p>
   <p>The xslt stylesheet attempts to resolve this to useable docbook</p>
   <p>Verbatim content is as shown below. Redundant internal paragraph</p>
   <p>markup will be removed by the xslt stylesheet. .  </p>
   <head2> Escaping special markup</head2>
   <p>to get bold text within a list,</p>
   <bull1> list with <b>bold</b> contained is no problem,</bull1>
   <bull1> Nested lists, using 6 or 9 spaces don't work, sorry</bull1>
   <bull1> to include a star within a list, without it being seen as bold,
use #x02A; which is the normal numerical character entity value. Same, using
value hex 5F for #x5F; underscore</bull1>
   <p>Also note that unless explicitly marked up, all intervening
whitespace</p>
   <p>is just for the human readers benefit, i.e. its all eaten by the
process</p>
   <p/>
   <verbatim>
      <p>Unchanged text</p>
      <p/>
   </verbatim>
   <p>Final paragraph</p>
</article>




[2] Stylesheet

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY  LEGALNOTICE " ">

]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0" >
  
<d:doc xmlns:d="rnib.org.uk/tbs#">
 <revhistory>
   <purpose><para>This stylesheet works with the output of the twiki
conversion script, using  XXXX.xml to produce XXXX.db.xml, valid to the
docbook DTD</para></purpose>
   <revision>
    <revnumber>1.0</revnumber>
    <date>Mar 9 2004</date>
    <authorinitials>DaveP</authorinitials>
    <revdescription>
     <para>Initial issue</para>
    </revdescription>
    <revremark></revremark>
   </revision>
  </revhistory>
  </d:doc>
  

  <xsl:output method="xml" indent="yes" encoding="utf-8"
/>

  <xsl:template match="/">
    <xsl:text disable-output-escaping="yes">
	<![CDATA[

<!DOCTYPE article
  PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[
<!ENTITY RH "Red Hat"> <!--The generic term "Red Hat" -->
<!ENTITY FORMAL-RHI "Red Hat, Inc."> <!--The generic term "Red Hat, Inc. -->
<!ENTITY PROJECT "Fedora project"> <!-- Set the project name -->
<!ENTITY NAME-TITLE "Fedora Project"> <!-- Set the project name, use for
titles -->
<!ENTITY DISTRO "Fedora Core"> <!-- Set the distro name -->
<!ENTITY BOOKID "example-tutorial-0.1 (2003-07-07)"> <!-- change version of
manual and date here -->

<!ENTITY LEGALNOTICE SYSTEM "../common/legalnotice-en.xml">


]>
]]>
</xsl:text>


<article id="example-tutorial" lang="en">
  <articleinfo>
    <title>Example Tutorial</title>
    <copyright>
      <year>2003</year>
      <holder>Red Hat, Inc.</holder>
      <holder>Tammy Fox</holder>
    </copyright>
    <authorgroup>
      <author>
	<surname>Fox</surname>
	<firstname>Tammy</firstname>
      </author>
    </authorgroup>
    &LEGALNOTICE;
  </articleinfo>

  <xsl:apply-templates />
      </article>
  </xsl:template>

<xsl:template match="article">

  <!-- One group per head1; a leading group that does not start with
       head1 falls through to the outer xsl:otherwise. -->
  <xsl:for-each-group select="*" group-starting-with="head1">
    <xsl:choose>
      <xsl:when test="self::head1">
        <section>
          <title><xsl:copy-of select="self::head1/node()"/></title>
          <!-- position()>1 skips the head1 that started this group -->
          <xsl:for-each-group select="current-group()[position()&gt;1]"
                              group-starting-with="head2">
            <xsl:choose>
              <xsl:when test="self::head2">
                <section>
                  <title><xsl:copy-of select="self::head2/node()"/></title>
                  <!-- runs of same-named siblings: adjacent list items
                       end up in a single list -->
                  <xsl:for-each-group select="current-group()[position()&gt;1]"
                                      group-adjacent="name()">
                    <xsl:choose>
                      <xsl:when test="self::bull1">
                        <itemizedlist mark="bullet">
                          <xsl:apply-templates select="current-group()[self::bull1]"
                                               mode="struct"/>
                        </itemizedlist>
                      </xsl:when>
                      <xsl:when test="self::litem">
                        <orderedlist>
                          <xsl:apply-templates select="current-group()[self::litem]"
                                               mode="struct"/>
                        </orderedlist>
                      </xsl:when>
                      <xsl:otherwise>
                        <!-- only p is processed; any other element is dropped -->
                        <xsl:apply-templates select="current-group()[self::p]"
                                             mode="struct"/>
                      </xsl:otherwise>
                    </xsl:choose>
                  </xsl:for-each-group>
                </section>
              </xsl:when>
              <xsl:otherwise>
                <!-- content after the head1 and before its first head2 -->
                <xsl:apply-templates select="current-group()[self::p]" mode="struct"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </section>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="current-group()[self::p]" mode="struct"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>

</xsl:template>



  
  <xsl:template match="i" mode="struct">
    <emphasis role="italics">
      <xsl:apply-templates/>
    </emphasis>
  </xsl:template>

  <xsl:template match="b"  mode="struct">
    <emphasis role="bold">
      <xsl:apply-templates/>
    </emphasis>
  </xsl:template>

  <xsl:template match="bull1|litem"  mode="struct">
    <listitem>
      <para>
      <xsl:apply-templates mode="struct"/>
    </para>
    </listitem>
  </xsl:template>

 
  <xsl:template match="hr" mode="struct"/>

  <xsl:template match="literal"  mode="struct">
    <literal>
      <xsl:apply-templates  mode="struct"/>
    </literal>
  </xsl:template>

  <xsl:preserve-space elements="verbatim"/>

  <xsl:template match="verbatim"  mode="struct">
    <programlisting >
      <xsl:value-of select="*"/>
    </programlisting>
  </xsl:template>



<xsl:template match="p" mode="struct">
  <para><xsl:apply-templates  mode="struct"/></para>
</xsl:template>




<xsl:template match="*"/>

</xsl:stylesheet>


[3] Resulting output, valid DocBook!

<?xml version="1.0" encoding="utf-8"?>
	

<!DOCTYPE article
  PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[
<!ENTITY RH "Red Hat"> <!--The generic term "Red Hat" -->
<!ENTITY FORMAL-RHI "Red Hat, Inc."> <!--The generic term "Red Hat, Inc. -->
<!ENTITY PROJECT "Fedora project"> <!-- Set the project name -->
<!ENTITY NAME-TITLE "Fedora Project"> <!-- Set the project name, use for
titles -->
<!ENTITY DISTRO "Fedora Core"> <!-- Set the distro name -->
<!ENTITY BOOKID "example-tutorial-0.1 (2003-07-07)"> <!-- change version of
manual and date here -->

<!ENTITY LEGALNOTICE SYSTEM "../common/legalnotice-en.xml">


]>

<article id="example-tutorial" lang="en">
   <articleinfo>
      <title>Example Tutorial</title>
      <copyright>
         <year>2003</year>
         <holder>Red Hat, Inc.</holder>
         <holder>Tammy Fox</holder>
      </copyright>
      <authorgroup>
         <author>
            <surname>Fox</surname>
            <firstname>Tammy</firstname>
         </author>
      </authorgroup>
   </articleinfo>
   <section>
      <title> An Example Twiki Document</title>
      <para>a para to show content in this level 1</para>
   </section>
   <section>
      <title> This is a level 1 heading</title>
      <para>A filler paragraph</para>
      <section>
         <title> This is a level 2 heading</title>
         <para>A filler paragraph</para>
         <para>A straightforward paragraph requires no special formatting.
It can</para>
         <para>include <emphasis role="bold">bold</emphasis> text, <emphasis
role="italics">italic</emphasis> text and <literal>monotype</literal> text.
Note that</para>
         <para>bold and italic will be converted to docbook emphasis
markup.</para>
         <para>Note that the star and underscore must be tight up to the
marked up text,</para>
         <para>as per the wiki requirment</para>
      </section>
      <section>
         <title> Other markup.</title>
         <para>For bulleted list, which are turned into itemized lists, use
</para>
         <itemizedlist mark="bullet">
            <listitem>
               <para> Three spaces (not a tab) then the <emphasis
role="bold">star</emphasis>
   
               </para>
            </listitem>
            <listitem>
               <para> Other List items then follow</para>
            </listitem>
            <listitem>
               <para> Third list item</para>
            </listitem>
         </itemizedlist>
         <para>For ordered (itemised) lists,</para>
         <orderedlist>
            <listitem>
               <para> ITem one</para>
            </listitem>
            <listitem>
               <para> item two</para>
            </listitem>
            <listitem>
               <para> and so on</para>
            </listitem>
            <listitem>
               <para>Note also that you only need add the number 1, not
increment it each time.</para>
            </listitem>
            <listitem>
               <para>And that if you insert spaces after the number, they
will be copied over.</para>
            </listitem>
         </orderedlist>
         <para>Display lists, or definition lists, contain a term and
its</para>
         <para>explanation.  Three spaces followed by the dollar symbol
prefix the</para>
         <para>term, the colon terminates the term and starts the
definition. The</para>
         <para>newline terminates the definition</para>
         <para>Unfortunately, I can't find a way of making the dollar symbol
a part</para>
         <para>of the markup. Yet.  Meanwhile, use TABterm:definition and
it</para>
         <para>works. Quick sed should sort that out until I find a way to
hack</para>
         <para>it. Trouble is, there aren't many people left who grok all
this stuff.</para>
         <para>Links work in a rather strange way. 
         http://example.com
         hottext
      
   </para>
         <para>The first term is the link target, the second is the hot text
to go inside</para>
         <para>the link. Forced links and internal links are not fully
supported. 
         http://www.example.com
      
   </para>
         <para>may resolve to a link without hot text content. A syntax
error is normally reported.</para>
         <para>The xslt stylesheet attempts to resolve this to useable
docbook</para>
         <para>Verbatim content is as shown below. Redundant internal
paragraph</para>
         <para>markup will be removed by the xslt stylesheet. .  </para>
      </section>
      <section>
         <title> Escaping special markup</title>
         <para>to get bold text within a list,</para>
         <itemizedlist mark="bullet">
            <listitem>
               <para> list with <emphasis role="bold">bold</emphasis>
contained is no problem,</para>
            </listitem>
            <listitem>
               <para> Nested lists, using 6 or 9 spaces don't work,
sorry</para>
            </listitem>
            <listitem>
               <para> to include a star within a list, without it being seen
as bold, use #x02A; which is the normal numerical character entity value.
Same, using value hex 5F for #x5F; underscore</para>
            </listitem>
         </itemizedlist>
         <para>Also note that unless explicitly marked up, all intervening
whitespace</para>
         <para>is just for the human readers benefit, i.e. its all eaten by
the process</para>
         <para/>
         <para>Final paragraph</para>
      </section>
   </section>
</article>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

