Subject: RE: [xsl] [XSL] extracting a verse
From: "McNally, David" <David.McNally@xxxxxxxxxx>
Date: Wed, 18 Dec 2002 15:45:39 -0500
If you just want the text within each verse, there is a very simple
solution using keys.  Define a key that matches every text node; its value
is the concatenation of the ID of the nearest preceding verse and the ID of
the nearest following verseEnd.  A text node that lies inside a verse then
has exactly the value you get by concatenating that verse's id and to
attributes, so a single key() call pulls all the text nodes for a given
verse.
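
A minimal sketch of that last step, pulling one verse by its ID.  It assumes
the ID is passed in as a stylesheet parameter; the parameter name verse-id
and its default value are made up for illustration, and the input is assumed
to use the same verse/verseEnd milestones with id and to attributes as the
sample file in this message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter: the id of the verse to extract. -->
  <xsl:param name="verse-id" select="'BCV-GEN-1.1'"/>

  <!-- Index every text node by "start id::end id" of its nearest milestones. -->
  <xsl:key name="verses" match="text()"
      use="concat(preceding::verse[1]/@id,'::',following::verseEnd[1]/@id)"/>

  <xsl:template match="/">
    <!-- Find the start milestone, then build the same compound value from
         its own id and to attributes and look it up in the key. -->
    <xsl:for-each select="//verse[@id = $verse-id]">
      <xsl:variable name="text">
        <xsl:copy-of select="key('verses', concat(@id,'::',@to))"/>
      </xsl:variable>
      <!-- Collapse the line breaks and indentation left by the source. -->
      <xsl:value-of select="normalize-space($text)"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Text that falls between a verseEnd and the next verse gets a mismatched pair
of IDs as its key value, so no verse lookup ever returns it.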

Verses.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="C:\Work\xsl\verses.xslt"?>
<text>
  <div>
    <chapter id="BCV-GEN-1" to="BCV-GEN-1-END" value="1"/>
    <head>The Story of Creation</head>
    <p>
      <verse id="BCV-GEN-1.1" to="BCV-GEN-1.1-END" value="1"/>In the
beginning, when God created the universe,
      <verseEnd id="BCV-GEN-1.1-END" from="BCV-GEN-1.1"/>
      <verse id="BCV-GEN-1.2" to="BCV-GEN-1.2-END" value="2"/>the
earth was formless and desolate. The raging ocean that covered everything
was engulfed in total darkness, and the ......         
    </p>
    <p>rest of verse 2 
      <verseEnd id="BCV-GEN-1.2-END" from="BCV-GEN-1.2"/>
	but this is just paragraph
    </p>
    <p>Paragraph Paragraph Paragraph 
      <verse id="BCV-GEN-1.3" to="BCV-GEN-1.3-END" value="3"/>This is the
third
    </p>
    <p>verse  </p>
      <verseEnd id="BCV-GEN-1.3-END" from="BCV-GEN-1.3"/>
    <p> paragraph </p>
  </div>
</text>

verses.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:key name="verses" match="text()"
use="concat(preceding::verse[1]/@id,'::',following::verseEnd[1]/@id)"/>
  <xsl:template match="/">
  <top>
    <xsl:for-each select="//verse">
	<verse>
      <xsl:variable name="text">
        <xsl:copy-of select="key('verses',concat(@id,'::',@to))"/>
      </xsl:variable>
      <xsl:value-of select="normalize-space($text)"/>		
      </verse>
    </xsl:for-each>
  </top>
  </xsl:template>
</xsl:stylesheet>


Output:

<?xml version="1.0" encoding="UTF-8"?>
<top>
<verse>In the beginning, when God created the universe,</verse>
<verse>the earth was formless and desolate. The raging ocean that covered
everything was engulfed in total darkness, and the ...... rest of verse
2</verse>
<verse>This is the third verse</verse>
</top>


If you want to keep the embedded markup, it's a bit more difficult, and
you have to decide what to do with markup that doesn't fit properly within
a verse.  To make life easy I've decided to just throw away any element
that doesn't nest cleanly inside a single verse (its text is still kept).
The key now matches both text nodes and element nodes and, as before, each
verse gets a variable ($text) that holds all the text and element nodes
between its start and end milestones.  I then apply templates only to the
nodes in that variable whose parent is not also in the variable; the others
are descendants of nodes already selected, so processing them again would
repeat the same content.
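
The "parent is not in the variable" test relies on the usual XSLT 1.0
set-membership idiom: a node is already in a node-set exactly when adding it
to the set by union leaves the count unchanged.  Here is a small standalone
sketch of just that filter; the variable name set, the b elements and the
outermost/node output elements are all made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Illustrative node-set: every b element plus every text node. -->
    <xsl:variable name="set" select="//b | //text()"/>
    <outermost>
      <!-- count(parent::* | $set) = count($set) holds when the parent
           element is already in $set (the union adds nothing new).
           Negating it keeps only nodes whose parent is outside the set.
           Caveat: a node whose parent is the root node has an empty
           parent::*, so the counts are equal and it is filtered out. -->
      <xsl:for-each select="$set[not(count(parent::* | $set) = count($set))]">
        <node name="{name()}"><xsl:value-of select="."/></node>
      </xsl:for-each>
    </outermost>
  </xsl:template>
</xsl:stylesheet>
```

Run against something like <a>x<b>y</b></a>, this should keep the text "x"
and the b element but skip the text inside b, because b is itself in the set.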

verses2.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:key name="verses" match="text() | *"
use="concat(preceding::verse[1]/@id,'::',following::verseEnd[1]/@id)"/>
  <xsl:template match="/">
  <top>
    <xsl:for-each select="//verse">
	<verse>
      <xsl:variable name="text"
select="key('verses',concat(@id,'::',@to))"/>
	      <xsl:apply-templates select="$text[not(count(parent::*|$text)
= count($text))]"/>
      </verse>
    </xsl:for-each>
  </top>
  </xsl:template>
  
 <xsl:template  match="*">
  <xsl:element name="{name(.)}">
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates />
  </xsl:element>
</xsl:template>
  
</xsl:stylesheet>

Output 2:

<top>
<verse>In the
beginning, when God created the universe,
      </verse>
<verse>the
earth was formless and desolate. The raging ocean that covered everything
was engulfed in total darkness, and the ......         
    rest of verse 2 
      </verse>
<verse>This is the third
    <p>verse  </p>
</verse>
</top>

Only one paragraph (the <p>verse  </p> inside verse 3) nests cleanly, so it
is the only element copied to the output.

I tried out this approach on Wendell's text.  The milestones there carry no
id attributes, so the key is built with generate-id() instead:

verses3.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="C:\Work\xsl\verses3.xslt"?>
<quote>
  <verse/>
  <s>
    <seg>Of Man's first disobedience,</seg>
    <seg>and the 
fruit<endVerse/>
      <verse/>Of that forbidden tree whose mortal taste <endVerse/>
      <verse/>Brought death into the World,</seg>
    <seg>and all our 
woe,</seg>
    <endVerse/>
    <verse/>
    <seg>With loss of Eden,</seg>
    <seg>till one greater Man <endVerse/>
      <verse/>Restore us,</seg>
    <seg>and regain the blissful seat,</seg>
    <endVerse/>
    <verse/>
    <seg>Sing,</seg>
    <seg>Heavenly Muse,</seg>
    <seg>that,</seg>
    <seg>on 
the secret top <endVerse/>
      <verse/>Of Oreb,</seg>
    <seg>or of Sinai,</seg>
    <seg>didst inspire <endVerse/>
      <verse/>That Shepherd who first taught the chosen seed <endVerse/>
      <verse/>In the beginning how the heavens and earth <endVerse/>
      <verse/>Rose out of Chaos:</seg>
    <seg>or,</seg>
    <seg>if Sion hill <endVerse/>
      <verse/>Delight thee more,</seg>
    <seg>and Siloa's brook that flowed <endVerse/>
      <verse/>Fast by the oracle of God,</seg>
    <seg>I thence <endVerse/>
      <verse/>Invoke thy aid to my adventurous song,</seg>
    <seg>
      <endVerse/>
      <verse/>That with no middle flight intends to soar <endVerse/>
      <verse/>Above th' Aonian mount,</seg>
    <seg>while it pursues <endVerse/>
      <verse/>Things unattempted yet in prose or rhyme.</seg>
  </s>
  <endVerse/>
  <verse/>
  <s>
    <seg>And chiefly thou,</seg>
    <seg>O Spirit,</seg>
    <seg>that dost 
prefer <endVerse/>
      <verse/>Before all temples th' upright heart and pure,</seg>
    <seg>
      <endVerse/>
      <verse/>Instruct me,</seg>
    <seg>for Thou know'st;</seg>
    <seg>Thou from the 
first <endVerse/>
      <verse/>Wast present,</seg>
    <seg>and,</seg>
    <seg>with mighty wings 
outspread, </seg>
    <seg>
      <endVerse/>
      <verse/>Dove-like sat'st brooding on the vast Abyss, </seg>
    <seg>
      <endVerse/>
      <verse/>And mad'st it pregnant:</seg>
    <seg>what in me is dark <endVerse/>
      <verse/>Illumine,</seg>
    <seg>what is low raise and 
support;</seg>
    <seg>
      <endVerse/>
      <verse/>That,</seg>
    <seg>to the height of this great 
argument,</seg>
    <seg>
      <endVerse/>
      <verse/>I may assert Eternal Providence,</seg>
    <seg>
      <endVerse/>
      <verse/>And justify the ways of God to men.</seg>
  </s>
  <endVerse/>
</quote>

verses3.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:key name="verses" match="text() | *"
use="concat(generate-id(preceding::verse[1]),'::',
            generate-id(following::endVerse[1]))"/>
  <xsl:template match="/">
  <top>
    <xsl:for-each select="//verse">
	<verse>
      <xsl:variable name="text"
select="key('verses',concat(generate-id(.),'::',
        generate-id(following::endVerse[1])))"/>
	      <xsl:apply-templates select="$text[not(count(parent::*|$text)
= count($text))]"/>
      </verse>
    </xsl:for-each>
  </top>
  </xsl:template>
  
 <xsl:template  match="*">
  <xsl:element name="{name(.)}">
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates />
  </xsl:element>
</xsl:template>
  
</xsl:stylesheet>

Output 3:

<?xml version="1.0" encoding="UTF-8"?>
<top>
  <verse>
    <seg>Of Man's first disobedience,</seg>and the 
fruit</verse>
  <verse>Of that forbidden tree whose mortal taste </verse>
  <verse>Brought death into the World,<seg>and all our 
woe,</seg>
  </verse>
  <verse>
    <seg>With loss of Eden,</seg>till one greater Man </verse>
  <verse>Restore us,<seg>and regain the blissful seat,</seg>
  </verse>
  <verse>
    <seg>Sing,</seg>
    <seg>Heavenly Muse,</seg>
    <seg>that,</seg>on 
the secret top </verse>
  <verse>Of Oreb,<seg>or of Sinai,</seg>didst inspire </verse>
  <verse>That Shepherd who first taught the chosen seed </verse>
  <verse>In the beginning how the heavens and earth </verse>
  <verse>Rose out of Chaos:<seg>or,</seg>if Sion hill </verse>
  <verse>Delight thee more,and Siloa's brook that flowed </verse>
  <verse>Fast by the oracle of God,I thence </verse>
  <verse>Invoke thy aid to my adventurous song,</verse>
  <verse>That with no middle flight intends to soar </verse>
  <verse>Above th' Aonian mount,while it pursues </verse>
  <verse>Things unattempted yet in prose or rhyme.</verse>
  <verse>
    <seg>And chiefly thou,</seg>
    <seg>O Spirit,</seg>that dost 
prefer </verse>
  <verse>Before all temples th' upright heart and pure,</verse>
  <verse>Instruct me,<seg>for Thou know'st;</seg>Thou from the 
first </verse>
  <verse>Wast present,<seg>and,</seg>
    <seg>with mighty wings 
outspread, </seg>
  </verse>
  <verse>Dove-like sat'st brooding on the vast Abyss, </verse>
  <verse>And mad'st it pregnant:what in me is dark </verse>
  <verse>Illumine,<seg>what is low raise and 
support;</seg>
  </verse>
  <verse>That,<seg>to the height of this great 
argument,</seg>
  </verse>
  <verse>I may assert Eternal Providence,</verse>
  <verse>And justify the ways of God to men.</verse>
</top>

This seems moderately useful, but I think it would be hard to extend this
approach to handle markup that doesn't cleanly nest... 

Hope this helps.
David.
--
David McNally            Moody's Investors Service
Software Engineer        99 Church St, NY NY 10007 
David.McNally@xxxxxxxxxx            (212) 553-7475 



