Re: [xsl] segmenting a paragraph

Subject: Re: [xsl] segmenting a paragraph
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 02 Oct 2007 10:34:59 +0200
At 2007-10-02 17:05 +0900, Christian Wittern wrote:
I am seeking your help with the following problem:
I want to segment the paragraphs in a text so that each sentence is enclosed in an <s> element and, within each sentence, the runs of words between punctuation marks are enclosed in <seg> elements.


So far, I have been capturing the content of <p> in a string and then running two nested <xsl:analyze-string> blocks with regexes over it, which works nicely and does what I want. Now I have discovered that some paragraphs contain <note> elements with additional markup, and that markup is lost in this process. I really want to leave these notes exactly as they are. So:

<p>Some text. Some more text, with a comma. <note>This stuff, how boring</note></p>

should look like:

<p><s><seg>Some text.</seg></s><s><seg>Some more text,</seg><seg> with a comma.</seg></s><note>This stuff, how boring</note></p>

How do I tell the processor to leave the <note> content alone?

From your comment "capturing the content in a string and then..." I'm assuming you have something like:


  <xsl:template match="p">
    <xsl:analyze-string select="." .....
  </xsl:template>

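If you break this into pieces, walking the children of <p> one at a time instead of taking its whole string value, you can run your regexes on each text node in turn while passing each element (such as <note>) through untouched: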

  <xsl:template match="p">
    <xsl:apply-templates mode="in-p" select="node()"/>
  </xsl:template>
  <xsl:template mode="in-p" match="*">
    <xsl:apply-templates select="."/> <!--reapply in the default mode-->
  </xsl:template>
  <xsl:template mode="in-p" match="text()">
    <xsl:analyze-string select="." .....
  </xsl:template>
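For completeness, here is one way the pieces could fit into a full stylesheet. This is a sketch only: it assumes an XSLT 2.0 processor (for example Saxon), and the two regexes are placeholders of my own (a sentence ends at . ! or ?, a segment ends at , ; or :), not your actual expressions. The identity template is also an addition, used so that <note> and anything inside it is copied as-is.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copies everything it is given, unchanged;
       this is what leaves <note> and its markup alone -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- keep <p> itself, but walk its children in mode "in-p" -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates mode="in-p" select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- elements (and comments/PIs) inside <p>: reapply in the default
       mode, so the identity template copies them untouched -->
  <xsl:template mode="in-p" match="*|comment()|processing-instruction()">
    <xsl:apply-templates select="."/>
  </xsl:template>

  <!-- only text nodes that are direct children of <p> are segmented -->
  <xsl:template mode="in-p" match="text()">
    <!-- ASSUMED sentence rule: starts with a non-space, non-terminal
         character and runs through any trailing . ! or ? -->
    <xsl:analyze-string select="." regex="[^.!?\s][^.!?]*[.!?]*">
      <xsl:matching-substring>
        <s>
          <!-- ASSUMED segment rule: split after , ; or : -->
          <xsl:analyze-string select="." regex="[^,;:]+[,;:]*">
            <xsl:matching-substring>
              <seg><xsl:value-of select="."/></seg>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </s>
      </xsl:matching-substring>
      <!-- whitespace between sentences is kept as plain text -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Run against your sample paragraph, this should give <p><s><seg>Some text.</seg></s> <s><seg>Some more text,</seg><seg> with a comma.</seg></s> <note>This stuff, how boring</note></p>, which differs from your target only in keeping the spaces between sentences as ordinary text rather than dropping them. One consequence of working text node by text node: a sentence that is interrupted by a <note> is segmented as two separate <s> elements, one on each side of the note.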


I hope this helps.


. . . . . . . . . . . . Ken

--
G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
