Re: [xsl] Splitting a paragraph into sentences and keep markup

Subject: Re: [xsl] Splitting a paragraph into sentences and keep markup
From: "Rick Quatro rick@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 24 Nov 2019 17:50:43 -0000

Hi David,

Yes, there shouldn't be any cross-paragraph elements.

Rick

-----Original Message-----
From: David Carlisle d.p.carlisle@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Sunday, November 24, 2019 9:33 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Splitting a paragraph into sentences and keep markup

can we assume the easy case (as in your example) where all the sentences end
at the top level?

a more challenging example is

<root>
    <p>This has one <span class="zzz">sentence? Actually, it has
<emphasis>two</emphasis>.  No,</span> it has three.</p>
</root>

as then you need to force-close any open elements at the sentence end and
re-open them in the new sentence so something like

  <p>This has one <span class="zzz">sentence?</span></p>
  <p><span class="zzz">Actually, it has <emphasis>two</emphasis>.</span></p>
 <p><span class="zzz">No,</span> it has three.</p>
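
A minimal sketch of that force-close/re-open idea, assuming XSLT 2.0 and the same sentence-end rule as in the original stylesheet (a . or ? followed by whitespace or the end of a text node). The <stop/> marker element and the mode names "mark" and "slice" are made up for illustration; the marker would need another name if the input already uses <stop>. The first pass drops a marker after every sentence end at any depth. The second pass walks the marked paragraph once per sentence and keeps only the text belonging to that sentence, together with whatever ancestors enclose it, which is what closes and re-opens the elements.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:strip-space elements="root"/>

    <xsl:template match="root">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <!-- One output <p> per sentence. -->
    <xsl:template match="p">
        <!-- Pass 1: copy the paragraph with a <stop/> after each sentence end. -->
        <xsl:variable name="marked">
            <xsl:apply-templates select="." mode="mark"/>
        </xsl:variable>
        <xsl:variable name="p" select="$marked/p"/>
        <!-- Pass 2: slice n holds the text that has exactly n markers before it. -->
        <xsl:for-each select="0 to count($p//stop)">
            <xsl:variable name="n" select="."/>
            <!-- Skip slices with no visible text (e.g. after the final marker). -->
            <xsl:if test="$p//text()[count(preceding::stop) = $n][normalize-space()]">
                <xsl:apply-templates select="$p" mode="slice">
                    <xsl:with-param name="n" select="$n" tunnel="yes"/>
                </xsl:apply-templates>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>

    <!-- Pass 1: copy elements and attributes unchanged. -->
    <xsl:template match="*" mode="mark">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates mode="mark"/>
        </xsl:copy>
    </xsl:template>

    <!-- Pass 1: keep the punctuation, drop the whitespace after it, add a marker. -->
    <xsl:template match="text()" mode="mark">
        <xsl:analyze-string select="." regex="[.?](\s+|$)">
            <xsl:matching-substring>
                <xsl:value-of select="substring(., 1, 1)"/>
                <stop/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <!-- Pass 2: copy an element only if it has text (or is an empty element) in slice n. -->
    <xsl:template match="*" mode="slice">
        <xsl:param name="n" tunnel="yes"/>
        <xsl:if test="(descendant::text()
                       | descendant-or-self::*[not(node())][not(self::stop)])
                      [count(preceding::stop) = $n]">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:apply-templates mode="slice"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

    <!-- Pass 2: the markers themselves never reach the output. -->
    <xsl:template match="stop" mode="slice"/>

    <!-- Pass 2: keep a text node only if it belongs to slice n. -->
    <xsl:template match="text()" mode="slice">
        <xsl:param name="n" tunnel="yes"/>
        <xsl:if test="count(preceding::stop) = $n">
            <xsl:value-of select="."/>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

Run against the <span> example, this should give the three paragraphs listed above, apart from whitespace. It is only a sketch: comments and processing instructions are dropped, a . or ? that happens to end a text node counts as a sentence end even when the sentence continues in a sibling (for example <b>3.</b>14), and counting preceding::stop for every node is quadratic, which ought to be fine for short paragraphs.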

David

On Sun, 24 Nov 2019 at 13:34, Rick Quatro rick@xxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi All,
>
>
>
> I have a situation where I want to split a short paragraph into sentences
> and use them in different parts of my output. I am using <xsl:analyze-string>
> because I want to account for a sentence ending with a . or ?. This will work
> except if there are any children of the paragraph, like the <emphasis> in the
> second sentence. Can I split a paragraph into sentences and still keep the
> markup?
>
>
>
> Here is my input document:
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <root>
>
>     <p>This has one sentence? Actually, it has
> <emphasis>two</emphasis>. No, it has three.</p>
>
> </root>
>
>
>
> My stylesheet:
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>
>     xmlns:rq="http://www.frameexpert.com"
>
>     exclude-result-prefixes="xs rq"
>
>     version="2.0">
>
>
>
>     <xsl:output indent="yes"/>
>
>     <xsl:strip-space elements="root"/>
>
>
>
>     <xsl:template match="/root">
>
>         <xsl:copy>
>
>             <xsl:apply-templates/>
>
>         </xsl:copy>
>
>     </xsl:template>
>
>
>
>     <xsl:template match="p">
>
>         <xsl:variable name="sentences"
> select="rq:splitParagraphIntoSentences(.)"/>
>
>         <p><xsl:value-of select="$sentences[1]"/></p>
>
>         <note>Something in between.</note>
>
>         <p><xsl:value-of select="$sentences[position()&gt;1]"/></p>
>
>     </xsl:template>
>
>
>
>     <xsl:function name="rq:splitParagraphIntoSentences">
>
>         <xsl:param name="paragraph"/>
>
>         <xsl:analyze-string select="$paragraph"
> regex=".+?[\.\?](\s+|$)">
>
>             <xsl:matching-substring>
>
>                 <sentence><xsl:value-of
> select="replace(.,'\s+$','')"/></sentence>
>
>             </xsl:matching-substring>
>
>         </xsl:analyze-string>
>
>     </xsl:function>
>
> </xsl:stylesheet>
>
>
>
> My output:
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <root>
>
>    <p>This has one sentence?</p>
>
>    <note>Something in between.</note>
>
>    <p>Actually, it has two. No, it has three.</p>
>
> </root>
>
>
>
> What I want is this:
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <root>
>
>    <p>This has one sentence? </p>
>
>    <note>Something in between.</note>
>
>    <p>Actually, it has <emphasis>two</emphasis>. No, it has three.
> </p>
>
> </root>
>
>
>
> Any suggestions will be appreciated.
>
>
>
> Rick
>