[xsl] Splitting a paragraph into sentences and keep markup

From: "Rick Quatro rick@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 24 Nov 2019 13:34:26 -0000

Hi All,

 

I have a situation where I want to split a short paragraph into sentences
and use them in different parts of my output. I am using
<xsl:analyze-string> because I want to account for sentences that end with
either . or ?. This works unless the paragraph has child elements, like the
<emphasis> in the second sentence, because their markup is lost in the
result. Can I split a paragraph into sentences and still keep the markup?

 

Here is my input document:

 

<?xml version="1.0" encoding="UTF-8"?>

<root>

    <p>This has one sentence? Actually, it has <emphasis>two</emphasis>. No,
it has three.</p>

</root>

 

My stylesheet:

 

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

    xmlns:xs="http://www.w3.org/2001/XMLSchema"

    xmlns:rq="http://www.frameexpert.com"

    exclude-result-prefixes="xs rq"

    version="2.0">

    

    <xsl:output indent="yes"/>

    <xsl:strip-space elements="root"/>

    

    <xsl:template match="/root">

        <xsl:copy>

            <xsl:apply-templates/>

        </xsl:copy>

    </xsl:template>

    

    <xsl:template match="p">

        <xsl:variable name="sentences"
select="rq:splitParagraphIntoSentences(.)"/>

        <p><xsl:value-of select="$sentences[1]"/></p>

        <note>Something in between.</note>

        <p><xsl:value-of select="$sentences[position()&gt;1]"/></p>

    </xsl:template>

    

    <xsl:function name="rq:splitParagraphIntoSentences">

        <xsl:param name="paragraph"/>

        <xsl:analyze-string select="$paragraph" regex=".+?[\.\?](\s+|$)">

            <xsl:matching-substring>

                <sentence><xsl:value-of
select="replace(.,'\s+$','')"/></sentence>

            </xsl:matching-substring>

        </xsl:analyze-string>

    </xsl:function>

</xsl:stylesheet>

 

My output:

 

<?xml version="1.0" encoding="UTF-8"?>

<root>

   <p>This has one sentence?</p>

   <note>Something in between.</note>

   <p>Actually, it has two. No, it has three.</p>

</root>

 

What I want is this:

 

<?xml version="1.0" encoding="UTF-8"?>

<root>

   <p>This has one sentence? </p>

   <note>Something in between.</note>

   <p>Actually, it has <emphasis>two</emphasis>. No, it has three. </p>

</root>
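
One direction that might get there, offered as an untested sketch rather than a known solution: instead of passing the whole paragraph to <xsl:analyze-string> (which only sees its string value), run the regex on each text node child of the paragraph, emit a marker element after every sentence end, and then group the resulting nodes with <xsl:for-each-group group-ending-with="...">. The marker name rq:break and the mode name "mark" are made up for illustration. The sketch assumes that sentences only end in text nodes that are direct children of <p>, and it would wrongly treat a text node ending in . or ? immediately before an inline element as a sentence end.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rq="http://www.frameexpert.com"
    exclude-result-prefixes="rq"
    version="2.0">

    <xsl:strip-space elements="root"/>

    <xsl:template match="/root">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="p">
        <!-- Pass 1: copy the paragraph content into a temporary tree,
             adding a hypothetical <rq:break/> marker after each sentence end. -->
        <xsl:variable name="marked">
            <xsl:apply-templates mode="mark"/>
        </xsl:variable>
        <!-- Pass 2: each group of nodes up to and including a marker is one sentence. -->
        <xsl:variable name="sentences" as="element(sentence)*">
            <xsl:for-each-group select="$marked/node()" group-ending-with="rq:break">
                <sentence>
                    <xsl:copy-of select="current-group()[not(self::rq:break)]"/>
                </sentence>
            </xsl:for-each-group>
        </xsl:variable>
        <!-- copy-of (not value-of) keeps child elements such as <emphasis>. -->
        <p><xsl:copy-of select="$sentences[1]/node()"/></p>
        <note>Something in between.</note>
        <p><xsl:copy-of select="$sentences[position() gt 1]/node()"/></p>
    </xsl:template>

    <!-- Split each text node at . or ? followed by whitespace or end of the node. -->
    <xsl:template match="text()" mode="mark">
        <xsl:analyze-string select="." regex="[.?](\s+|$)">
            <xsl:matching-substring>
                <xsl:value-of select="."/>
                <rq:break/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <!-- Inline elements are copied unchanged and never split. -->
    <xsl:template match="*" mode="mark">
        <xsl:copy-of select="."/>
    </xsl:template>

</xsl:stylesheet>
```

Because the matched punctuation keeps its trailing whitespace, each sentence would carry the space that followed it in the source, and the whitespace could be trimmed later if that is not wanted.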

 

Any suggestions will be appreciated.

 

Rick
