Re: [xsl] detect sentence surrounding a tag

Subject: Re: [xsl] detect sentence surrounding a tag
From: "Terry Badger terry_badger@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 27 Jul 2016 13:19:42 -0000
Dorothy,
This will do it and you can clean out the start and end tags of the text.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    <!-- turn text nodes into elements with and without sentence endings and save in a variable-->
    <xsl:template match="root">
        <xsl:variable name="stage-1">
            <xsl:copy>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:variable>
        <!-- write the intermediate variable to a file so it can be inspected -->
        <xsl:result-document href="output-01.xml">
            <xsl:copy-of select="$stage-1"/>
        </xsl:result-document>
        <!-- create final output with a grouping by start text - this assumes B is embedded not at start or end -->
        <xsl:result-document href="output-02.xml">
            <root>
                <xsl:for-each-group select="$stage-1/root/node()" group-starting-with="start">
                    <sentence>
                        <xsl:copy-of select="current-group()"/>
                    </sentence>
                </xsl:for-each-group>
            </root>
        </xsl:result-document>
    </xsl:template>
    <!-- pass through B -->
    <xsl:template match="B">
        <xsl:copy-of select="."/>
    </xsl:template>
    <!-- determine what kind of text with regex -->
    <xsl:template match="text()">
        <!-- assumes a space follows each end of sentence marker -->
        <xsl:analyze-string select="." regex="(.*)(\. |\? )">
            <xsl:matching-substring>
                <end>
                    <xsl:copy-of select="."/>
                </end>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <start>
                    <xsl:copy-of select="."/>
                </start>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
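
To make the grouping concrete, here is a small worked example. The input is made up for illustration and is not from the thread; the result is what the stylesheet above should write to output-02.xml when run with an XSLT 2.0 processor. The XML declaration is omitted and line breaks are added for readability.

```xml
<!-- hypothetical input -->
<root>First sentence. Second has <B>bold</B> inside. Third one.</root>

<!-- resulting output-02.xml -->
<root>
<sentence><end>First sentence. </end></sentence>
<sentence><start>Second has </start><B>bold</B><end> inside. </end></sentence>
<sentence><start>Third one.</start></sentence>
</root>
```

The sentence surrounding B is then the sentence element that has a B child. Note that the regex is greedy, so a text node holding several complete sentences produces a single end element covering all of them.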

Terry


On Tuesday, July 26, 2016 4:37 PM, "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:



I don't think there's a "reliable" way to recognize sentences in English text, but let's not go there... Not today. 

Generally I think there are two approaches:

(a) convert the markup (start and end of B) to text delimiters and then use regular expressions.

(b) convert the text delimiters (full stops and other punctuation) to markup (empty milestone tags?) and then use XSLT positional grouping or sibling recursion.

Neither is easy enough for me to attempt without a spare half-an-hour to devote to it.
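
For illustration only, approach (a) might be sketched in XSLT 2.0 roughly as follows. This is not from the thread. It assumes the parent element is named A, that the two private-use characters used as markers never occur in the source text, and that every sentence ends with ". " or "? " (so abbreviations and a B that itself contains a sentence ending are not handled).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- assumption: these marker characters never appear in the source text -->
    <xsl:variable name="open" select="'&#xE000;'"/>
    <xsl:variable name="close" select="'&#xE001;'"/>
    <xsl:template match="A">
        <!-- flatten A to one string, wrapping each child B in the markers -->
        <xsl:variable name="flat" select="string-join(
            for $n in (* | text()) return
                if ($n/self::B) then concat($open, $n, $close) else string($n), '')"/>
        <sentences>
            <!-- reluctant match up to the next '. ' or '? ', or to the end of the string -->
            <xsl:analyze-string select="$flat" regex=".+?(\. |\? |$)" flags="s">
                <xsl:matching-substring>
                    <!-- keep only sentences that contain a marked B, with the markers stripped -->
                    <xsl:if test="contains(., $open)">
                        <sentence>
                            <xsl:value-of select="normalize-space(translate(., concat($open, $close), ''))"/>
                        </sentence>
                    </xsl:if>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </sentences>
    </xsl:template>
</xsl:stylesheet>
```

Given `<A>First sentence. Second has <B>bold</B> inside. Third one.</A>`, this should yield a single sentence element containing "Second has bold inside." The markup of B is lost in the flattened string, so this variant suits cases where only the sentence text is needed.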

Michael Kay
Saxonica 


On 26 Jul 2016, at 21:21, Dorothy Hoskins dorothy.hoskins@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>Hi, in the case of the element A containing multiple sentences (assuming "." as end of sentence punctuation), is there a reliable way to find the sentence that surrounds the child element B wherever it occurs in A?
>
>I think that the solution (regex?) will have to look backwards from the start tag of B and past the end tag of A to the nearest "."
>
>I recognize that if there is some abbreviation or decimal number in the sentence that will be interpreted as the end of sentence. That's OK as a limitation.
>
>Thanks for your help.
>- Dorothy
>
>XSL-List info and archive 
>EasyUnsubscribe (by email) 
