Subject: Re: [xsl] detect sentence surrounding a tag
From: "Flynn, Peter pflynn@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 27 Jul 2016 08:58:31 -0000

On 26/07/16 21:21, Dorothy Hoskins dorothy.hoskins@xxxxxxxxx wrote:
> HI, in the case of the element A containing multiple sentences (assuming
> "." as end of sentence punctuation), is there a reliable way to find the
> sentence that surrounds the child element B wherever it occurs in A?
>
> I think that the solution (regex?) will have to look backwards from the
> start tag of B and past the end tag of A to the nearest "."
>
> I recognize that if there is some abbreviation or decimal number in the
> sentence that will be interpreted as the end of sentence. That's OK as a
> limitation.

Very crudely, yes (I have taken the liberty of adding a dot after the
question mark and the quoted dot in your example to make them fit the
pattern of "sentence ends with dot"):

========================== test.xml =================================
<A>HI, in the case of the element A containing multiple sentences
(assuming "." as end of sentence punctuation), is there a reliable way
to find the sentence that surrounds <B>the child element B</B> wherever
it occurs in A?. I think that the solution (regex?) will have to look
backwards from the start tag of <B>B and past the end tag of A</B> to
the nearest ".". I recognize that if there is some abbreviation or
decimal number in the sentence that will be interpreted as the end of
sentence. That's OK as a limitation.</A>
========================== test.xsl ==================================
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <text>
      <xsl:apply-templates/>
    </text>
  </xsl:template>

  <xsl:template match="A">
    <xsl:for-each select="B">
      <sentence>
        <!-- start of the sentence: whatever follows the last ". " in
             the text node immediately before B -->
        <xsl:value-of
          select="tokenize(preceding-sibling::text()[1],'\. ')
                  [position()=last()]"/>
        <!-- the content of B itself -->
        <xsl:value-of select="."/>
        <!-- end of the sentence: whatever precedes the first ". " in
             the text node immediately after B -->
        <xsl:variable name="posttext"
          select="following-sibling::text()[1]"/>
        <xsl:value-of select="tokenize($posttext,'\. ')[1]"/>
        <!-- put back the dot that tokenize() consumed -->
        <xsl:text>.</xsl:text>
      </sentence>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
============================ output =================================
<?xml version="1.0" encoding="UTF-8"?>
<text><sentence>HI, in the case of the element A containing multiple
sentences (assuming "." as end of sentence punctuation), is there a
reliable way to find the sentence that surrounds the child element B
wherever it occurs in A?.</sentence><sentence>I think that the solution
(regex?) will have to look backwards from the start tag of B and past
the end tag of A to the nearest ".".</sentence></text>
=====================================================================

This will fail on a probably significant number of test cases: it only
looks at the single text node on either side of B, and it only treats a
dot followed by a space as a sentence boundary. Making it work with
sentences ending in question marks, exclamation marks, quoted dots, etc.
is left as an exercise...:-)

///Peter
--
Peter Flynn | Academic & Collaborative Technologies |
University College Cork IT Services |
+353 21 490 2609 | pflynn@xxxxxx | www.ucc.ie
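
A possible starting point for that exercise, as an untested sketch
rather than a finished answer: replace() can be used instead of
tokenize() so that the terminating character is kept rather than
consumed. The assumptions here are mine, not from the thread: a
sentence ends at ".", "?" or "!" followed by whitespace, and the
variable names "before" and "after" are purely illustrative. Quoted
terminators, abbreviations, decimal numbers, and sentences that run
across other child elements are still not handled.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <text>
      <xsl:apply-templates/>
    </text>
  </xsl:template>

  <xsl:template match="A">
    <xsl:for-each select="B">
      <!-- Text node just before B: the greedy .* removes everything up
           to and including the LAST terminator plus whitespace. The 's'
           flag lets . match line breaks. -->
      <xsl:variable name="before"
        select="replace(preceding-sibling::text()[1],
                        '^.*[.?!]\s+', '', 's')"/>
      <!-- Text node just after B: the reluctant .*? keeps everything up
           to and including the FIRST terminator that is followed by
           whitespace or by the end of the text node. If there is no
           such terminator the text is left unchanged. -->
      <xsl:variable name="after"
        select="replace(following-sibling::text()[1],
                        '^(.*?[.?!])(\s.*)?$', '$1', 's')"/>
      <sentence>
        <!-- normalize-space() collapses line breaks carried over from
             the source document -->
        <xsl:value-of
          select="normalize-space(concat($before, ., $after))"/>
      </sentence>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the terminator is kept, there is no need to append a literal
dot afterwards, so the original question should work without the extra
dot added after "A?" in test.xml. The quoted dot at the end of the
second sentence would still need separate treatment.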