Subject: Re: [xsl] regex, shortest match
From: Dave Pawson <davep@xxxxxxxxxxxxx>
Date: Fri, 01 Aug 2008 10:19:05 +0100
Highly likely. I'm looking to parse sentences out of paras.
To be more exact, you are trying to parse a sentence with a regular expression, which would cause you to fail a logic course, as natural language must be the canonical example of a non-regular language :-)
You need to define a sentence.
So perhaps a sentence is terminated by "." followed by whitespace or end of string:
([^.]|\.[^ \n\r\t])*\.(\s+|$)
but this would of course still fail if the sentence were to contain ". " coming from "D. P. Carlisle" or "dr. " or ...
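To make that failure concrete, here is a minimal sketch that runs the same regex over a made-up string containing initials. The sample text and the variable name `sample` are assumptions for illustration only. It should run with any XSLT 2.0 processor against any XML input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the input document is ignored, so any XML file will do. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample text, not taken from the original data. -->
  <xsl:variable name="sample"
      select="'This was written by D. P. Carlisle. It is short.'"/>

  <xsl:template match="/">
    <para>
      <!-- Same regex as above: each match ends at "." + whitespace or end of string. -->
      <xsl:analyze-string select="$sample"
          regex="([^.]|\.[^ \n\r\t])*\.(\s+|$)">
        <xsl:matching-substring>
          <s><xsl:value-of select="normalize-space(.)"/></s>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

Because every ". " ends a match, this yields four s elements ("This was written by D.", "P.", "Carlisle.", "It is short.") where a human reader sees two sentences.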
If you try to parse natural language with a single regular expression, it will _always_ fail. But you can cover more or less arbitrarily complicated subsets of the language by making the regexp correspondingly more complicated (and slow).
<xsl:template match="para">
  <para>
    <xsl:analyze-string select="."
        regex="([^.]|\.[^ \n\r\t])*\.(\s+|$)">
      <xsl:matching-substring>
        <s><xsl:value-of select="normalize-space(.)"/></s>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <error><xsl:value-of select="normalize-space(.)"/></error>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </para>
</xsl:template>
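As one illustration of covering a slightly larger subset at the cost of a more complicated regex, the sketch below adds an alternative that swallows a few abbreviations and single capital-letter initials when they are followed by "." and whitespace. The abbreviation list (Dr, Mr, Mrs), the variable names `abbrev` and `sentence`, and the sample text are all assumptions for illustration, not part of the original suggestion. XPath regexes have no lookbehind, so the exceptions have to go inside the repeated group.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the input document is ignored, so any XML file will do. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical exception list: a few titles plus any single capital letter. -->
  <xsl:variable name="abbrev" select="'Dr|Mr|Mrs|[A-Z]'"/>

  <!-- The exception alternative comes first so that it is preferred over [^.]. -->
  <xsl:variable name="sentence"
      select="concat('((', $abbrev, ')\.\s+|[^.]|\.[^ \n\r\t])*\.(\s+|$)')"/>

  <!-- Hypothetical sample text. -->
  <xsl:variable name="sample"
      select="'This was written by Dr. D. P. Carlisle. It is short.'"/>

  <xsl:template match="/">
    <para>
      <!-- regex is an attribute value template, so {$sentence} inserts the string. -->
      <xsl:analyze-string select="$sample" regex="{$sentence}">
        <xsl:matching-substring>
          <s><xsl:value-of select="normalize-space(.)"/></s>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <error><xsl:value-of select="normalize-space(.)"/></error>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

The intent is that "Dr. D. P. Carlisle." stays inside a single s element. The trade-off is the one described above: the exception list is never complete, and this version now wrongly joins sentences that really do end in a capital letter, such as "... in the USA. Next ...".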
Thanks David. That's better than my improvement. No 'error' elements in 12000 lines.
-- Dave Pawson XSLT XSL-FO FAQ. http://www.dpawson.co.uk