Subject: Re: [xsl] regex, shortest match
From: Dave Pawson <davep@xxxxxxxxxxxxx>
Date: Fri, 01 Aug 2008 10:19:05 +0100
Highly likely. I'm looking to parse sentences out of paras.
To be more exact, you are trying to parse a sentence with a regular expression, which would cause you to fail a logic course, as natural language must be the canonical example of a non-regular language :-)
You need to define a sentence.
So perhaps a sentence is terminated by "." followed by end of string or whitespace:
([^.]|\.[^ \n\r\t])*\.(\s+|$)
But this would of course still fail if the sentence were to contain ". " coming from "D. P. Carlisle" or "dr. " or ...
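To see both the intended behaviour and the failure on abbreviations, here is a minimal sketch that runs the same pattern through Python's `re` module. Python is only a stand-in here: XSLT 2.0 regular expressions are a different flavour, and the assumption is that this particular pattern behaves the same way in both. The names `SENTENCE` and `split_sentences` and the sample strings are made up for illustration.

```python
import re

# The pattern from the post: any run of non-dot characters, or dots that are
# NOT followed by whitespace, then a dot followed by whitespace or end of string.
SENTENCE = re.compile(r"([^.]|\.[^ \n\r\t])*\.(\s+|$)")


def split_sentences(text):
    """Return the whitespace-trimmed matches of SENTENCE in text."""
    return [m.group(0).strip() for m in SENTENCE.finditer(text)]


plain = "Version 1.2 is out. It fixes two bugs."
tricky = "Ask D. P. Carlisle. He knows."

# The dot in "1.2" is not followed by whitespace, so it does not end a sentence.
print(split_sentences(plain))
# Every ". " ends a "sentence", so the initials are split off on their own.
print(split_sentences(tricky))
```

The first call yields two sentences as hoped; the second breaks "Ask D. P. Carlisle." into three pieces, which is the failure described above.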
If you try to parse natural language with a single regular expression, it will _always_ fail. But you can cover more or less arbitrarily complicated subsets of the language by making the regexp correspondingly more complicated (and slower).
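As a hedged illustration of that trade-off, the sketch below adds one extra alternative that swallows runs of single-capital initials such as "D. P. ", again using Python's `re` as a stand-in for the XSLT regex flavour. The extended pattern is not from the thread; it and the names `SENTENCE_WITH_INITIALS` and `split_sentences` are assumptions made up for this example. It avoids lookahead and lookbehind on purpose, on the assumption that those are unavailable in XSLT 2.0 regexes.

```python
import re

# Hypothetical extension of the pattern from the post. The first alternative
# consumes whitespace (or start of string) followed by one or more
# "capital letter, dot, whitespace" groups, so initials no longer end a sentence.
# It must come before [^.], otherwise [^.] would eat the leading space first.
SENTENCE_WITH_INITIALS = re.compile(
    r"((^|\s)([A-Z]\.\s+)+|[^.]|\.[^ \n\r\t])*\.(\s+|$)"
)


def split_sentences(text):
    """Return the whitespace-trimmed matches of the extended pattern."""
    return [m.group(0).strip() for m in SENTENCE_WITH_INITIALS.finditer(text)]


# Initials are now kept inside the sentence.
print(split_sentences("Ask D. P. Carlisle. He knows."))
# New failure: a sentence that really ends in a single capital letter
# is glued to the one that follows it.
print(split_sentences("Use plan B. Then we go."))
```

The first call now returns two sentences, but the second returns a single merged one: the more complicated pattern covers one more subset of the language and promptly fails on another.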
<xsl:template match="para">
  <para>
    <xsl:analyze-string select="." regex="([^.]|\.[^ \n\r\t])*\.(\s+|$)">
      <xsl:matching-substring>
        <s> <xsl:value-of select="normalize-space(.)"/></s>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <error> <xsl:value-of select="normalize-space(.)"/> </error>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </para>
</xsl:template>

Thanks David. That's better than my improvement. No 'error' elements in 12000 lines.
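For anyone who wants to run the same "any error elements?" check outside a stylesheet, here is a rough Python analogue of the matching / non-matching split. It is a sketch under assumptions, not a reimplementation of `xsl:analyze-string`: Python's `re` stands in for the XSLT regex flavour, `" ".join(x.split())` approximates `normalize-space()`, and the function name `analyze` and the sample strings are invented for the example.

```python
import re

# Same pattern as in the template above.
SENTENCE = re.compile(r"([^.]|\.[^ \n\r\t])*\.(\s+|$)")


def normalize_space(text):
    """Collapse runs of whitespace and trim, roughly like XPath normalize-space()."""
    return " ".join(text.split())


def analyze(text):
    """Yield ('s', chunk) for each match and ('error', chunk) for each gap."""
    pos = 0
    for m in SENTENCE.finditer(text):
        if m.start() > pos:
            # Text between two matches that the pattern could not cover.
            yield ("error", normalize_space(text[pos:m.start()]))
        yield ("s", normalize_space(m.group(0)))
        pos = m.end()
    if pos < len(text):
        # Trailing text with no sentence terminator.
        yield ("error", normalize_space(text[pos:]))


for kind, chunk in analyze("First one. no full stop here"):
    print(kind, repr(chunk))
```

With this pattern the only way to get an "error" chunk is trailing text that has no final full stop, which may be why a well-punctuated corpus produces none.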
-- Dave Pawson XSLT XSL-FO FAQ. http://www.dpawson.co.uk