RE: [xsl] Regular expression for matching sentences

Subject: RE: [xsl] Regular expression for matching sentences
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 18 Aug 2006 11:08:15 +0100
The reason that \b isn't in the XPath regular expression dialect is that its
meaning is very sensitive to the conventions of the natural language that
you're using. This restriction is in the W3C spec, not in Saxon.

A lot depends on how clever you want to be. I would think you'd get a 95%
success rate by using

tokenize($in, '[\.\?!]\s+')

but in my experience it's tricky knowing what to do about characters such as
")", '"', or em dash that might appear immediately after a full stop.
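
A minimal sketch of how the tokenize() call above might be used to count
em-spaces per sentence in an XSLT 2.0 stylesheet. The sample text, the
default value of $in, and the extra character class for closing quotes and
brackets are assumptions added for illustration, and the em dash case is
left unhandled. Note that \s in the XPath regex dialect matches only space,
tab, newline and carriage return, so em-spaces (U+2003) are not consumed as
separators.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical sample text; &#x2003; is the em-space character. -->
  <xsl:param name="in" as="xs:string"
      select="'One&#x2003;two. (Is it so?) Yes&#x2003;&#x2003;indeed! Done.'"/>

  <!-- Run against any XML document; the input document itself is ignored. -->
  <xsl:template match="/">
    <!-- Split after a terminator, any closing quotes or brackets
         (an assumed extension of the original pattern), and whitespace. -->
    <xsl:for-each select="tokenize($in, '[\.\?!][&quot;'')\]]*\s+')">
      <!-- Em-space count = length before minus length after removing U+2003. -->
      <xsl:variable name="count" as="xs:integer"
          select="string-length(.) - string-length(translate(., '&#x2003;', ''))"/>
      <xsl:value-of select="position(), ': ', $count, '&#10;'" separator=""/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because tokenize() discards the separators, each token loses its closing
punctuation, and text that ends with a terminator followed by whitespace
yields a trailing empty token that may need filtering out.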

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Carlo Liwanag [mailto:cliwanag@xxxxxxxxxxxx] 
> Sent: 18 August 2006 10:56
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Regular expression for matching sentences
> 
> I am trying to match sentences in my text template 
> (sentences will end in '.', '?', or '!') so that I can count the 
> number of em-spaces in each one. But I just don't know how to 
> write the pattern without using \b (because Saxon probably does not 
> support it). Is there an alternative? Please help.
> Thanks,
> Carlo
