I'm trying to split the text of para elements into sentences.
Input:
<para>It is sometimes desired to have a specific heading which should
not be numbered. This corresponds to unnumbered list headers in lists
(see sections 4.3). To facilitate this, an optional attribute
text:is-list-header can be used. If true, the given header will not be
numbered, even if an explicit list-style is given. </para>
<para>A text:style-name attribute references a paragraph style, while a
text:cond-style-name attribute references a conditional-style, that is,
a style that contains conditions and maps to other styles (see section
14.1.1). If a conditional style is applied to a paragraph, the
text:style-name attribute contains the name of the style that was the
result of the conditional style evaluation, while the conditional style
name itself is the value of the text:cond-style-name attribute. This
XML structure simplifies [XSLT] transformations because XSLT only has to
acknowledge the conditional style if the formatting attributes are
relevant. The referenced style can be a common style or an automatic
style.</para>
<para>A text:class-names attribute takes a whitespace separated list of
paragraph style names. The referenced styles are applied in the order
they are contained in the list. If both, text:style-name and
text:class-names are present, the style referenced by the
text:style-name attribute is as the first style in the list in
text:class-names. If a conditional style is specified together with a
style:class-names attribute, but without the text:style-name attribute,
then the first style in the style list is used as the value of the
missing text:style-name attribute. </para>
<para>A page sequence element &lt;text:page-sequence&gt; specifies a
sequence of master pages that are instantiated in exactly the same order
as they are referenced in the page sequence. If a text document contains
a page sequence, it will consist of exactly as many pages as specified.
Documents with page sequences do not have a main text flow consisting of
headings and paragraphs as is the case for documents that do not contain
a page sequence. Text content is included within text boxes for
documents with page sequences. The only other content that is permitted
are drawing objects. </para>
The stylesheet below 'works', but the regex hits the longest match.
I can't come up with a regex that is sufficiently broad, yet stops at
the shortest match.
Any suggestions, please?
TIA DaveP
<xsl:template match="para">
  <para>
    <xsl:variable name="contents" select="normalize-space(.)"/>
    <xsl:copy-of select="dp:sentence($contents)"/>
  </para>
</xsl:template>

<!-- Isolate sentences within paras -->
<xsl:function name="dp:sentence">
  <xsl:param name="nd" as="xs:string"/>
  <xsl:analyze-string regex="((.+).) |$ " select="$nd">
    <xsl:matching-substring>
      <s>
        <xsl:value-of select="regex-group(1)"/>
      </s>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <p2><xsl:value-of select="."/></p2>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>
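
For reference, XSLT 2.0 regexes allow reluctant quantifiers (+? and *?), which take the shortest match rather than the longest. The following is only a minimal sketch of that idea, not a tested solution. It assumes that a sentence ends in '.', '?' or '!' followed by whitespace or the end of the string, and the dp namespace URI is a made-up placeholder. Abbreviations such as "e.g. this" would still be split wrongly.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dp="urn:x-example:dp"
    exclude-result-prefixes="xs dp">
  <!-- urn:x-example:dp is a hypothetical namespace URI, for illustration only -->

  <xsl:template match="para">
    <para>
      <xsl:copy-of select="dp:sentence(normalize-space(.))"/>
    </para>
  </xsl:template>

  <!-- Isolate sentences within paras (assumed sentence rule, see above) -->
  <xsl:function name="dp:sentence" as="element()*">
    <xsl:param name="nd" as="xs:string"/>
    <!-- .+? is reluctant: it stops at the first '.', '?' or '!' that is
         followed by whitespace or the end of the string, so "4.3" and
         "14.1.1" do not end a sentence -->
    <xsl:analyze-string select="$nd" regex="(.+?[.?!])(\s+|$)">
      <xsl:matching-substring>
        <s><xsl:value-of select="regex-group(1)"/></s>
      </xsl:matching-substring>
      <!-- Trailing text with no closing punctuation ends up here -->
      <xsl:non-matching-substring>
        <p2><xsl:value-of select="."/></p2>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

</xsl:stylesheet>
```

Because the pattern always consumes at least one character plus the closing punctuation, it cannot match a zero-length string, which xsl:analyze-string does not permit.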
regards
--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk