RE: [xsl] segmenting a paragraph

Subject: RE: [xsl] segmenting a paragraph
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 02 Oct 2007 10:18:28 -0400
Christian,

At 04:36 AM 10/2/2007, you wrote:
> When you need to apply regex matching to text that crosses node boundaries,
> in the past two approaches have been proposed:
>
> (a) create a string in which the node boundaries are represented by some
> recognizable textual markup (you could use saxon:serialize()), then apply
> the regex processing, then reinstate the node structure (e.g. by using
> saxon:parse()).
>
> (b) do a deep copy, while processing each of the text nodes to replace the
> significant features (such as end of sentence) by nodes (e.g. an
> <end-of-sentence/> element). Then apply positional grouping techniques to
> transform this into your target structure.
>
> Neither is particularly easy, I'm afraid.

This is because (yay) this requirement introduces an overlap problem. Indicators (in this case, punctuation) within text content are being taken to be structural features, which may overlap with other structures already in place.
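
To make approach (b) concrete, here is a minimal XSLT 2.0 sketch, not a finished solution. The names `p`, `s` and the `mark` mode are assumptions for illustration; only `end-of-sentence` comes from the description above. The regex is a deliberately naive idea of where a sentence ends. It marks sentence ends only in text nodes that are direct children of the paragraph, which is exactly where the overlap problem is sidestepped rather than solved: a sentence that ends inside an inline element (say, within a `<b>`) gets no marker at all and is simply not split.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: paragraphs are <p> elements with mixed content. -->
  <xsl:template match="p">
    <!-- Pass 1: copy the children into a temporary tree, inserting an
         <end-of-sentence/> marker after sentence-final punctuation. -->
    <xsl:variable name="marked">
      <xsl:apply-templates select="node()" mode="mark"/>
    </xsl:variable>
    <p>
      <!-- Pass 2: positional grouping; each group ends at a marker. -->
      <xsl:for-each-group select="$marked/node()"
                          group-ending-with="end-of-sentence">
        <!-- <s> is a hypothetical name for the sentence wrapper. -->
        <s>
          <xsl:copy-of
              select="current-group()[not(self::end-of-sentence)]"/>
        </s>
      </xsl:for-each-group>
    </p>
  </xsl:template>

  <!-- Only text directly inside p is scanned for sentence ends.
       The regex is a naive assumption: any run of . ! or ? plus
       trailing whitespace counts as the end of a sentence. -->
  <xsl:template match="p/text()" mode="mark">
    <xsl:analyze-string select="." regex="[.!?]+\s*">
      <xsl:matching-substring>
        <xsl:value-of select="."/>
        <end-of-sentence/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Everything else (inline elements, comments, PIs) is copied whole,
       so punctuation inside inline elements is never marked. -->
  <xsl:template match="node()" mode="mark">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Given a paragraph such as `<p>One. Two <b>bold</b> three. Four</p>`, this should wrap "One. ", "Two <b>bold</b> three. " and "Four" in separate `<s>` elements. Handling a sentence end that falls inside the `<b>` would mean either splitting the `<b>` across two sentences or pulling the marker up to paragraph level, and that is where the real work lies.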


Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
