Re: [xsl] text replacement with mixed content

From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Wed, 31 Aug 2011 10:24:52 +0100
This isn't a trivial task, so you may or may not get someone to give
you a working solution for free...

One way to tackle this is to:

- tokenize the search string into individual words

- mark up those individual words in the document

- identify sequences of that markup

- replace the sequences with the replacement markup
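
The four steps above can be sketched outside XSLT to show the shape of the idea. This is only an illustration in Python, not anyone's posted solution: the document is assumed to be a flat list of (text, tag) runs, and the names replace_phrase, runs and the "match" label are made up for the example. It matches whole words only, ignores case, and drops whitespace and punctuation, so a word that is itself split across two elements would not be found.

```python
import re

def replace_phrase(runs, phrase, replacement):
    """Collapse each word sequence matching `phrase` into one replacement token.

    `runs` is a hypothetical flat model of mixed content: a list of
    (text, tag) pairs, where tag is None for plain text.
    """
    # Step 1: tokenize the search string into individual words.
    wanted = phrase.lower().split()
    if not wanted:
        raise ValueError("search phrase contains no words")

    # Step 2: mark up the individual words in the document, remembering
    # which tag (if any) each word came from.
    tokens = [(m.group(), tag)
              for text, tag in runs
              for m in re.finditer(r"\w+", text)]

    # Steps 3 and 4: identify sequences of word tokens that spell out the
    # phrase, and replace each sequence with a single replacement token.
    out, i, n = [], 0, len(wanted)
    while i < len(tokens):
        if [w.lower() for w, _ in tokens[i:i + n]] == wanted:
            out.append((replacement, "match"))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out

runs = [("the quick ", None), ("brown", "b"), (" fox jumps", None)]
print(replace_phrase(runs, "quick brown fox", "<hit/>"))
```

Because the match is made on the word sequence rather than on any single text node, the phrase is found even though "brown" sits inside its own element.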


Yes, it's definitely challenging. Reading the problem and Andrew's solution makes me realise that this is an example of the class of problems which Michael Jackson (of Jackson Structured Programming fame) calls "boundary clash" problems. In the markup field these tend to be described as "overlap" problems. You have two hierarchies in the document, the element hierarchy and the sentence/word/character hierarchy, and they overlap in the sense that the boundaries in one hierarchy don't coincide with those in the other.

The technique, at a very high level of abstraction, is to rearrange the document into the hierarchy that you want to process, while retaining sufficient information to reconstitute the other hierarchy when you are done. This retained information can either be inline (perhaps in the form of "milestone" tags), or out-of-line (an index of pointers into the text).
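
To make the out-of-line variant concrete, here is a rough sketch, again in Python purely for illustration. It assumes the same hypothetical flat list of (text, tag) runs with no nested elements; flatten, rebuild and the "hit" wrapper name are invented for the example. The text is rearranged into one string (the hierarchy we want to search), an index of offsets is kept on the side, and the elements are reconstituted afterwards, with the new markup split wherever it would otherwise cross an element boundary. No XML escaping is attempted.

```python
def flatten(runs):
    """Join all run text and keep an out-of-line index of (start, end, tag)."""
    text, index, pos = "", [], 0
    for chunk, tag in runs:
        index.append((pos, pos + len(chunk), tag))
        text += chunk
        pos += len(chunk)
    return text, index

def rebuild(text, index, phrase, wrapper="hit"):
    """Wrap occurrences of `phrase`, then restore the original elements."""
    if not phrase:
        raise ValueError("empty search phrase")

    # Search the flat string, where element boundaries no longer get in the way.
    matches, start = [], text.find(phrase)
    while start != -1:
        matches.append((start, start + len(phrase)))
        start = text.find(phrase, start + len(phrase))

    out = []
    for s, e, tag in index:
        # Cut each original run wherever a match starts or ends inside it.
        cuts = sorted({s, e} | {p for m in matches for p in m if s < p < e})
        body = ""
        for a, b in zip(cuts, cuts[1:]):
            piece = text[a:b]
            if any(ms <= a and b <= me for ms, me in matches):
                piece = f"<{wrapper}>{piece}</{wrapper}>"
            body += piece
        # Reconstitute the element hierarchy from the index.
        out.append(f"<{tag}>{body}</{tag}>" if tag else body)
    return "".join(out)

runs = [("the quick ", None), ("brown", "b"), (" fox jumps", None)]
text, index = flatten(runs)
print(rebuild(text, index, "quick bro"))
# the <hit>quick </hit><b><hit>bro</hit>wn</b> fox jumps
```

The match "quick bro" starts in plain text and ends in the middle of the b element, which is exactly the boundary clash; the sketch resolves it by emitting two hit fragments rather than one overlapping element.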

Michael Kay
Saxonica
