Subject: Re: [xsl] text replacement with mixed content
From: Geert Bormans <geert@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 31 Aug 2011 15:58:39 +0200
Let me summarize the rules for handling markup, to show that I am not throwing away all markup (though I do throw away some of it):

There is a pattern A that needs to be replaced with a replacement B.
Pattern A is given as a string value that triggers the replacement, but in the actual document interfering markup can break up that string.
A and B can contain multiple words.
A can contain markup; B is pure text.
B will be wrapped in an element to indicate that it is a replacement (revision).
Markup that ends in A but started before A (except markup that starts right before A) is forced to close before A's replacement.
Given that "my foo" needs to become "your bar",
<p><b>here is some bolded text that my </b> foo</p>
will become
<p><b>here is some bolded text that </b><rev>your bar</rev></p>

<p>here is some bolded text that <b>my </b> foo</p>
will become
<p>here is some bolded text that <rev>your bar</rev></p>
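The rules and the two examples above can be checked with a small sketch. It is only an illustration under stated assumptions, not an XSLT solution: the fragment is assumed to be attribute-free markup, `tokenize` and `replace_across_markup` are hypothetical helper names, only the first occurrence of A is replaced, and any run of whitespace in the document matches a single space in A (the first example needs this because "my </b> foo" flattens to "my  foo"). Each element is classified by where its start and end tags fall relative to A: dropped if entirely inside A, closed early if it started before A and ends inside it, reopened after the `<rev>` if it starts inside A and ends after it. An element that starts exactly where A starts and ends after it, or starts before A and ends exactly where A ends, is assumed to simply keep wrapping the revision; that edge case is not settled by the rules.

```python
import re
from xml.sax.saxutils import escape, unescape

# Hypothetical tokenizer for attribute-free fragments such as "<p><b>x</b> y</p>".
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def tokenize(fragment):
    """Split a fragment into ("start", name), ("end", name) and ("text", str) tokens."""
    tokens = []
    for slash, name, text in TOKEN.findall(fragment):
        if text:
            tokens.append(("text", unescape(text)))
        else:
            tokens.append(("end" if slash else "start", name))
    return tokens


def replace_across_markup(fragment, pattern, replacement, wrapper="rev"):
    """Replace the first match of pattern A with <wrapper>B</wrapper> (assumes well-formed input)."""
    tokens = tokenize(fragment)
    # Character offset of every token within the flattened string value.
    offsets, pos = [], 0
    for kind, value in tokens:
        offsets.append(pos)
        if kind == "text":
            pos += len(value)
    text = "".join(v for k, v in tokens if k == "text")
    # Assumption: the words of A may be separated by any whitespace run.
    match = re.search(r"\s+".join(map(re.escape, pattern.split())), text)
    if match is None:
        return fragment
    a_start, a_end = match.span()

    # Pair start/end tags and classify each element by its offsets.
    partner, stack, kind_of = {}, [], {}
    for i, (kind, _) in enumerate(tokens):
        if kind == "start":
            stack.append(i)
        elif kind == "end":
            j = stack.pop()
            partner[i] = j
            s, e = offsets[j], offsets[i]
            if a_start <= s and e <= a_end:
                kind_of[j] = "drop"      # starts and ends in A
            elif s < a_start < e < a_end:
                kind_of[j] = "close"     # started before A, ends in A
            elif a_start < s < a_end < e:
                kind_of[j] = "reopen"    # starts in A, ends after A
            else:
                kind_of[j] = "keep"
    closes = sorted((j for j, c in kind_of.items() if c == "close"), reverse=True)
    reopens = sorted(j for j, c in kind_of.items() if c == "reopen")

    out = []
    for i, (kind, value) in enumerate(tokens):
        off = offsets[i]
        if kind == "text":
            end = off + len(value)
            if off < a_start:                      # text before A
                out.append(escape(value[:a_start - off]))
            if off <= a_start < end:               # A starts here: close, then revision
                out += [f"</{tokens[j][1]}>" for j in closes]
                out.append(f"<{wrapper}>{escape(replacement)}</{wrapper}>")
            if off < a_end <= end:                 # A ends here: reopen
                out += [f"<{tokens[j][1]}>" for j in reopens]
            if end > a_end:                        # text after A
                out.append(escape(value[max(0, a_end - off):]))
        elif kind == "start" and kind_of[i] in ("keep", "close"):
            out.append(f"<{value}>")
        elif kind == "end" and kind_of[partner[i]] in ("keep", "reopen"):
            out.append(f"</{value}>")
    return "".join(out)


print(replace_across_markup(
    "<p><b>here is some bolded text that my </b> foo</p>", "my foo", "your bar"))
# <p><b>here is some bolded text that </b><rev>your bar</rev></p>
print(replace_across_markup("<p>say my <i>foo bar</i></p>", "my foo", "your bar"))
# <p>say <rev>your bar</rev><i> bar</i></p>
```

The second call exercises the "forced to reopen" rule: `<i>` starts inside A and ends after it, so it is reopened around the remaining text " bar".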

This is all about revisions, and the tricky part is deciding when to maintain earlier revisions and when not to.

Markup that starts and ends in A can be dropped.
Markup that starts in A and ends outside A (except markup that ends right after A closes) must be forced to reopen after the replacement.
There is a predictable boundary (p in this example), and A should not cross that boundary.
Markup in A does not break words.
Soft hyphens and non-breaking spaces (indicated by '-' in the example) can break "words".
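One possible reading of the last three rules, sketched below as an assumption rather than the required behaviour: pattern A is matched against each paragraph's string value separately (tags contribute no characters, so markup cannot split a word, and a match can never cross a p boundary), a soft hyphen (U+00AD) may occur between any two characters of a word, and any whitespace, including a no-break space (U+00A0), may separate words. The names `pattern_regex` and `find_in_paragraphs` are hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"


def pattern_regex(pattern):
    """Regex for pattern A that tolerates soft hyphens inside words and
    any whitespace (including U+00A0 no-break space) between words."""
    words = []
    for word in pattern.split():
        # Allow an optional soft hyphen between any two characters of the word.
        words.append((SOFT_HYPHEN + "?").join(map(re.escape, word)))
    # In a str pattern, \s also matches U+00A0 (NO-BREAK SPACE).
    return re.compile(r"\s+".join(words))


def find_in_paragraphs(paragraph_texts, pattern):
    """Search each paragraph's string value on its own, so that a match
    can never cross a paragraph boundary. Returns (paragraph, start, end)."""
    regex = pattern_regex(pattern)
    hits = []
    for index, text in enumerate(paragraph_texts):
        for m in regex.finditer(text):
            hits.append((index, m.start(), m.end()))
    return hits


print(find_in_paragraphs(["that m\u00ady\u00a0foo", "and my", "foo"], "my foo"))
# [(0, 5, 12)]
```

The text "and my" followed by a separate paragraph "foo" is not reported, because each paragraph is searched independently.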

Hope this helps. I am pretty confident that the example covers most of this and that the result is what I need.

OK, I'm not totally convinced by that expected output: if you are
dropping that level of markup (such as the <j>), then you are heading
towards just stripping all the markup...

If possible, give an actual real-world example with solid expected
results. When the task is as "non-trivial" as this one, you really
don't want mistakes in the expected results.

-- Andrew Welch
