Re: [xsl] text replacement with mixed content
Subject: Re: [xsl] text replacement with mixed content
From: Geert Bormans <geert@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 31 Aug 2011 15:58:39 +0200
Let me summarize some rules for markup, to show that I am not throwing
away all markup (though I do indeed throw away some of it).
There is a pattern A that needs to be replaced with a replacement B
Pattern A is described in a string value that triggers replacement
but interfering markup can break the pure string in the actual document.
A and B can contain multiple words
A can contain markup, B is purely text
B will be wrapped in an element to indicate it is a replacement (revision)
markup that ends in A but did start before A (with the exception of
markup that started right before A) will be forced to close prior to
the replacement, e.g.
<p><b>here is some bolded text that my </b> foo</p>
given that "my foo" needs to become "your bar", becomes
<p><b>here is some bolded text that </b><rev>your bar</rev></p>
whereas, with markup that started right before A,
<p>here is some bolded text that <b>my </b> foo</p>
becomes
<p>here is some bolded text that <rev>your bar</rev></p>
this is all about revisions, and the tricky part is to maintain or
not maintain earlier revisions
markup that starts and ends in A can be dropped
markup that starts in A and ends outside A (with the exception of
markup ending right after closing A) must be forced to reopen
there is a predictable boundary (p in this example); an A should not
cross that boundary
markup in A does not break words
soft hyphens and non breaking spaces (indicated by '-' in the
example) can break "words"
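
The rules above can be read as an offset computation: find A in the string value of the boundary element, then decide the fate of each element from where it starts and ends relative to the match. What follows is only a rough sketch of that reading, written in Python rather than XSLT so it can be run directly. The names pattern_for, flatten, open_tag and replace_first are made up for illustration. It makes several assumptions that are not in the rules as stated: only the first occurrence of A is replaced, an element that starts before A and ends at or after the end of A is kept wrapped around the replacement, soft hyphens (U+00AD) are ignored inside words, non-breaking spaces (U+00A0) count as word separators, and namespaces and earlier revisions are not handled.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def pattern_for(a):
    """Regex for A: whitespace or NBSP between words, optional soft hyphens inside words."""
    words = [r"\u00ad*".join(re.escape(c) for c in w) for w in a.split()]
    return re.compile(r"[\s\u00a0]+".join(words))


def flatten(elem, events, pos=0):
    """Append (kind, item, offset) events for elem's content; return the end offset."""
    if elem.text:
        events.append(("text", elem.text, pos))
        pos += len(elem.text)
    for child in elem:
        span = {"elem": child, "start": pos}
        events.append(("start", span, pos))
        pos = flatten(child, events, pos)
        span["end"] = pos
        events.append(("end", span, pos))
        if child.tail:
            events.append(("text", child.tail, pos))
            pos += len(child.tail)
    return pos


def open_tag(elem):
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    return f"<{elem.tag}{attrs}>"


def replace_first(p_xml, a, b):
    """Replace the first occurrence of A inside one boundary element with <rev>B</rev>."""
    p = ET.fromstring(p_xml)
    events = []
    flatten(p, events)
    value = "".join(item for kind, item, _ in events if kind == "text")
    match = pattern_for(a).search(value)
    if match is None:
        return p_xml
    s, e = match.span()

    def fate(span):
        sp, ep = span["start"], span["end"]
        if ep <= s or sp >= e or (sp < s and ep >= e):
            return "keep"      # outside A, or (assumption) wrapping all of A
        if sp < s:
            return "close"     # started before A, ends inside A
        return "drop" if ep <= e else "reopen"

    spans = [item for kind, item, _ in events if kind == "start"]
    out, done = [open_tag(p)], False
    for kind, item, pos in events:
        if kind == "text":
            out.append(escape(item[:max(0, s - pos)]))      # text before A
            if not done and pos + len(item) > s:            # first text node reaching A
                out += [f"</{sp['elem'].tag}>" for sp in reversed(spans) if fate(sp) == "close"]
                out.append(f"<rev>{escape(b)}</rev>")
                out += [open_tag(sp["elem"]) for sp in spans if fate(sp) == "reopen"]
                done = True
            out.append(escape(item[max(0, e - pos):]))      # text after A
        elif kind == "start" and fate(item) in ("keep", "close"):
            out.append(open_tag(item["elem"]))
        elif kind == "end" and fate(item) in ("keep", "reopen"):
            out.append(f"</{item['elem'].tag}>")
    out.append(f"</{p.tag}>")
    return "".join(out)


print(replace_first("<p><b>here is some bolded text that my </b> foo</p>", "my foo", "your bar"))
print(replace_first("<p>here is some bolded text that <b>my </b> foo</p>", "my foo", "your bar"))
```

Because the "right before" and "right after" exceptions fall out of the boundary comparisons in fate (start offset equal to the start of A, end offset equal to the end of A), no separate special-casing is needed for them. Passing in one p at a time is what keeps A from crossing the boundary in this sketch.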
hope this helps, pretty confident that the example covers most of
this and the result is what I need
Ok, I'm not totally convinced by that expected output: if you are
dropping that level of markup (such as the <j>) then you are heading
towards just stripping all the markup...
If possible, give an actual real-world example with solid expected
results. When the task is "non-trivial" like this one, you really
don't want mistakes in the expected results.