Re: [xsl] text replacement with mixed content

Subject: Re: [xsl] text replacement with mixed content
From: Geert Bormans <geert@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 31 Aug 2011 13:23:16 +0200
Hi David,

This is a nice example, thanks for bringing that up.
I am still analyzing the corner cases.
I think I can already remove some of the complexity:

- There will be no tags inside words, though I have found non-breaking spaces and soft hyphens at unpleasant locations,
which I have to take into account when I dynamically generate the regular expression.
- There will be no matching across paragraphs. (I can rely on some 5 or 6 elements that can contain patterns to be matched, and they will contain them completely.)
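
A rough sketch of what generating such a regular expression could look like. It is in Python rather than XSLT only to keep it short, and the name tolerant_regex is made up for illustration. The assumptions are that a soft hyphen (U+00AD) may turn up between any two characters of the phrase, and that a space in the phrase may also appear as a non-breaking space (U+00A0).

```python
import re

SOFT_HYPHEN = "\u00ad"  # invisible hyphenation hint, may sit anywhere in a word
NBSP = "\u00a0"         # non-breaking space, treated here as an ordinary space


def tolerant_regex(phrase):
    """Build a regex for `phrase` that ignores soft hyphens and accepts NBSP.

    Hypothetical helper: each space matches either a plain space or NBSP,
    and any number of soft hyphens is allowed between consecutive characters.
    """
    parts = []
    for ch in phrase:
        if ch == " ":
            parts.append("[ " + NBSP + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile((SOFT_HYPHEN + "*").join(parts))


rx = tolerant_regex("my foo")
print(rx.sub("BAR", "zzz my" + NBSP + "f" + SOFT_HYPHEN + "oo zzz"))
```

In XSLT 2.0 the same pattern string could presumably be assembled with string functions and passed to xsl:analyze-string, but that part is not shown here.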


Building on your example,
I can come up with this:

<x>
<p>my foo, zzz, my <bold>foo</bold>, zzz</p>
<p>zzz my f-oo zzz m-y foo zzz</p>
<p>zzz my foo zzz my <j>foo my foo</j> zzz</p>
<p>zzz my<b> fjjj</b></p>
<p>zzzzz<b>my </b><x> </x><b>foo zzz</b></p>
</x>

Without pushing people in any direction,
I was thinking about removing the structure by replacing tags with markers,
making the replacements (by allowing the pseudo-tags/markers in the regexes,
while still being careful not to end up with unbalanced pseudo-tags),
and then reconstructing the structure from the result with markers.
I have a feeling this is similar to what Michael must be suggesting.
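
To make the marker idea concrete, here is a minimal sketch, again in Python rather than XSLT and not a finished design. All function names are hypothetical. It assumes the paragraph is available as a serialized string, that the text contains no private-use characters (U+E000 to U+F8FF), that no attribute value contains ">", and that the replacement is plain text. Each tag becomes one private-use character, the regex allows those characters between the letters of the phrase, and any marker swallowed by a match is re-emitted after the replacement so that no tag is lost and the result stays balanced.

```python
import re

MARK_LO = 0xE000                    # start of the Unicode private-use area
MARKER_CLASS = "[\ue000-\uf8ff]"    # regex class matching any marker
TAG = re.compile(r"<[^>]+>")        # naive tag matcher, enough for a sketch


def flatten(xml_fragment):
    """Replace every tag by a one-character marker; return text and tag list."""
    tags = []

    def to_marker(m):
        tags.append(m.group(0))
        # Sketch limit: only 6400 private-use code points are available.
        return chr(MARK_LO + len(tags) - 1)

    return TAG.sub(to_marker, xml_fragment), tags


def phrase_regex(phrase):
    """Regex for `phrase` that tolerates markers between its characters."""
    return re.compile((MARKER_CLASS + "*").join(re.escape(c) for c in phrase))


def replace_phrase(xml_fragment, phrase, replacement):
    """Replace `phrase` even when tags interrupt it, keeping all tags."""
    flat, tags = flatten(xml_fragment)

    def keep_markers(m):
        # Re-emit swallowed markers, in order, after the replacement text.
        swallowed = "".join(re.findall(MARKER_CLASS, m.group(0)))
        return replacement + swallowed

    flat = phrase_regex(phrase).sub(keep_markers, flat)
    # Reconstruct the structure: turn each marker back into its tag.
    return re.sub(MARKER_CLASS, lambda m: tags[ord(m.group(0)) - MARK_LO], flat)


print(replace_phrase("<p>zzz my <bold>foo</bold>, zzz</p>", "my foo", "BAR"))
# prints: <p>zzz BAR<bold></bold>, zzz</p>
```

Putting the swallowed tags after the replacement is an arbitrary choice. It keeps the output well-formed because every tag survives in its original order, but it can leave empty elements behind, and whether the replacement should instead end up inside one of those elements is exactly the kind of corner case still to be decided.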

David definitely came up with some interesting source data for such an approach.

Thanks already

Geert

<x>
<p>my foo, zzz, my <bold>foo</bold>, zzz</p>
<p>zzz my f<b/>oo zzz m<z>y f</z>oo zzz</p>
<p>zzz my f<b/>oo zzz m<z>y f</z>o<j>o my foo</j> zzz</p>
<p>zzz my<b> fjjj</b></p>
<p>zzzzz<b>my </b></p><p><b>foo zzz</b></p>
</x>

David

