Re: [xsl] text replacement with mixed content

> <x>
> <p>my foo, zzz, my <bold>foo</bold>, zzz</p>
> <p>zzz my f<b/>oo zzz m<z>y f</z>oo zzz</p>
> <p>zzz my f<b/>oo zzz m<z>y f</z>o<j>o my foo</j> zzz</p>
> <p>zzz my<b> fjjj</b></p>
> <p>zzzzz<b>my </b></p><p><b>foo zzz</b></p>
> </x>

How about using a regex for it...

(my) *(</?\w+/?>)*(f) *(</?\w+/?>)*(o) *(</?\w+/?>)*(o)

$1 $3$5$7

One solution is to read the input as unparsed-text and create some
regex to identify the pieces and replace the identified pieces as
required. I've done this type of thing with HTML and regex before.
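
As a rough sketch of the idea outside XSLT, here is the regex above
applied with Python's re module. The function name normalize and the
sample strings are made up for illustration; in practice the text would
come from reading the file as a plain string rather than parsing it.

```python
import re

# Pattern from the message: "my", then optional spaces and tags,
# then f, o, o with optional spaces and tags between the letters.
# Tags with attributes are not covered by </?\w+/?>.
PATTERN = re.compile(r"(my) *(</?\w+/?>)*(f) *(</?\w+/?>)*(o) *(</?\w+/?>)*(o)")

def normalize(markup):
    """Apply the "$1 $3$5$7" replacement: keep the letters, drop the rest."""
    return PATTERN.sub(r"\1 \3\5\7", markup)

print(normalize("<p>zzz my f<b/>oo zzz</p>"))
print(normalize("my <bold>foo</bold>"))
```

Note that the replacement throws away any tags inside the match, so
the second call leaves a dangling </bold> behind. That is one of the
things you would want to check for before trusting the output.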

You can grep through all the documents with a loose regex such as
my.{1,30}foo to extract all content that may be a match.
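
Something along these lines would do for the extraction step. This is
only a sketch in Python; the helper name candidates and the two sample
lines are invented for the example.

```python
import re

# Loose pattern from the message: "my", then 1 to 30 arbitrary
# characters, then a literal "foo".
CANDIDATE = re.compile(r"my.{1,30}foo")

def candidates(lines):
    """Return (line_number, matched_text) pairs for every loose match."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in CANDIDATE.finditer(line):
            hits.append((number, match.group(0)))
    return hits

sample = [
    "<p>zzz my <b>foo</b> zzz</p>",
    "<p>zzz my f<b/>oo zzz</p>",
]
for number, text in candidates(sample):
    print(number, text)
```

The second sample line is not reported because its foo is split by a
tag, so a literal pattern like this only finds the cases where the
word itself is intact.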

Problems you can get into with HTML are things like <b>m</b><b>y</b>
.. not that anyone would write that nonsense on purpose, but something
similar, where the individual words you are looking for are divided
into pieces.
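
To illustrate, one way to cope with that (my own sketch here, not
something taken from the thread) is to build a pattern that tolerates
tags between every character of the phrase. The names split_tolerant
and find are hypothetical, and tags with attributes are not handled.

```python
import re

# Zero or more simple tags such as <b>, </b> or <b/> (no attributes).
TAG = r"(?:</?\w+/?>)*"

def split_tolerant(phrase):
    """Compile a regex for phrase that allows tags between all characters."""
    parts = [r"\s+" if ch.isspace() else re.escape(ch) for ch in phrase]
    return re.compile(TAG.join(parts))

def find(phrase, markup):
    """Return the first matching stretch of markup, or None."""
    match = split_tolerant(phrase).search(markup)
    return match.group(0) if match else None

print(find("my foo", "<b>m</b><b>y</b> foo"))
print(find("my foo", "zzz m<z>y f</z>oo zzz"))
print(find("my foo", "my<b> fjjj</b>"))
```

The first two calls find the phrase even though the letters are
spread over several elements; the third has no match and gives None.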

Then use that extracted set of potential matches to create one or a
few regexes that work to identify the pieces you require. At that
point you have a good understanding of the problem set.

Also, I generally replace escape characters, for example < and >, with
+ and ; to make things easier visually when working with unparsed-text.

Probably Andrew's solution is better although I think I would still
want to grep all potential cases and perhaps write some regex to look
for bizarre cases to better understand the data.




--
Alex Muir
Instructor | Program Organizer - University Technology Student Work
Experience Building
University of the Gambia
http://sites.utg.edu.gm/alex/

Low budget software development benefiting development in the Gambia,
West Africa
Experience of a lifetime, come to Gambia and Join UTSWEB -
http://sites.utg.edu.gm/utsweb/

Current Thread
Re: [xsl] text replacement with mixed content, (continued)
Message not available
Geert Bormans - 31 Aug 2011 08:58:04 -0000
Michael Kay - 31 Aug 2011 09:25:10 -0000
Message not available
Andrew Welch - 31 Aug 2011 10:55:58 -0000
David Carlisle - 31 Aug 2011 10:59:43 -0000
Alex Muir - 31 Aug 2011 11:10:01 -0000 <=
Andrew Welch - 31 Aug 2011 11:13:42 -0000
David Carlisle - 31 Aug 2011 11:23:25 -0000
Geert Bormans - 31 Aug 2011 11:23:44 -0000
Message not available
Andrew Welch - 31 Aug 2011 11:30:28 -0000
