Subject: Re: [xsl] text replacement with mixed content
From: Alex Muir <alex.g.muir@xxxxxxxxx>
Date: Wed, 31 Aug 2011 11:09:51 +0000

> <x>
> <p>my foo, zzz, my <bold>foo</bold>, zzz</p>
> <p>zzz my f<b/>oo zzz m<z>y f</z>oo zzz</p>
> <p>zzz my f<b/>oo zzz m<z>y f</z>o<j>o my foo</j> zzz</p>
> <p>zzz my<b> fjjj</b></p>
> <p>zzzzz<b>my </b></p><p><b>foo zzz</b></p>
> </x>

How about regex it...

    (my) *(</?\w+/?>)*(f) *(</?\w+/?>)*(o) *(</?\w+/?>)*(o)
    $1 $3$5$7

One solution is to read the input as unparsed-text and write some regex to identify the pieces and replace them as required. I've done this type of thing with HTML and regex before.

You can write a regex to grep through all the documents, extracting all content that may be a match, with something like

    my.{1,30}foo

Problems you can get into with HTML are things like <b>m</b><b>y</b> .. not that anyone would write that nonsense on purpose, but something similar, where the individual words you are looking for are divided into pieces.

Then use that extracted set of potential matches to write one or a few regexes that identify the pieces you require. At that point you have a good understanding of the problem set.

Also, I generally replace escape characters (for example < and > with + ;) to make things easier visually when working with unparsed-text.

Andrew's solution is probably better, although I think I would still want to grep all potential cases, and perhaps write some regex to look for bizarre cases, to better understand the data.

-- 
Alex Muir
Instructor | Program Organizer - University Technology Student Work Experience Building
University of the Gambia
http://sites.utg.edu.gm/alex/

Low budget software development benefiting development in the Gambia, West Africa
Experience of a lifetime, come to Gambia and Join UTSWEB - http://sites.utg.edu.gm/utsweb/
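
For anyone who wants to try the two regexes from this message, here is a minimal sketch in Python. Python is only a stand-in: the post presumably has XSLT 2.0's unparsed-text() and replace() in mind. The function names survey and normalise and the SAMPLE constant are made up for illustration. The patterns are the ones quoted above, with `$1 $3$5$7` written in Python's `\1 \3\5\7` replacement syntax.

```python
import re

# The quoted sample, handled as plain text rather than parsed XML.
SAMPLE = """<x>
<p>my foo, zzz, my <bold>foo</bold>, zzz</p>
<p>zzz my f<b/>oo zzz m<z>y f</z>oo zzz</p>
<p>zzz my f<b/>oo zzz m<z>y f</z>o<j>o my foo</j> zzz</p>
<p>zzz my<b> fjjj</b></p>
<p>zzzzz<b>my </b></p><p><b>foo zzz</b></p>
</x>"""

# Loose survey pattern from the post: "my", then 1 to 30 characters, then "foo".
CANDIDATES = re.compile(r"my.{1,30}foo")

# Tag-tolerant pattern from the post. Groups 1, 3, 5, 7 are the literal
# pieces "my", "f", "o", "o"; groups 2, 4, 6 are optional runs of tags.
TAG = r"(</?\w+/?>)*"
STRICT = re.compile(r"(my) *" + TAG + r"(f) *" + TAG + r"(o) *" + TAG + r"(o)")


def survey(text):
    """Return every loose candidate so the odd cases can be inspected by eye."""
    return CANDIDATES.findall(text)


def normalise(text):
    """Apply the replacement from the post: keep the literal pieces, drop the tags."""
    return STRICT.sub(r"\1 \3\5\7", text)


if __name__ == "__main__":
    for hit in survey(SAMPLE):
        print("candidate:", hit)
    print(normalise(SAMPLE))
```

Two things show up quickly when running this on the sample. First, neither pattern sees `m<z>y f</z>oo`, because both require a contiguous "my", so that kind of split still needs its own regex. Second, `my <bold>foo</bold>` becomes `my foo</bold>`: the opening tag is consumed by the match but the closing tag is not, so the result is no longer well-formed. That is the sort of case the grep-first step is meant to surface.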