[xsl] Re: Using XSLT to add markup to a document

Subject: [xsl] Re: Using XSLT to add markup to a document
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Fri, 4 Jul 2003 06:43:20 +0200

This is a variation of the "replace all occurrences of a string" problem,
which is well documented in Dave Pawson's FAQ.
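
For reference, XPath 1.0 has no replace() function, so the single-string
case is usually solved in XSLT 1.0 with a recursive template. It emits the
text before the first match, then the replacement, then recurses on the text
after the match. Below is a minimal sketch of that shape in Python, chosen
only because it is easy to run. `replace_all` and the two helpers are
hypothetical names; the helpers mimic XPath's substring-before() and
substring-after().

```python
def substring_before(s: str, pattern: str) -> str:
    """Like XPath 1.0 substring-before(): text before the first match, else ''."""
    i = s.find(pattern)
    return s[:i] if i >= 0 else ""


def substring_after(s: str, pattern: str) -> str:
    """Like XPath 1.0 substring-after(): text after the first match, else ''."""
    i = s.find(pattern)
    return s[i + len(pattern):] if i >= 0 else ""


def replace_all(text: str, pattern: str, replacement: str) -> str:
    """Replace every occurrence of pattern, recursing on the remaining text."""
    # Base case: no (further) match, so copy the rest unchanged.
    if not pattern or pattern not in text:
        return text
    return (substring_before(text, pattern) + replacement
            + replace_all(substring_after(text, pattern), pattern, replacement))


print(replace_all("a-b-c", "-", "+"))  # a+b+c
```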

The new twist is that, instead of searching for a single string, one must
search for the (next) occurrence of any one of a list of strings.

This requires stepping through the string one character at a time and
checking whether the remaining text, starting at the current character
position, begins with one of the given strings; if it does, the replacement
is performed at that position.
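
The character-by-character scan described above can be sketched as follows.
This is an illustration in Python rather than XSLT. `mark_terms` is a
hypothetical name, and the `special` wrapper element is borrowed from the
example in the quoted question. The search strings are tried in the order
given, and plain substrings are matched with no word-boundary check (so
"markup" inside "markups" would also be wrapped). In XSLT 1.0 the loop would
become a recursive template.

```python
def mark_terms(text: str, terms: list[str], tag: str = "special") -> str:
    """Wrap each occurrence of any of `terms` in <tag>...</tag>.

    Scans left to right; at each position the first term (in list order)
    that the remaining text starts with wins.
    """
    out = []
    i = 0
    while i < len(text):
        # Does the remaining text, starting at position i, begin with a term?
        match = next((t for t in terms if t and text.startswith(t, i)), None)
        if match:
            out.append(f"<{tag}>{match}</{tag}>")
            i += len(match)  # continue after the matched term
        else:
            out.append(text[i])  # no term starts here: copy one character
            i += 1
    return "".join(out)


print(mark_terms("a sample document that deals with markup",
                 ["markup", "document"]))
# a sample <special>document</special> that deals with <special>markup</special>
```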

Some time ago I was asked to help a colleague with exactly this problem, and
the solution I produced then did not use FXSL.

Using FXSL here is convenient because one can use its generic
string-processing templates (e.g. str-foldl or str-map), which process a
string one character at a time. This is very similar to the processing done
to split a camelCase name into separate words:

http://www.dpawson.co.uk/xsl/sect2/plaintext.html#d6160e507
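
To illustrate the "one character at a time" style, here is a left fold over
the characters of a name that inserts a space before each capital letter,
which is the camelCase splitting mentioned above. It uses Python's
functools.reduce as a stand-in for a str-foldl-style template. It is not FXSL
code, and `split_step` and `split_camel_case` are hypothetical names.

```python
from functools import reduce


def split_step(acc: str, ch: str) -> str:
    """Fold step: start a new word before every uppercase letter (except first)."""
    if ch.isupper() and acc:
        return acc + " " + ch
    return acc + ch


def split_camel_case(name: str) -> str:
    # Left fold: the accumulator starts as "" and is combined with each character.
    return reduce(split_step, name, "")


print(split_camel_case("splitCamelCaseName"))  # split Camel Case Name
```

For the markup problem the accumulator has to carry more state than a plain
string. For example, it needs to track how far the text has already been
consumed by a match, because a matched search string spans several characters.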


An added complexity is what to do when one of the search strings is the
start of another (e.g. "mark" and "markup"); this must be specified by the
person requesting the solution. One way is to give the search strings
priority in the order in which they are listed. Another is to always try
the longest string first.
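
The two policies differ only in which candidate is chosen when several search
strings match at the same position. Here is a minimal sketch with
hypothetical function names, where `pos` is the current character position
in the scan:

```python
from typing import Optional, Sequence


def pick_by_order(text: str, pos: int, terms: Sequence[str]) -> Optional[str]:
    """Priority by list order: the first listed term that matches at pos wins."""
    for t in terms:
        if t and text.startswith(t, pos):
            return t
    return None


def pick_longest(text: str, pos: int, terms: Sequence[str]) -> Optional[str]:
    """Longest first: among all terms matching at pos, prefer the longest."""
    matches = [t for t in terms if t and text.startswith(t, pos)]
    return max(matches, key=len) if matches else None


terms = ["mark", "markup"]
print(pick_by_order("markup", 0, terms))  # mark
print(pick_longest("markup", 0, terms))   # markup
```

Sorting the list once by decreasing length and then applying the first
policy gives the same result as the second policy, and it avoids collecting
all matches at every position.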

There are also some optimization techniques to minimize the depth of the
recursion and to achieve faster processing:

"Two-stage recursive algorithms in XSLT"
http://www.topxml.com/xsl/articles/recurse/
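
One such technique is divide and conquer. Processing the head character and
recursing on the tail gives a recursion depth proportional to the string
length. Splitting the string in half and recursing on both halves keeps the
depth logarithmic. The sketch below applies this idea to a simple
per-character transformation. It is a generic illustration, not code from the
article above, and `map_chars_dvc` is a hypothetical name. For
search-and-replace the split points need care, since a naive split can cut a
search string in two.

```python
from typing import Callable, Tuple


def map_chars_dvc(s: str, f: Callable[[str], str],
                  depth: int = 1) -> Tuple[str, int]:
    """Apply f to every character by halving the string.

    Returns the transformed string and the maximum recursion depth reached.
    """
    if len(s) <= 1:
        return "".join(f(c) for c in s), depth
    mid = len(s) // 2
    left, d_left = map_chars_dvc(s[:mid], f, depth + 1)
    right, d_right = map_chars_dvc(s[mid:], f, depth + 1)
    return left + right, max(d_left, d_right)


result, depth = map_chars_dvc("ab" * 512, str.upper)
print(len(result), depth)  # 1024 11
```

A head/tail recursion over the same 1024 characters would be about 1024
levels deep.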

As I'm rather busy at the moment, I still have to find that solution (that
was a month ago); I'll post it within the next 24 hours.



=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL



"Jim Melton" <jim.melton@xxxxxxx> wrote in message
news:4.3.2.7.2.20030703141900.0474ad38@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Gentlepeople,
>
> I'm struggling with a problem that I fear isn't easily solved with XSLT,
> but there are many experts on this list who might be able to help.  The
> brief summary of my problem is that I want to find certain words that
> appear in paragraphs throughout a very large (XML) document and mark up
> those words without making any other changes to my document.
>
> For example, consider a document with the following fragment:
>
> <para>
> This is a sample document that deals with markup of <emph>text</emph>.
> </para>
> <para>
> When one applies <emph>markup</emph> to a large document, one is faced with
> a <def>time-consuming</def> effort.
> </para>
>
> If one of the words to which I wish to apply markup is "markup" and another
> is "document", then I would want the result to be something like this:
>
> <para>
> This is a sample <special>document</special> that deals with
> <special>markup</special> of <emph>text</emph>.
> </para>
> <para>
> When one applies <emph><special>markup</special></emph> to a large
> <special>document</special>, one is faced with a <def>time-consuming</def>
> effort.
> </para>
>
> As you see from this example, I want to *add* markup to the words I have
> found where they appear in my result tree, but copy everything else in my
> document to the output tree unchanged.
>
> I tend to use Saxon (currently using 6.5.2) as my primary XSLT engine, but
> I also have Microsoft's MSXML 4.0 (and could undoubtedly find others if
> required to do so).
>
> Any guidance or advice?
>
> Many thanks,
>     Jim
> ========================================================================
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
> Oracle Corporation            Oracle Email: mailto:jim.melton@xxxxxxxxxx
> 1930 Viscounti Drive          Standards email: mailto:jim.melton@xxxxxxx
> Sandy, UT 84093-1063              Personal email: mailto:jim@xxxxxxxxxxx
> USA                                                Fax : +1.801.942.3345
> ========================================================================
> =  Facts are facts.  However, any opinions expressed are the opinions  =
> =  only of myself and may or may not reflect the opinions of anybody   =
> =  else with whom I may or may not have discussed the issues at hand.  =
> ========================================================================
>
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>
