Re: [xsl] (text processing) lexical context

Subject: Re: [xsl] (text processing) lexical context
From: "cutlass" <cutlass@xxxxxxxxxxx>
Date: Wed, 24 Apr 2002 09:03:39 +0100
Hello Nicolas,

----- Original Message -----
From: "Nicolas Mazziotta" <Nicolas.Mazziotta@xxxxxxxxx>

> <root>
> This is the <w>first</w> <i>sentence</i>. This is the <w>second</w>
> <i>sentence</i>. This is the <w>third</w> <i>sentence</i>.
> </root>

This particular form of markup keeps cropping up over and over again, and I
suspect that most people will tell you that it is not a good design. The main
problem with this kind of mixed content is that it tends to be rather open
ended: there could be any variety of elements, nesting structures, etc.

> <html>
> <ol>
> <li>first: This is the <b>first</b> <i>sentence</i>.
> <li>Second: This is the <w>second</b> <i>sentence</i>.
> <li>Third: This is the <b>third</b> <i>sentence</i>.
> </ol>
> </html>
>

I am assuming you made an error with the opening <w> in the second sentence?

Right, so you want to:

a) tokenise each sentence
b) number each sentence with a word (i.e. First, Second, Third)
c) copy all child elements within a sentence across
d) replace some elements with other elements

There are a few approaches:

- You are doing too much in one transform. Yes, it is possible to have one
large, complicated transform, but why not break it up into small steps so you
can conceptualise each one?

- You can tokenise each sentence (based upon finding a period) by customising
a string tokenise function (available in many places, one of them being
www.exslt.org).

- Or I suspect this is a rather good use of Dimitre Novatchev's functional
library at www.topxml.com.
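
As a rough illustration of the tokenising step, here is a minimal XSLT 1.0
sketch that splits the string value of <root> on periods with a recursive
named template. The template name split-sentences and the <sentences> and
<sentence> wrapper elements are made up for this example; they are not part of
EXSLT or of any library mentioned above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="root">
    <sentences>
      <!-- Work on the whitespace-normalised string value of <root>. -->
      <xsl:call-template name="split-sentences">
        <xsl:with-param name="text" select="normalize-space(.)"/>
      </xsl:call-template>
    </sentences>
  </xsl:template>

  <!-- Hypothetical helper: emits one <sentence> per period-terminated chunk. -->
  <xsl:template name="split-sentences">
    <xsl:param name="text"/>
    <xsl:if test="contains($text, '.')">
      <sentence>
        <xsl:value-of select="normalize-space(substring-before($text, '.'))"/>
        <xsl:text>.</xsl:text>
      </sentence>
      <!-- Recurse on whatever follows the first period. -->
      <xsl:call-template name="split-sentences">
        <xsl:with-param name="text" select="substring-after($text, '.')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Note that this works on strings only, so the <w> and <i> markup is flattened
away, and any trailing text without a closing period is dropped. It is a
first pass for getting the sentence boundaries right, not the whole answer.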

Both the EXSLT route and the functional library will require a little
investment in learning.

The other stuff, like copying or replacing elements and numbering with words,
will come after you get over the first step.
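
For what it is worth, steps a) to d) can also be sketched in plain XSLT 1.0
by grouping sibling nodes around the text nodes that contain a period. This
is only a sketch under some loud assumptions: each text node holds at most
one period, periods never occur inside <w> or <i>, nothing meaningful follows
the last period, and only three number words are spelled out (anything beyond
that falls back to digits).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="root">
    <html>
      <ol>
        <!-- Assumption: each text node holding a period closes one sentence. -->
        <xsl:for-each select="text()[contains(., '.')]">
          <xsl:variable name="n" select="position()"/>
          <li>
            <!-- Step b: number with words (only three spelled out here). -->
            <xsl:choose>
              <xsl:when test="$n = 1">First</xsl:when>
              <xsl:when test="$n = 2">Second</xsl:when>
              <xsl:when test="$n = 3">Third</xsl:when>
              <xsl:otherwise><xsl:value-of select="$n"/></xsl:otherwise>
            </xsl:choose>
            <xsl:text>: </xsl:text>
            <!-- Text left over after the period that closed the previous sentence. -->
            <xsl:value-of select="substring-after(preceding-sibling::text()[contains(., '.')][1], '.')"/>
            <!-- Steps c and d: siblings between the previous boundary and this one. -->
            <xsl:apply-templates select="preceding-sibling::node()[count(preceding-sibling::text()[contains(., '.')]) = $n - 1]"/>
            <!-- Text up to and including this sentence's period. -->
            <xsl:value-of select="substring-before(., '.')"/>
            <xsl:text>.</xsl:text>
          </li>
        </xsl:for-each>
      </ol>
    </html>
  </xsl:template>

  <!-- Step d: <w> becomes <b>; <i> is carried across unchanged. -->
  <xsl:template match="w"><b><xsl:apply-templates/></b></xsl:template>
  <xsl:template match="i"><i><xsl:apply-templates/></i></xsl:template>
</xsl:stylesheet>
```

Because the child elements are handled with xsl:apply-templates rather than
xsl:value-of, the <i> elements survive and each <w> is rewritten as <b>. If
the real documents break any of the assumptions above, splitting the work
into several smaller transforms is probably the safer route.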

gl, jim fuller


> But I can't figure out how I can select the text surrounding the <w>
> element without using <xsl:value-of.../>, which does not allow me to
> process the following <i> element...
>
> i.e., I get
>
> <html>
> <ol>
> <li>first: This is the <b>first</b> sentence.
> <li>Second: This is the <w>second</b> sentence.
> <li>Third: This is the <b>third</b> sentence.
> </ol>
> </html>
>
> and the <i> element is lost...
>
> And I can't do <xsl template match="substring(...)"> because substring
> is not a DOM node.
>
> Help: is there a way to process substrings or stg?
>
> N. Mazziotta
>
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>



