Subject: RE : [xsl] (text processing) lexical context
From: "Nicolas Mazziotta" <Nicolas.Mazziotta@xxxxxxxxx>
Date: Wed, 24 Apr 2002 11:17:21 +0200
Thank you all! I'll try all those solutions and see which one fits best.

I cannot tokenize sentences because the docs I work on are far more complex, and it would give results like this:

<p><sentence></p><p></sentence><sentence></sentence></p>

Or I would have to write a complex script.

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Michael Kay
> Sent: Wednesday, 24 April 2002 10:32
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] (text processing) lexical context
>
> One other piece of advice (somewhat heretical for this list): XSLT is not
> the only tool in your kitbag. In fact, where you want to identify structure
> in the source that's not explicit in the markup, XSLT is often not the best
> tool for the job.
>
> You could probably tackle this one more easily by writing a SAX filter that
> inserts a <sentence> start tag immediately after <root>, a </sentence> end
> tag immediately before </root>, and a </sentence><sentence> pair immediately
> after a "." that's followed by whitespace.
>
> Michael Kay
> Software AG
> home: Michael.H.Kay@xxxxxxxxxxxx
> work: Michael.Kay@xxxxxxxxxxxxxx
>
> > -----Original Message-----
> > From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of cutlass
> > Sent: 24 April 2002 09:04
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [xsl] (text processing) lexical context
> >
> > Hello Nicolas,
> >
> > ----- Original Message -----
> > From: "Nicolas Mazziotta" <Nicolas.Mazziotta@xxxxxxxxx>
> >
> > > <root>
> > > This is the <w>first</w> <i>sentence</i>. This is the <w>second</w>
> > > <i>sentence</i>. This is the <w>third</w> <i>sentence</i>.
> > > </root>
> >
> > this particular form of markup keeps cropping up over and over again, and i
> > suspect that most people will tell you that it is not so good.
> > The main problem with this type of markup is that it tends to be rather
> > open-ended, e.g. there could be a variety of elements, nesting structures,
> > etc.
> >
> > > <html>
> > > <ol>
> > > <li>first: This is the <b>first</b> <i>sentence</i>.
> > > <li>Second: This is the <w>second</b> <i>sentence</i>.
> > > <li>Third: This is the <b>third</b> <i>sentence</i>.
> > > </ol>
> > > </html>
> >
> > i am assuming you made an error with the opening <w> in the second
> > sentence?
> >
> > right, so you want to
> >
> > a) tokenize each sentence
> > b) number with words (i.e. First, Second, Third)
> > c) copy all children elements within a sentence across
> > d) replace elements with other elements
> >
> > there are a few approaches:
> >
> > - you are doing too much in one transform. yes, it is possible to have one
> > large complicated transform, but why not break it up into small steps so
> > you can conceptualise
> >
> > - you can either tokenise each sentence by customising the string tokenise
> > function (available in many places, one of them being www.exslt.org),
> > splitting on each period
> >
> > - or i suspect this is a rather good use of Dimitre Novatchev's functional
> > library at www.topxml.com
> >
> > both approaches will require a little investment in learning,
> >
> > the other stuff, like copying or replacing elements, numbering with words
> > will come after you get over the first step.
> >
> > gl, jim fuller
> >
> > > But I can't figure out how I can select the text surrounding the <w>
> > > element without using <xsl:value-of.../>, which does not allow me to
> > > process the following <i> element...
> > >
> > > i.e., I get
> > >
> > > <html>
> > > <ol>
> > > <li>first: This is the <b>first</b> sentence.
> > > <li>Second: This is the <w>second</b> sentence.
> > > <li>Third: This is the <b>third</b> sentence.
> > > </ol>
> > > </html>
> > >
> > > and the <i> element is lost...
> > >
> > > And I can't do <xsl:template match="substring(...)"> because substring
> > > is not a DOM node.
> > >
> > > Help: is there a way to process substrings or something?
> > >
> > > N. Mazziotta

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
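
For anyone who wants to try the SAX filter described in the quoted message from Michael Kay, here is a minimal sketch of that rule. It is only an illustration under several assumptions: it uses Python's standard xml.sax module rather than a Java SAX filter, the names SentenceFilter, add_sentences and SAMPLE are made up for this example, and it only detects a "." followed by whitespace inside a single run of text.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Sentence boundary as described in the quoted message: a "." followed by whitespace.
BOUNDARY = re.compile(r"\.(?=\s)")

# Shortened version of the sample from the original question.
SAMPLE = ("<root>This is the <w>first</w> <i>sentence</i>. "
          "This is the <w>second</w> <i>sentence</i>.</root>")


class SentenceFilter(XMLFilterBase):
    """Hypothetical SAX filter: wraps the content of <root> in <sentence> elements."""

    def __init__(self, parent=None, root_name="root"):
        super().__init__(parent)
        self._root_name = root_name
        self._buffer = []  # text chunks collected since the last tag

    def _flush(self):
        # Emit the buffered text, closing and reopening <sentence> at each boundary.
        text = "".join(self._buffer)
        self._buffer = []
        start = 0
        for match in BOUNDARY.finditer(text):
            super().characters(text[start:match.end()])
            super().endElement("sentence")
            super().startElement("sentence", AttributesImpl({}))
            start = match.end()
        if text[start:]:
            super().characters(text[start:])

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)
        if name == self._root_name:
            # <sentence> start tag immediately after <root>
            super().startElement("sentence", AttributesImpl({}))

    def endElement(self, name):
        self._flush()
        if name == self._root_name:
            # </sentence> end tag immediately before </root>
            super().endElement("sentence")
        super().endElement(name)

    def characters(self, content):
        # The parser may deliver text in several chunks, so buffer it first.
        self._buffer.append(content)


def add_sentences(xml_text):
    """Run the filter over an XML string and return the serialized result."""
    out = io.StringIO()
    sentence_filter = SentenceFilter(xml.sax.make_parser())
    sentence_filter.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    sentence_filter.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


if __name__ == "__main__":
    print(add_sentences(SAMPLE))
```

Two limits of this sketch are worth knowing. Applied literally, the rule leaves a whitespace-only <sentence> when the last full stop is followed by a newline before </root>. And if a full stop sits inside an inline element, or a sentence runs across a <p> boundary, the output is no longer well-formed, which is the problem raised at the top of this message.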