Subject: RE: [xsl] How to parse text into words, phrases, clauses, sentences, and paragraphs
From: mark bordelon <markcbordelon@xxxxxxxxx>
Date: Thu, 7 Jun 2007 07:25:08 -0700 (PDT)
Michael,

That was all I needed. Thanks for your help. This list is great.

Cheers,
Mark, Getty Trust.

--- Michael Kay <mike@xxxxxxxxxxxx> wrote:

> > This is my first problem. How to apply a template match using
> > the tokenize() function. And which order to apply (from
> > paragraph -> word or word -> paragraph).
>
> It's generally easiest to do it top-down, I think.
>
> Something like this:
>
> <xsl:for-each select="tokenize(., $sentence-delimiter)">
>   <sentence id="{position()}">
>     <xsl:for-each select="tokenize(., $phrase-delimiter)">
>       <phrase id="{position()}">
>         <xsl:for-each select="tokenize(., $word-delimiter)">
>           <word id="{position()}">
>             <xsl:value-of select="."/>
>           </word>
>         </xsl:for-each>
>       </phrase>
>     </xsl:for-each>
>   </sentence>
> </xsl:for-each>
>
> > (d) doing the output numbering.
>
> I think you just need position() as shown above.
>
> Sometimes you need to work bottom-up if the "sentences" can't be
> recognized until you've identified the "words", for example if you
> want to avoid treating "." as ending a sentence if it appears in a
> number. You're then sometimes in the domain of positional grouping:
> create a long flat list of words, and then group it into sentences
> using some kind of test applied to the individual words.
>
> Michael Kay
> http://www.saxonica.com/
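
For the bottom-up case described in the quoted reply, a minimal XSLT 2.0 sketch of positional grouping might look like the following. It is only an illustration, not code from the thread: the input element name "para", the temporary element name "w", and the rule that a word ends a sentence when its last character is ".", "!" or "?" are all assumptions. Because words are split on whitespace first, a token such as "3.14" has its "." in the middle and so does not close a sentence.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Assumption: each paragraph of input text is a <para> element. -->
  <xsl:template match="para">

    <!-- Step 1: build a long flat list of words as a temporary tree.
         <w> is a hypothetical working element, one per token. -->
    <xsl:variable name="words">
      <xsl:for-each select="tokenize(normalize-space(.), ' ')">
        <w><xsl:value-of select="."/></w>
      </xsl:for-each>
    </xsl:variable>

    <paragraph>
      <!-- Step 2: group the flat list into sentences. A group ends at
           any word whose final character is . ! or ? (assumed rule). -->
      <xsl:for-each-group select="$words/w"
                          group-ending-with="w[matches(., '[.!?]$')]">
        <sentence id="{position()}">
          <xsl:for-each select="current-group()">
            <word id="{position()}">
              <xsl:value-of select="."/>
            </word>
          </xsl:for-each>
        </sentence>
      </xsl:for-each-group>
    </paragraph>
  </xsl:template>

</xsl:stylesheet>
```

The words are wrapped in elements before grouping because group-ending-with takes a match pattern, and XSLT 2.0 requires the grouped items to be nodes when a pattern is used. The test inside the predicate is the place to add further exceptions (abbreviations, for instance) if the simple end-of-token rule proves too coarse.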