RE: [xsl] How to parse text into words, phrases, clauses, sentences, and paragraphs

From: mark bordelon <markcbordelon@xxxxxxxxx>
Date: Thu, 7 Jun 2007 07:25:08 -0700 (PDT)
Michael, that was all I needed. Thanks for your help.
This list is great.

Cheers,

Mark, Getty Trust.

--- Michael Kay <mike@xxxxxxxxxxxx> wrote:

> > This is my first problem: how to apply a template match using
> > the tokenize() function, and in which order to apply it (from
> > paragraph -> word or word -> paragraph).
> 
> It's generally easiest to do it top-down, I think.
> 
> Something like this:
> 
> <xsl:for-each select="tokenize(., $sentence-delimiter)">
>   <sentence id="{position()}">
>     <xsl:for-each select="tokenize(., $phrase-delimiter)">
>       <phrase id="{position()}">
>         <xsl:for-each select="tokenize(., $word-delimiter)">
>           <word id="{position()}">
>             <xsl:value-of select="."/>
>           </word>
>         </xsl:for-each>
>       </phrase>
>     </xsl:for-each>
>   </sentence>
> </xsl:for-each>
> > 
> > > (d) doing the output numbering.
> > 
> 
> I think you just need position() as shown above.
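A fuller sketch of the top-down approach, written as a complete XSLT 2.0 stylesheet (tokenize() needs a 2.0 processor such as Saxon). It is an illustration under stated assumptions, not Michael's code: the input is assumed to hold one paragraph per <p> element, and the three delimiter regexes are hypothetical placeholders to tune for real text. The numbering at every level comes from position(), as described above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical delimiter regexes; adjust them for real text. -->
  <xsl:param name="sentence-delimiter" select="'[.!?](\s+|$)'"/>
  <xsl:param name="phrase-delimiter" select="'[,;:]\s*'"/>
  <xsl:param name="word-delimiter" select="'\s+'"/>

  <!-- Assumes (hypothetically) that each paragraph is a <p> element. -->
  <xsl:template match="/">
    <text>
      <xsl:for-each select="//p">
        <paragraph id="{position()}">
          <!-- Top-down: paragraph -> sentences -> phrases -> words. -->
          <xsl:for-each select="tokenize(normalize-space(.), $sentence-delimiter)[. ne '']">
            <sentence id="{position()}">
              <xsl:for-each select="tokenize(., $phrase-delimiter)">
                <phrase id="{position()}">
                  <xsl:for-each select="tokenize(., $word-delimiter)[. ne '']">
                    <word id="{position()}">
                      <xsl:value-of select="."/>
                    </word>
                  </xsl:for-each>
                </phrase>
              </xsl:for-each>
            </sentence>
          </xsl:for-each>
        </paragraph>
      </xsl:for-each>
    </text>
  </xsl:template>

</xsl:stylesheet>
```

For a paragraph such as "The cat sat, then slept. It rained." this should yield two sentences, the first split at the comma into the phrases "The cat sat" and "then slept". The [. ne ''] filters drop the empty strings that tokenize() returns when a delimiter matches at the start or end of its input.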
> 
> Sometimes you need to work bottom-up if the "sentences" can't be
> recognized until you've identified the "words", for example if you
> want to avoid treating "." as ending a sentence if it appears in a
> number. You're then sometimes in the domain of positional grouping:
> create a long flat list of words, and then group it into sentences
> using some kind of test applied to the individual words.
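The bottom-up route can be sketched in XSLT 2.0 with xsl:for-each-group and group-ending-with. This is a minimal sketch, not Michael's code: paragraphs are assumed to arrive as <p> elements, and the inline abbreviation list is a hypothetical example of "some kind of test applied to the individual words". The words are first built as <w> elements inside a temporary tree, because in XSLT 2.0 group-ending-with applies only to nodes, not to strings.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumes (hypothetically) that each paragraph is a <p> element. -->
  <xsl:template match="/">
    <text>
      <xsl:for-each select="//p">
        <paragraph id="{position()}">
          <!-- Step 1: a long flat list of <w> elements, held in a temporary
               document so that the pattern "w" below can match them. -->
          <xsl:variable name="words">
            <xsl:for-each select="tokenize(normalize-space(.), '\s+')">
              <w><xsl:value-of select="."/></w>
            </xsl:for-each>
          </xsl:variable>
          <!-- Step 2: positional grouping. A sentence ends at a word ending
               in . ! or ?, unless that word is a listed abbreviation
               (the list is a hypothetical example). -->
          <xsl:for-each-group select="$words/w"
              group-ending-with="w[matches(., '[.!?]$')
                                   and not(. = ('Mr.', 'Dr.', 'e.g.', 'i.e.'))]">
            <sentence id="{position()}">
              <xsl:for-each select="current-group()">
                <word id="{position()}">
                  <xsl:value-of select="."/>
                </word>
              </xsl:for-each>
            </sentence>
          </xsl:for-each-group>
        </paragraph>
      </xsl:for-each>
    </text>
  </xsl:template>

</xsl:stylesheet>
```

Because "3.14" is tokenized as a single word, its "." is never tested as a sentence end: given "It cost 3.14 dollars. Mr. Smith paid. Done" the groups should be "It cost 3.14 dollars.", "Mr. Smith paid." and "Done". In this sketch trailing punctuation stays attached to the word.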
> 
> Michael Kay
> http://www.saxonica.com/
> 
> 