Subject: RE: [xsl] How to parse text into words, phrases, clauses, sentences, and paragraphs
From: mark bordelon <markcbordelon@xxxxxxxxx>
Date: Thu, 7 Jun 2007 07:25:08 -0700 (PDT)

Michael, That was all I needed. Thanks for your help.
This list is great.
Cheers,
Mark, Getty Trust.
--- Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > This is my first problem. How to apply a template match using
> > the tokenize() function. And which order to apply (from
> > paragraph -> word or word -> paragraph).
>
> It's generally easiest to do it top-down, I think.
>
> Something like this:
>
> <xsl:for-each select="tokenize(., $sentence-delimiter)">
>   <sentence id="{position()}">
>     <xsl:for-each select="tokenize(., $phrase-delimiter)">
>       <phrase id="{position()}">
>         <xsl:for-each select="tokenize(., $word-delimiter)">
>           <word id="{position()}">
>             <xsl:value-of select="."/>
>           </word>
>         </xsl:for-each>
>       </phrase>
>     </xsl:for-each>
>   </sentence>
> </xsl:for-each>
> >
> > > (d) doing the output numbering.
> >
>
> I think you just need position() as shown above.
>
> Sometimes you need to work bottom-up if the "sentences" can't be recognized
> until you've identified the "words", for example if you want to avoid
> treating "." as ending a sentence if it appears in a number. You're then
> sometimes in the domain of positional grouping: create a long flat list of
> words, and then group it into sentences using some kind of test applied to
> the individual words.
>
> Michael Kay
> http://www.saxonica.com/
>
>
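
For the bottom-up case described above, positional grouping in XSLT 2.0 might look roughly like the sketch below. It is only an illustration, not code from the thread: the input element name "para", the temporary "w" elements, and the rule that a word ends a sentence when its last character is ".", "!" or "?" are all assumptions. Because the test looks at whole whitespace-separated words, a "." inside a number such as 3.14 does not close a sentence.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Assumption: each paragraph of running text is a "para" element. -->
  <xsl:template match="para">

    <!-- Step 1: build a long flat list of words as a temporary tree.
         "w" is a hypothetical scratch element used only for grouping. -->
    <xsl:variable name="words">
      <xsl:for-each select="tokenize(normalize-space(.), ' ')">
        <w><xsl:value-of select="."/></w>
      </xsl:for-each>
    </xsl:variable>

    <paragraph>
      <!-- Step 2: group the flat list into sentences. A group ends at
           any word whose last character is . ! or ? (assumed rule). -->
      <xsl:for-each-group select="$words/w"
          group-ending-with="w[matches(., '[.!?]$')]">
        <sentence id="{position()}">
          <!-- position() restarts at 1 inside each sentence -->
          <xsl:for-each select="current-group()">
            <word id="{position()}">
              <xsl:value-of select="."/>
            </word>
          </xsl:for-each>
        </sentence>
      </xsl:for-each-group>
    </paragraph>
  </xsl:template>

</xsl:stylesheet>
```

The words are wrapped in elements first because, in XSLT 2.0, group-ending-with takes a match pattern and so the grouped items must be nodes rather than strings. In this sketch the punctuation stays attached to the word; a real stylesheet would probably want a stricter end-of-sentence test (abbreviations, closing quotes) and might strip the punctuation.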