Subject: Re: [xsl] Extraction of data using key() and matches()
From: Jakob Fix <jakob.fix@xxxxxxxxx>
Date: Sun, 6 Jun 2010 00:01:54 +0200
On Sat, Jun 5, 2010 at 23:42, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> On 05/06/2010 20:02, Jakob Fix wrote:
>>
>> Hello,
>>
>> I have a large number of XML data files (previously Excel files), each
>> of which contains a table with rows and data cells.
>>
>> I'm interested in finding out whether the table's data cells contain a
>> given country name. If so, I want to record in another file all
>> country names that appear in the data file. The country name may be
>> the only content of the data cell (<col>United Kingdom</col>), or it
>> may be surrounded by other text (<col>Data has been provided for
>> United Kingdom only.</col>). More than one country name can also
>> appear in a single cell. There won't be other elements in the cell,
>> just character data.
>>
>> My current approach is to have an exhaustive lookup file with *all*
>> country names that are potentially used. For each XML data file, I
>> loop over all country names and check whether the contents of the
>> data file match the current country name.
>>
>
> You could create an index on all the "words" in the text using
>
> <xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>
>
> where a word is defined as a maximal sequence of "letter" characters.
>
> Then to see whether a given country is present you could start by testing
> whether the first word of the country name is present:
>
> key('words', tokenize($country, '\P{L}+')[1])
>
> and then apply a more sensitive test to the result of this first filter.
>
> Michael Kay
> Saxonica

Thanks Michael, I'll give this a try.

Jakob.
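The index-then-verify strategy suggested above can be modelled outside XSLT to see how the two stages interact. This is a minimal Python sketch, assuming the text of each <col> element has already been extracted into a list of strings; the function names (`words`, `contains_run`, `countries_in`) are hypothetical, and the regex only approximates `\P{L}+`, because Python's `re` module has no `\p{L}` class.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Assumption: Python's re has no \p{L}, so XPath's '\P{L}+' (a run of
# non-letters) is approximated as a run of non-word, digit or underscore chars.
NON_LETTERS = re.compile(r"[\W\d_]+")


def words(text: str) -> list[str]:
    """Split text into maximal runs of letters, dropping empty tokens."""
    return [w for w in NON_LETTERS.split(text) if w]


def contains_run(haystack: list[str], needle: list[str]) -> bool:
    """True if needle occurs as a contiguous run of words in haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def countries_in(cells: list[str], countries: list[str]) -> list[str]:
    """Return the countries (in lookup-file order) named in any cell."""
    cell_words = [words(c) for c in cells]

    # Word -> indices of the cells containing it; plays the role of xsl:key.
    index: dict[str, set[int]] = defaultdict(set)
    for i, ws in enumerate(cell_words):
        for w in ws:
            index[w].add(i)

    found = []
    for country in countries:
        name = words(country)
        if not name:
            continue
        # First filter: exact word lookup on the country's first word, so
        # "Niger" does not pick up a cell that only says "Nigeria".
        candidates = index.get(name[0], ())
        # More sensitive test: the whole name as a word sequence, so that
        # "United States" is not reported just because "United" appears.
        if any(contains_run(cell_words[i], name) for i in candidates):
            found.append(country)
    return found


cells = [
    "United Kingdom",
    "Data has been provided for United Kingdom only.",
    "Figures for France and Germany.",
    "Nigeria (2009)",
]
countries = ["United Kingdom", "United States", "France", "Germany", "Niger", "Nigeria"]
print(countries_in(cells, countries))
# ['United Kingdom', 'France', 'Germany', 'Nigeria']
```

The index is built once per data file, so each country in the lookup list costs one dictionary lookup plus a check of only the candidate cells, rather than a scan of every cell. Matching here is exact and case-sensitive, which is roughly how key() behaves under the default collation.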