Re: [xsl] Extraction of data using key() and matches()

From: Jakob Fix <jakob.fix@xxxxxxxxx>
Date: Sun, 6 Jun 2010 00:01:54 +0200

On Sat, Jun 5, 2010 at 23:42, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> On 05/06/2010 20:02, Jakob Fix wrote:
>>
>> Hello,
>>
>> I have a large number of XML data files (previously Excel files), each
>> of which contains a table made up of rows and data cells.
>>
>> I want to find out whether the table's data cells contain any of a
>> given set of country names, and to record in another file all country
>> names that appear in the data file. The country name
>> may be the only content of the data cell (<col>United Kingdom</col>),
>> or it may be surrounded by other text (<col>Data has been provided for
>> United Kingdom only.</col>). It can also be that more than one country
>> name appears in a table cell. There won't be other elements in the
>> cell, just character data.
>>
>> My current approach is to have an exhaustive lookup file with *all*
>> country names that are potentially used. For each XML data file, I
>> loop over all country names and test whether the contents of the data
>> file match the current country name.
>>
>>
>
> You could create an index on all the "words" in the text using
>
> <xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>
>
> where a word is defined as a maximal sequence of "letter" characters.
>
> Then to see whether a given country is present you could start by testing
> whether the first word of the country name is present:
>
> key('words', tokenize($country, '\P{L}+')[1])
>
> and then apply a more sensitive test to the result of this first filter.
>
> Michael Kay
> Saxonica


Thanks Michael, I'll give this a try.
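
For the archive, here is a minimal sketch of how the two-stage test
suggested above might look as a complete XSLT 2.0 stylesheet. Several
things in it are assumptions and not part of the original question: the
lookup file name (countries.xml), its structure
(<countries><country>...</country></countries>), the parameter name
lookup-uri, and the use of contains() as the "more sensitive test".
Treat it as an illustration of the idea only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumed lookup file, one <country> element per country name -->
  <xsl:param name="lookup-uri" select="'countries.xml'"/>

  <!-- Index each col under every "word" (maximal run of letters) it contains -->
  <xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>

  <xsl:template match="/">
    <!-- Keep a handle on the data file; the context changes inside for-each -->
    <xsl:variable name="data" select="."/>

    <xsl:for-each select="document($lookup-uri)//country">
      <xsl:variable name="country" select="normalize-space(.)"/>

      <!-- Stage 1: cheap filter, cells containing the first word of the name -->
      <xsl:variable name="first-word"
          select="tokenize($country, '\P{L}+')[. ne ''][1]"/>
      <xsl:variable name="candidates"
          select="key('words', $first-word, $data)"/>

      <!-- Stage 2: stricter test, applied only to the candidate cells.
           contains() is a placeholder and can be replaced by a tighter check. -->
      <xsl:if test="$candidates[contains(normalize-space(.), $country)]">
        <xsl:value-of select="$country"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The third argument to key() is needed because inside the for-each the
context node belongs to the lookup document, not the data file. Note that
both the key lookup and contains() are case-sensitive, and that contains()
still accepts a shorter name embedded in a longer one ("Guinea" inside
"Papua New Guinea"), so the second-stage test would need tightening for
real data.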

Jakob.
