Re: [xsl] Extraction of data using key() and matches()

On Sat, Jun 5, 2010 at 23:42, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> On 05/06/2010 20:02, Jakob Fix wrote:
>>
>> Hello,
>>
>> I have a large number of XML data files (previously Excel files),
>> each of which contains a table made up of rows and data cells.
>>
>> I'm interested in finding out whether in the table's data cells there
>> is or is not a given country name. If so I want to record in another
>> file all country names that appear in the data file. The country name
>> may be the only content of the data cell (<col>United Kingdom</col>),
>> or it may be surrounded by other text (<col>Data has been provided for
>> United Kingdom only.</col>). It can also be that more than one country
>> name appears in a table cell. There won't be other elements in the
>> cell, just character data.
>>
>> My current approach is to have an exhaustive lookup file with *all*
>> country names that are potentially used. For each XML data file, I
>> loop over all country names and test whether the contents of the data
>> file match the current country name.
>>
>>
>
> You could create an index on all the "words" in the text using
>
> <xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>
>
> where a word is defined as a maximal sequence of "letter" characters.
>
> Then to see whether a given country is present you could start by testing
> whether the first word of the country name is present:
>
> key('words', tokenize($country, '\P{L}+')[1])
>
> and then apply a more sensitive test to the result of this first filter.
>
> Michael Kay
> Saxonica


Thanks Michael, I'll give this a try.

Jakob.
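
The two-stage lookup suggested above can be sketched as a complete XSLT 2.0
stylesheet. This is only an illustration under assumptions, not a tested
solution: the lookup file name (countries.xml), its countries/country layout,
and the found/country output elements are all hypothetical. The second, "more
sensitive" test is a plain contains(), chosen here for brevity.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: a lookup file shaped like
       <countries><country>United Kingdom</country>...</countries> -->
  <xsl:param name="lookup-uri" select="'countries.xml'"/>
  <xsl:variable name="countries" select="doc($lookup-uri)/countries/country"/>

  <!-- Index every col element under each "word" it contains, where a word
       is a maximal run of letter characters. -->
  <xsl:key name="words" match="col" use="tokenize(., '\P{L}+')"/>

  <xsl:template match="/">
    <!-- Remember the data document: inside the for-each the context node
         belongs to the lookup file, so key() needs an explicit third argument. -->
    <xsl:variable name="data" select="."/>
    <found>
      <xsl:for-each select="$countries">
        <xsl:variable name="name" select="normalize-space(.)"/>
        <!-- First word of the country name; empty tokens are skipped. -->
        <xsl:variable name="first"
                      select="tokenize($name, '\P{L}+')[. ne ''][1]"/>
        <!-- Stage 1: key lookup on the first word.
             Stage 2: check the full name on the few candidate cells. -->
        <xsl:if test="exists(key('words', $first, $data)
                             [contains(normalize-space(.), $name)])">
          <country><xsl:value-of select="$name"/></country>
        </xsl:if>
      </xsl:for-each>
    </found>
  </xsl:template>

</xsl:stylesheet>
```

Both stages are case-sensitive. The contains() test can still report a name
that is only a prefix of a longer one in the same cell (for example "Guinea"
in a cell that mentions only "Guinea-Bissau"), so a stricter matches()-based
test may be needed in the second stage.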
