RE: [xsl] Identifying place names in text...

Subject: RE: [xsl] Identifying place names in text...
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 21 Jul 2005 17:38:07 +0100
This isn't difficult; there's no need to contemplate doing it in Java. You can
tokenize the text using the tokenize() function in XSLT 2.0, or the
str:tokenize() function/template in EXSLT (www.exslt.org). Then look up each
token in your list of place names, using a key for efficiency.
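
A minimal sketch of the tokenize-and-key approach in XSLT 2.0, not a tested solution. It assumes the word list lives in a separate file called places.xml, shaped as <places><place>London</place>...</places>; that file name, its element names, the key name and the <books-with-places> wrapper are all made up for illustration. The input is assumed to be the <bookshelf> document from the question.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed word list: places.xml containing <places><place>London</place>...</places> -->
  <xsl:variable name="places" select="document('places.xml')"/>

  <!-- Index each place name by its lower-cased text so lookups ignore case -->
  <xsl:key name="place-by-name" match="place" use="lower-case(.)"/>

  <xsl:template match="/bookshelf">
    <books-with-places>
      <!-- Split each title on runs of non-letters, then keep every book
           with at least one token found in the key over the word list -->
      <xsl:copy-of select="book[
          some $t in tokenize(lower-case(title), '[^\p{L}]+')
          satisfies exists(key('place-by-name', $t, $places))]"/>
    </books-with-places>
  </xsl:template>

</xsl:stylesheet>
```

This needs an XSLT 2.0 processor, since it relies on tokenize(), lower-case() and the three-argument form of key(). Because it looks up single tokens, a multi-word name such as "New York" would not match without extra handling.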

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Karl Koch [mailto:TheRanger@xxxxxxx] 
> Sent: 21 July 2005 14:56
> To: Mulberry list
> Subject: [xsl] Identifying place names in text...
> 
> Hello group,
> 
> I would like to find a way of automatically identifying references to places
> in XML text. The thing is that I have a very large set of content. In this
> content there are sometimes references to particular places, which I want to
> know about.
> 
> This is my xml structure (made up for simplification):
> 
> <bookshelf>
>   <book>
>     <title>1000 years of London's history</title>
>     ...
>   </book>
>   <book>
>     <title>1984</title>
>     ...
>   </book>
> </bookshelf>
> 
> Can I use XSLT to search for place names in the titles of all the books? I
> would like to use a word list of geographical place names (which I already
> have). This would contain country and city names. The stylesheet would match
> occurrences of these words in the <title> XML element. The output here would
> be a list of all books which have references to locations in the title.
> In this example, the result would only be the first book, because it has
> "London" in the title.
> 
> Perhaps this is the point where XSLT is getting too complicated and I should
> consider Java as a solution. However, I am continuously impressed by the
> power of XSLT and therefore I ask here because I think there might even be a
> solution for that problem using XSLT.
> 
> A note on the side: The output of this stylesheet would be a helper and an
> additional control for a mainly handcrafted process. I could discover books
> which I have overlooked in the manual process.
> 
> Any help would be greatly appreciated.
> 
> Kind Regards,
> Karl
> 
