Re: [xsl] XSLT1.0 and wildcards

Subject: Re: [xsl] XSLT1.0 and wildcards
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Tue, 03 Oct 2006 10:21:27 +0200
Pankaj Bishnoi wrote:
For example, the address line is like this: melkweg 51a.

This means I have to map it like this: street = melkweg, number = 51,
extension = a
Are there such wildcards, and/or is there a better way to do this?

Hi Pankaj,


Yes, there's a better way. As a matter of fact, this happens to be a specialized field of science. Depending on your needs, there are numerous ways to resolve this. Since you appear to live in Holland and the address exists in Amsterdam, consider the following address lines:

1) Melkweg 51 A
2) Plein 40-45 123-IV
3) 1ste J vd Heijdenstraat 12-hs

ad 1) this is a common extension
ad 2) street is "Plein 40-45", nr is "123", suffix (floor) is "IV"
ad 3) the suffix 'hs' stands for 'huis', which means ground floor in Holland. Note the number in the street name.


Perhaps you've thought of all this already. International addresses pose even more challenges: the French and the English place the street number at the start of the address line. I hope you won't have to deal with non-Western characters or Hebrew digits... Hopefully nobody entered the postal code or city name on the same line ;-)

(That's why postal companies offer products to normalize addresses to some well-known format. But beware: they offer about 95% matches, and the rest will still drop out.)

Now, for a solution with XSLT 1, it will be quite a challenge. I think you will have to pass the address line multiple times through the translate-filter that was proposed by Michael.

If you can resort to XSLT 2, or to a filter that runs before processing, a regular expression becomes an option. On the client side you may be able to use javascript with a regular expression; on the server side you may use java/.net/perl/php with a regular expression to filter your data. The regular expression may look like this (needs tweaking):

^(.*) ([0-9]+)([ -]?([a-zA-Z]+))?$

$1 contains streetname
$2 contains number
$4 contains suffix (use $3 if you want it to include space or hyphen)

The regex will work for the above three examples (spaces are important in the regex).
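
As a rough illustration of the "filter before processing" route, here is a minimal sketch in Python (chosen only for the example; any language with regular expressions would do). The function name split_address is made up for this sketch. The pattern allows one or more letters in the suffix ([a-zA-Z]+), so that suffixes such as "IV" and "hs" match. Treat it as a starting point, not a complete address parser.

```python
import re

# Street name, house number, optional suffix of one or more letters.
# Group 3 is the suffix including its leading space or hyphen,
# group 4 is the suffix alone.
ADDRESS_RE = re.compile(r"^(.*) ([0-9]+)([ -]?([a-zA-Z]+))?$")


def split_address(line):
    """Return (street, number, suffix), or None if the line does not match.

    split_address is a hypothetical helper for this sketch.
    The suffix is an empty string when the address has none.
    """
    match = ADDRESS_RE.match(line.strip())
    if match is None:
        return None
    street, number, _separator_and_suffix, suffix = match.groups()
    return street, number, suffix or ""


if __name__ == "__main__":
    samples = [
        "melkweg 51a",
        "Melkweg 51 A",
        "Plein 40-45 123-IV",
        "1ste J vd Heijdenstraat 12-hs",
    ]
    for sample in samples:
        print(sample, "->", split_address(sample))
```

Lines that do not match (for instance, an address without a house number) come back as None, so they can be set aside for manual handling before the XSLT step.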

Cheers,
-- Abel Braaksma
