RE: [xsl] Parsing address data from PAR and BREAK

Subject: RE: [xsl] Parsing address data from PAR and BREAK
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 26 Jan 2009 21:36:27 -0000
For this kind of problem an awful lot depends on how regular the input is -
how closely do the other examples you have to process match the example you
have shown us? For example, extracting the "zipcode" is going to be quite
difficult if the data includes addresses from a variety of different
countries with different conventional address formats.

For the data you've shown, it's something like this:

 <xsl:template match="par">
  <!-- The three lines are the text nodes of the first <run>,
       separated by the <break/> elements -->
  <address><xsl:value-of select="normalize-space(run[1]/text()[1])"/></address>
  <xsl:analyze-string select="run[1]/text()[2]"
      regex="^([^,]*),([^0-9]*)([0-9]*)$">
    <xsl:matching-substring>
      <city><xsl:value-of select="normalize-space(regex-group(1))"/></city>
      <state><xsl:value-of select="normalize-space(regex-group(2))"/></state>
      <zipcode><xsl:value-of select="normalize-space(regex-group(3))"/></zipcode>
    </xsl:matching-substring>
  </xsl:analyze-string>
  <country><xsl:value-of select="normalize-space(run[1]/text()[3])"/></country>
 </xsl:template>

But as I say, this isn't going to be very robust if your data varies much.
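If the real data is less uniform, one possible way to make this a little more tolerant is to split the first run into lines at each <break/> with xsl:for-each-group, so that a line still comes out whole when it consists of several text nodes (for instance because of inline formatting), and to keep lines that don't fit the "City, ST 12345" pattern instead of silently dropping them. This is only a sketch written against the single sample shown: the <locations>, <location> and <unparsed> elements are hypothetical names chosen for illustration, and it assumes the address is always in the first run of the par.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical wrapper: one <location> per par in the input -->
  <xsl:template match="/">
    <locations>
      <xsl:apply-templates select="//par"/>
    </locations>
  </xsl:template>

  <xsl:template match="par">
    <!-- Split the first run into lines at each <break/>. group-adjacent
         puts consecutive non-break nodes into one group; the groups
         whose key is false (the <break/> elements) are skipped. -->
    <xsl:variable name="lines" as="xs:string*">
      <xsl:for-each-group select="run[1]/node()[not(self::font)]"
                          group-adjacent="not(self::break)">
        <xsl:if test="current-grouping-key()">
          <xsl:sequence select="normalize-space(string-join(current-group(), ''))"/>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:variable>
    <location>
      <address><xsl:value-of select="$lines[1]"/></address>
      <!-- string() turns a missing line into "" rather than an empty sequence -->
      <xsl:analyze-string select="string($lines[2])"
          regex="^([^,]*),\s*([^0-9]*?)\s*([0-9]*)$">
        <xsl:matching-substring>
          <city><xsl:value-of select="normalize-space(regex-group(1))"/></city>
          <state><xsl:value-of select="regex-group(2)"/></state>
          <zipcode><xsl:value-of select="regex-group(3)"/></zipcode>
        </xsl:matching-substring>
        <!-- Keep lines that don't fit "City, ST 12345" instead of losing them -->
        <xsl:non-matching-substring>
          <unparsed><xsl:value-of select="."/></unparsed>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
      <country><xsl:value-of select="$lines[3]"/></country>
    </location>
  </xsl:template>
</xsl:stylesheet>
```

The reluctant ([^0-9]*?) together with the surrounding \s* keeps the spaces out of the state group, while a multi-word state such as "New York" is still captured whole, because the group only stops growing when the rest of the line is digits.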

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: Karl Forsyth [mailto:wd@xxxxxxxxx] 
> Sent: 26 January 2009 21:23
> To: XSL List
> Subject: [xsl] Parsing address data from PAR and BREAK
> 
> Greetings,
> 
> I'm relatively new to XSLT. I need to extract legacy data 
> from an XML representation of rich text, and am having 
> difficulty parsing around the <break> element. 
> Specifically, I'm trying to reliably parse address 
> information from this:
> 
> ...
> <tablecell borderwidth='0px'>
> <par def='23'><run><font size='9pt' name='Arial'
> truetype='false' familyid='10'/>
> 123 E. Main Street<break/>Anytown, ST 12355<break/>USA</run>
> <run><font size='9pt' style='bold' name='Arial'
> truetype='false' familyid='10' color='navy'/> </run>
> </par> </tablecell> ...
> 
> ...to this:
> 
> <address>123 E. Main Street</address>
> <city>Anytown</city>
> <state>ST</state>
> <zipcode>12355</zipcode>
> <country>USA</country>
> 
> I'm using the Altova XSLT 2.0 engine. I've been poking around 
> trying to find how this might be done, but am coming up 
> short. Any suggestions will be much appreciated.
> 
> Thanks,
> 
> Karl Forsyth
