Re: [xsl] csv to xml converter bug

Subject: Re: [xsl] csv to xml converter bug
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Wed, 11 Jul 2007 13:10:44 +0100
On 7/10/07, Andrew Welch <andrew.j.welch@xxxxxxxxx> wrote:
> On 7/10/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > Haven't worked out the detail, but it seems to me that if you add a trailing
> > comma at the end of the string, you can then do
> >
> > <xsl:analyze-string select="concat($in, ',')" regex='("[^"]*"|[^,]*),'>
> >   <xsl:matching-substring>
> >     <token><xsl:value-of select="regex-group(1)"/></token>
> >   </xsl:matching-substring>
> > </xsl:analyze-string>
>
> Hmm, seems to work.
>
> > Doesn't strip the quotes off, but that part's easy.
>
> It is, especially as Abel wrote it for me :)
>
> I'll try it out and then write it up, thanks!

I had to modify it to cope with escaped (doubled) quotes inside a quoted value, such as "foo, ""bar""". This is what I came up with:

<xsl:function name="fn:getTokens" as="xs:string+">
 <xsl:param name="str" as="xs:string"/>
 <xsl:analyze-string select="concat($str, ',')" regex='(("[^"]*")+|[^,]*),'>
   <xsl:matching-substring>
     <xsl:sequence select='replace(regex-group(1), "^""|""$|("")""", "$1")'/>
   </xsl:matching-substring>
 </xsl:analyze-string>
</xsl:function>

I think it's a neat use of regex-group to capture both sides of the
pipe (quoted and unquoted values) but not the trailing comma.  Any
comments welcome.
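
As a rough sketch of how the function might be exercised on its own, here is a small stylesheet that tokenizes one hard-coded line. The namespace URI bound to fn, the sample line and the template name "main" are made up for illustration and are not taken from the full transform:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="urn:example:csv-functions"
    exclude-result-prefixes="xs fn">

  <!-- Assumption: fn is bound to a placeholder URI chosen for this sketch -->

  <!-- Hypothetical sample line: a quoted value with doubled quotes,
       a plain value, an empty value and another plain value -->
  <xsl:variable name="line" as="xs:string">"foo, ""bar""",baz,,qux</xsl:variable>

  <!-- The function from above, unchanged -->
  <xsl:function name="fn:getTokens" as="xs:string+">
    <xsl:param name="str" as="xs:string"/>
    <!-- The appended comma lets every value, including the last, end in a comma -->
    <xsl:analyze-string select="concat($str, ',')" regex='(("[^"]*")+|[^,]*),'>
      <xsl:matching-substring>
        <!-- Drop the outer quotes and turn each doubled quote into a single one -->
        <xsl:sequence select='replace(regex-group(1), "^""|""$|("")""", "$1")'/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <!-- Named entry point, so no source document is needed -->
  <xsl:template name="main">
    <row>
      <xsl:for-each select="fn:getTokens($line)">
        <token><xsl:value-of select="."/></token>
      </xsl:for-each>
    </row>
  </xsl:template>

</xsl:stylesheet>
```

Run in an XSLT 2.0 processor with main as the initial template, I'd expect this to give four token elements: foo, "bar" (quotes unescaped, comma kept), then baz, then an empty one, then qux.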

I've posted the complete transform here:
http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html

cheers
andrew
--
http://andrewjwelch.com
