Subject: RE: [xsl] Tokenizing and transforming a CSV file
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 25 Feb 2009 16:53:28 -0000
I would use xsl:analyze-string rather than tokenize(), with a regex such as

(,"[^"]*")|(,[^,]*)

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Mukul Gandhi [mailto:gandhi.mukul@xxxxxxxxx]
> Sent: 25 February 2009 16:44
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Tokenizing and transforming a CSV file
>
> Hi all,
> I have a CSV file (named, test.csv) as following (as an
> example, two lines/records are shown below):
>
> hi,"this is a long string, please tokenize me",hello,world
> hello,please tokenize me,hi there
>
> I want this to be transformed to following XML:
>
> <result>
>   <record>
>     <field>hi</field>
>     <field>this is a long string, please tokenize me</field>
>     <field>hello</field>
>     <field>world</field>
>   </record>
>   <record>
>     <field>hello</field>
>     <field>please tokenize me</field>
>     <field>hi there</field>
>   </record>
> </result>
>
> i.e, each line/record should be tokenized by a comma, with a
> restriction that a comma inside a double quoted string should
> not be considered as a delimiter:
>
> Below is my attempt upto now.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="2.0">
>
>   <xsl:output method="xml" indent="yes" />
>
>   <xsl:variable name="filedata" select="unparsed-text('test.csv')" />
>
>   <xsl:template match="/">
>     <result>
>       <xsl:for-each select="tokenize($filedata, '\r?\n')">
>         <record>
>           <xsl:for-each select="tokenize(., ',')">
>             <field>
>               <xsl:value-of select="." />
>             </field>
>           </xsl:for-each>
>         </record>
>       </xsl:for-each>
>     </result>
>   </xsl:template>
>
> </xsl:stylesheet>
>
> The above stylesheet produces following output:
>
> <result>
>   <record>
>     <field>hi</field>
>     <field>"this is a long string</field>
>     <field> please tokenize me"</field>
>     <field>hello</field>
>     <field>world</field>
>   </record>
>   <record>
>     <field>hello</field>
>     <field>please tokenize me</field>
>     <field>hi there</field>
>   </record>
> </result>
>
> As per my requirement, following output fragment
>
> <field>"this is a long string</field>
> <field> please tokenize me"</field>
>
> is wrong.
>
> This should actually appear as:
>
> <field>this is a long string, please tokenize me</field>
>
> I would appreciate any help regarding this problem.
>
> I am using XSLT 2.0 with Saxon 9.x.
>
>
> --
> Regards,
> Mukul Gandhi
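
For illustration, here is a minimal sketch of how the xsl:analyze-string suggestion at the top of this message might be applied to the quoted stylesheet. It is not code from the original reply. Two details are assumptions added here: a comma is prepended to each line so that every field, including the first, starts with the comma the regex expects, and blank lines (for example after a trailing newline) are skipped. Doubled quotes ("") inside a quoted field are not handled.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:variable name="filedata" select="unparsed-text('test.csv')"/>

  <xsl:template match="/">
    <result>
      <!-- Assumption: the predicate drops empty lines, which the original attempt did not do -->
      <xsl:for-each select="tokenize($filedata, '\r?\n')[normalize-space()]">
        <record>
          <!-- Assumption: prepend ',' so the first field also matches the regex.
               The attribute is single-quoted because the regex contains double quotes. -->
          <xsl:analyze-string select="concat(',', .)"
                              regex='(,"[^"]*")|(,[^,]*)'>
            <xsl:matching-substring>
              <field>
                <xsl:choose>
                  <!-- Group 1: quoted field; strip the leading ," and the trailing " -->
                  <xsl:when test="regex-group(1)">
                    <xsl:value-of select="substring(regex-group(1), 3,
                                          string-length(regex-group(1)) - 3)"/>
                  </xsl:when>
                  <!-- Group 2: unquoted field; strip the leading comma -->
                  <xsl:otherwise>
                    <xsl:value-of select="substring(regex-group(2), 2)"/>
                  </xsl:otherwise>
                </xsl:choose>
              </field>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </record>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Listing the quoted alternative first means a field that opens with a double quote is consumed up to its closing quote, so the comma inside "this is a long string, please tokenize me" is not treated as a delimiter. As with the quoted attempt, the template matches "/", so the transformation still needs some source document to run against.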