Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details in xsl
From: "Pantvaidya, Vishwajit" <vpantvai@xxxxxxxxxxxxx>
Date: Thu, 22 Jun 2006 20:50:34 -0700

Thanks a lot for the xsl, Michael.

My CSV has commas inside some cells; in those cases the entire cell value is enclosed in quotes. A simple tokenize that splits at comma boundaries therefore does not work, so I replaced the tokenize for the cells with a regex that takes care of the quotes (is there any alternative here other than using regex?). I had to specify the quotes in the regex as &quot;. After this, it started taking 45 minutes to transform a CSV of 20 columns and 35 rows.

The next problem I found was that, in columns whose values can contain commas, not every cell is enclosed in quotes: only the cells that actually contain commas are quoted. So I changed the regex to allow zero or more quotes. Now it transformed in 45 secs - surprise?

But even now, I see that the zero-or-more-quotes regex throws it off and the csv gets incorrectly parsed, resulting in the wrong xml content. So I made some changes and the current xsl has the regex as:

<xsl:analyze-string select="." regex="(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),&quot;*(.*)&quot;*,(.*),&quot;*(.*)&quot;*,(.*),(.*),&quot;*($.*)&quot;*,(.*)">

(Now it is taking even more time: 1 hour+ and still not done. Let's see if at least the xml comes out correctly.)

Any suggestions to mitigate this regex complexity, which is caused by the non-uniformity of the input CSV? Or am I better off asking the provider of the CSV to keep it uniform, so that all cells in a column are either quoted or unquoted?

Thanks,

Vish.

>-----Original Message-----
>From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
>Sent: Thursday, June 22, 2006 12:43 AM
>To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details
>in xsl
>
>> Can anybody suggest how to convert CSV data in the format
>>
>> Field1,Field2
>> Value11,Value12
>>
>> to xml like
>>
>> <Field1>Value11</Field1>
>> <Field2>Value12</Field2>
>>
>> without hardcoding the fieldnames in the xsl?
>
><xsl:variable name="lines" as="xs:string*"
>  select="tokenize(unparsed-text($input-file), '\r?\n')"/>
><xsl:variable name="field-names" as="xs:string*"
>  select="tokenize($lines[1], ',')"/>
><xsl:for-each select="subsequence($lines, 2)">
><row>
>  <xsl:variable name="cells" select="tokenize(., ',')"/>
>  <xsl:for-each select="$cells">
>    <xsl:variable name="p" as="xs:integer" select="position()"/>
>    <xsl:element name="{$field-names[$p]}">
>      <xsl:value-of select="."/>
>    </xsl:element>
>  </xsl:for-each>
></row>
></xsl:for-each>
>
>Michael Kay
>http://www.saxonica.com/
>
>
>>
>> I was thinking of something like
>>
>> <xsl:for-each select="tokenize(., ',')">
>>   <<xsl:value-of select="item-at($elementNames,index-of(?parent of current node?,.))"/>>
>>   <xsl:value-of select="."/>
>>   </<xsl:value-of select="item-at($elementNames,index-of(?parent of current node?,.))"/>>
>> </xsl:for-each>
>>
>> where elementNames is a tokenized list of the fieldnames -
>> but I am unable to get it to work.
>>
>>
>> >-----Original Message-----
>> >From: Pantvaidya, Vishwajit
>> >Sent: Wednesday, June 21, 2006 12:17 AM
>> >To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
>> >Subject: [xsl] Converting CSV to XML without hardcoding schema details
>> >in xsl
>> >
>> >Hello,
>> >
>> >I am trying to convert a CSV datafile into XML format.
>> >The headers for the CSV data are in a file header.csv e.g.
>> >Field1,Field2
>> >The data is in a file Data.csv e.g.
>> >Value11,Value12
>> >Value21,Value22
>> >
>> >I need to convert the CSV data into xml output by creating xml elements
>> >using the names in the csv header and taking the corresponding values
>> >from the data file, so that I get an xml as follows:
>> >
>> ><doc>
>> ><line>
>> ><Field1>Value11</Field1>
>> ><Field2>Value12</Field2>
>> ></line>
>> ><line>
>> ><Field1>Value21</Field1>
>> ><Field2>Value22</Field2>
>> ></line>
>> ></doc>
>> >
>> >I was trying to see if I can do this without hardcoding the header
>> >names in the xsl.
>> >I reached up to the point where my xsl looks as below:
>> >
>> ><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>> >xmlns:op="http://www.w3.org/2001/12/xquery-operators"
>> >  xmlns:xf="http://www.w3.org/2001/12/xquery-functions"
>> >version="2.0">
>> >
>> >  <xsl:output name="xmlFormat" method="xml" indent="yes"
>> >omit-xml-declaration="yes"/>
>> >
>> >  <xsl:variable name="source1" select="'data.csv'"/>
>> >  <xsl:variable name="elementNamesList" select="'Header.csv'"/>
>> >  <xsl:variable name="encoding" select="'iso-8859-1'"/>
>> >
>> >  <xsl:variable name="elementNames"
>> >select="tokenize(unparsed-text($elementNamesList,$encoding),',')"/>
>> >  <xsl:variable name="src">
>> >    <doc>
>> >      <xsl:for-each
>> >select="tokenize(unparsed-text($source1,$encoding), '\r?\n')">
>> >        <line>
>> >          <xsl:for-each select="tokenize(., ',')">
>> >            <<xsl:value-of
>> >select="op:item-at($elementNames,index-of(?parent of current
>> >node?,.))"/>>
>> >            <xsl:value-of select="."/>
>> >            </<xsl:value-of
>> >select="item-at($elementNames,3)"/>>
>> >          </xsl:for-each>
>> >        </line>
>> >      </xsl:for-each>
>> >    </doc>
>> >  </xsl:variable>
>> >
>> >  <xsl:template match="/">
>> >    <xsl:result-document format = "xmlFormat" href = "src1.xml">
>> >      <xsl:copy-of select="$src"/>
>> >    </xsl:result-document>
>> >  </xsl:template>
>> >
>> ></xsl:stylesheet>
>> >
>> >In the yet-incomplete statement <xsl:value-of
>> >select="op:item-at($elementNames,index-of(?parent of current
>> >node?,.))"/>, I am trying to generate an xml element with the Nth field
>> >name from the headers name list for the Nth field value. Couple of
>> >issues/questions here:
>> >
>> >- I am getting the error "Cannot find a matching 2-argument function
>> >named {http://www.w3.org/2001/12/xquery-operators}item-at()" when I try
>> >to validate the xsl. What could be the reason?
>> >
>> >- How can I get the ?parent of current node? needed to compute the
>> >index of the current data in the data record?
>> >
>> >- Is there any other better way to do it? Any way that I can do the
>> >same using xsl:element?
>> >
>> >In general, is this the only/best way or is there any other better way
>> >to achieve the same goal?
>> >
>> >
>> >Thanks and Regards,
>> >
>> >Vish.
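
On the quoted-cell question raised at the top of this message, one option is to match a single cell at a time instead of writing one regex that spans the whole row. In the 20-group regex every greedy (.*) can also swallow commas, so the engine may have to backtrack through a very large number of combinations, which is a likely cause of the long run times. The following is only a sketch under stated assumptions, not code from this thread: it assumes an XSLT 2.0 processor, a hypothetical input-file parameter, a hypothetical f:cells helper in a made-up namespace (urn:example:csv), a header row on the first line whose names are valid XML element names, no more cells per row than header names, and no doubled quotes ("") or line breaks inside quoted cells. A comma is appended to each line so that every cell ends with a comma; the regex then never matches a zero-length string, which XSLT 2.0 does not allow in xsl:analyze-string.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:csv"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical parameter: location of the CSV file (header on line 1) -->
  <xsl:param name="input-file" as="xs:string" select="'data.csv'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: split one CSV line into its cells.
       Each match is either a quoted cell (group 2, quotes stripped)
       or an unquoted cell (group 3), followed by a comma. -->
  <xsl:function name="f:cells" as="xs:string*">
    <xsl:param name="line" as="xs:string"/>
    <xsl:analyze-string select="concat($line, ',')"
        regex='("([^"]*)"|([^,"]*)),'>
      <xsl:matching-substring>
        <!-- Only one of the two groups matched; the other is empty -->
        <xsl:sequence select="concat(regex-group(2), regex-group(3))"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <!-- Entry point: a named template, since there is no XML source document -->
  <xsl:template name="main">
    <!-- Split the file into lines and drop blank ones -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize(unparsed-text($input-file), '\r?\n')[normalize-space()]"/>
    <xsl:variable name="field-names" as="xs:string*"
        select="f:cells($lines[1])"/>
    <doc>
      <xsl:for-each select="subsequence($lines, 2)">
        <line>
          <xsl:for-each select="f:cells(.)">
            <xsl:variable name="p" as="xs:integer" select="position()"/>
            <!-- Name each element after the header in the same position -->
            <xsl:element name="{$field-names[$p]}">
              <xsl:value-of select="."/>
            </xsl:element>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet would be started from the named template main; how an initial template is selected depends on the processor. Because each character class excludes the comma or the quote, a cell can never extend past its own delimiter, so it does not matter whether only some cells in a column are quoted.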