Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details in xsl
From: "Pantvaidya, Vishwajit" <vpantvai@xxxxxxxxxxxxx>
Date: Fri, 23 Jun 2006 10:35:24 -0700
Thanks Nathan - considering the problems with the CSV files, I was
thinking of writing a simple Java program to convert csv to xml...

>-----Original Message-----
>From: Nathan Young -X (natyoung - Artizen at Cisco)
>[mailto:natyoung@xxxxxxxxx]
>Sent: Friday, June 23, 2006 9:53 AM
>To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details
>in xsl
>
>Hi.
>
>The rules you describe for handling cells with commas in them and cells
>with quotes in them are widely used conventions for encoding data in
>csv. Unless you are able to prevent cells from ever containing commas
>or quotes you will not be able to make the csv "uniform" in a way that
>does not require these (or some other) irregularities.
>
>There is another way of parsing csv files that works faster than regular
>expressions, very generally by reading the file character by character
>into a buffer and applying a set of rules at each character to decide if
>you have reached the end of a cell, at which point you empty the buffer
>into a cell variable (or whatever you need to do with it) and continue.
>I think this is best not done in XSL though.
>
>If performance is indeed an issue, you are likely to be well served by
>parsing out the csv file into a very simple XML format using another
>language. Many existing programming languages have very robust and
>performant csv parsers for them already, so you'd have that problem
>mostly solved from the outset.
>
>------------>Nathan
>
>
>
>.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:.
>
>Nathan Young
>Cisco.com->Interface Development
>A: ncy1717
>E: natyoung@xxxxxxxxx
>
>> -----Original Message-----
>> From: Pantvaidya, Vishwajit [mailto:vpantvai@xxxxxxxxxxxxx]
>> Sent: Thursday, June 22, 2006 8:51 PM
>> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>> Subject: RE: [xsl] Converting CSV to XML without hardcoding
>> schema details in xsl
>>
>> Thanks a lot for the xsl, Michael.
>>
>> My CSV has some commas in some cells - in those cases the entire cell
>> value is itself enclosed in quotes. So a simple tokenize that splits at
>> comma boundaries would not work - so I replaced the tokenize for the
>> cells with a regex that took care of the quotes (is there any
>> alternative here other than using regex?). I had to specify the quotes
>> in the regex as &quot;
>> After this, it started taking 45 minutes to transform a 20 columns-35
>> rows CSV.
>>
>> Next problem I found was that for columns that contain commas in the
>> value, not all cells in that column are enclosed in quotes - only those
>> cells that actually have commas are enclosed in quotes. So I changed
>> the regex to account for 0/more quotes. Now it transformed in 45 secs -
>> surprise? But even now, I see that the 0/more quotes regex throws it
>> off and the csv gets incorrectly parsed, resulting in the wrong xml
>> content.
>>
>> So I made some changes and the current xsl has the regex as:
>> <xsl:analyze-string select="."
>> regex="(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),&quot;*(.*)&quot;*,(.*),&quot;*(.*)&quot;*,(.*),(.*),&quot;*($.*)&quot;*,(.*)">
>>
>> (now it is taking even more time - 1 hour+ and still not done. Let's
>> see if at least the xml comes out correctly.)
>>
>> Any suggestions to mitigate this regex complexity due to the
>> non-uniformity of the input CSV?
>>
>> Or am I better off asking the provider of the CSV to keep the CSV
>> uniform so that either all cells in the column are with/without quotes?
>>
>>
>> Thanks,
>>
>> Vish.
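On the "any alternative other than regex" question: the character-by-character approach Nathan describes earlier in the thread can be sketched roughly as below. This is only an illustration in Python (not XSLT), and the function name split_csv_line is made up for the example. It assumes that a quote inside a quoted cell is written as two quotes (a common CSV convention, not something confirmed in this thread) and that a record never spans more than one line.

```python
def split_csv_line(line):
    """Split one CSV record into cells, honouring double-quoted cells.

    Assumptions (not taken from the thread): a literal quote inside a
    quoted cell is written as "", and a record does not span lines.
    """
    cells = []          # finished cells
    buf = []            # characters of the cell currently being read
    in_quotes = False   # True while inside a "..." cell
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    buf.append('"')   # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                buf.append(ch)         # commas are ordinary text here
        elif ch == '"':
            in_quotes = True           # opening quote
        elif ch == ',':
            cells.append(''.join(buf))  # end of cell: empty the buffer
            buf = []
        else:
            buf.append(ch)
        i += 1
    cells.append(''.join(buf))          # last cell on the line
    return cells


if __name__ == "__main__":
    print(split_csv_line('Value11,"Smith, John",Value13'))
```

Each character is looked at once, so the cost grows linearly with the length of the line, and it does not matter whether only some cells in a column are quoted.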
>>
>> >-----Original Message-----
>> >From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
>> >Sent: Thursday, June 22, 2006 12:43 AM
>> >To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>> >Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details
>> >in xsl
>> >
>> >> Can anybody suggest how to convert CSV data in the format
>> >>
>> >> Field1,Field2
>> >> Value11,Value12
>> >>
>> >> to xml like
>> >>
>> >> <Field1>Value11</Field1>
>> >> <Field2>Value12</Field2>
>> >>
>> >> without hardcoding the fieldnames in the xsl?
>> >
>> ><xsl:variable name="lines" as="xs:string*"
>> >   select="tokenize(unparsed-text($input-file), '\r?\n')"/>
>> ><xsl:variable name="field-names" as="xs:string*"
>> >   select="tokenize($lines[1], ',')"/>
>> ><xsl:for-each select="subsequence($lines, 2)">
>> ><row>
>> >  <xsl:variable name="cells" select="tokenize(., ',')"/>
>> >  <xsl:for-each select="$cells">
>> >    <xsl:variable name="p" as="xs:integer" select="position()"/>
>> >    <xsl:element name="{$field-names[$p]}">
>> >      <xsl:value-of select="."/>
>> >    </xsl:element>
>> >  </xsl:for-each>
>> ></row>
>> ></xsl:for-each>
>> >
>> >Michael Kay
>> >http://www.saxonica.com/
>> >
>> >
>> >>
>> >> I was thinking of something like
>> >>
>> >> <xsl:for-each select="tokenize(., ',')"> <<xsl:value-of
>> >> select="item-at($elementNames,index-of(?parent of current
>> >> node?,.))"/>> <xsl:value-of select="."/>
>> >> </<xsl:value-of
>> >> select="item-at($elementNames,index-of(?parent of current
>> >> node?,.))"/>> </xsl:for-each>
>> >>
>> >> where elementNames is a tokenized list of the fieldnames -
>> >> but I am unable to get it to work.
>> >>
>> >>
>> >>
>> >> >-----Original Message-----
>> >> >From: Pantvaidya, Vishwajit
>> >> >Sent: Wednesday, June 21, 2006 12:17 AM
>> >> >To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
>> >> >Subject: [xsl] Converting CSV to XML without hardcoding schema details
>> >> >in xsl
>> >> >
>> >> >Hello,
>> >> >
>> >> >I am trying to convert a CSV datafile into XML format.
>> >> >The headers for the CSV data are in a file header.csv e.g.
>> >> >Field1,Field2
>> >> >The data is in a file Data.csv e.g.
>> >> >Value11,Value12
>> >> >Value21,Value22
>> >> >
>> >> >I need to convert the CSV data into xml output by creating xml elements
>> >> >using the names in the csv header and taking the corresponding values
>> >> >from the data file, so that I get an xml as follows:
>> >> >
>> >> ><doc>
>> >> ><line>
>> >> ><Field1>Value11</Field1>
>> >> ><Field2>Value12</Field2>
>> >> ></line>
>> >> ><line>
>> >> ><Field1>Value21</Field1>
>> >> ><Field2>Value22</Field2>
>> >> ></line>
>> >> ></doc>
>> >> >
>> >> >I was trying to see if I can do this without hardcoding the header
>> >> >names in the xsl. I reached upto the point where my xsl looks as below:
>> >> >
>> >> ><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>> >> >xmlns:op="http://www.w3.org/2001/12/xquery-operators"
>> >> >    xmlns:xf="http://www.w3.org/2001/12/xquery-functions"
>> >> >version="2.0">
>> >> >
>> >> >    <xsl:output name="xmlFormat" method="xml" indent="yes"
>> >> >omit-xml-declaration="yes"/>
>> >> >
>> >> >    <xsl:variable name="source1" select="'data.csv'"/>
>> >> >    <xsl:variable name="elementNamesList" select="'Header.csv'"/>
>> >> >    <xsl:variable name="encoding" select="'iso-8859-1'"/>
>> >> >
>> >> >    <xsl:variable name="elementNames"
>> >> >select="tokenize(unparsed-text($elementNamesList,$encoding),',')"/>
>> >> >    <xsl:variable name="src">
>> >> >        <doc>
>> >> >            <xsl:for-each
>> >> >select="tokenize(unparsed-text($source1,$encoding), '\r?\n')">
>> >> >                <line>
>> >> >                    <xsl:for-each select="tokenize(., ',')">
>> >> >                        <<xsl:value-of
>> >> >select="op:item-at($elementNames,index-of(?parent of current
>> >> >node?,.))"/>>
>> >> >                            <xsl:value-of select="."/>
>> >> >                        </<xsl:value-of
>> >> >select="item-at($elementNames,3)"/>>
>> >> >                    </xsl:for-each>
>> >> >                </line>
>> >> >            </xsl:for-each>
>> >> >        </doc>
>> >> >    </xsl:variable>
>> >> >
>> >> >    <xsl:template match="/">
>> >> >        <xsl:result-document format="xmlFormat" href="src1.xml">
>> >> >            <xsl:copy-of select="$src"/>
>> >> >        </xsl:result-document>
>> >> >    </xsl:template>
>> >> >
>> >> ></xsl:stylesheet>
>> >> >
>> >> >In the yet-incomplete statement <xsl:value-of
>> >> >select="op:item-at($elementNames,index-of(?parent of current
>> >> >node?,.))"/>, I am trying to generate an xml element with the Nth field
>> >> >name from the headers name list for the Nth field value. Couple of
>> >> >issues/questions here:
>> >> >
>> >> >- I am getting the error "Cannot find a matching 2-argument function
>> >> >named {http://www.w3.org/2001/12/xquery-operators}item-at()" when I try
>> >> >to validate the xsl. What could be the reason?
>> >> >
>> >> >- How can I get the ?parent of current node? Needed to compute the
>> >> >index of the current data in the data record?
>> >> >
>> >> >- Is there any other better way to do it? Any way that I can do the
>> >> >same using xsl:element?
>> >> >
>> >> >In general, is this the only/best way or is there any other better way
>> >> >to achieve the same goal?
>> >> >
>> >> >
>> >> >Thanks and Regards,
>> >> >
>> >> >Vish.
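For the route of doing the conversion outside XSL, here is a rough sketch using Python's standard csv module and xml.etree.ElementTree. It builds the <doc>/<line>/<FieldN> structure from the original question, taking element names from the header text instead of hardcoding them. The function name csv_to_xml is invented for this example, and the sketch assumes every header value is already a legal XML element name and that no row has more cells than the header.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(header_text, data_text):
    """Build <doc><line><FieldN>value</FieldN>...</line>...</doc>.

    Assumption: each header value is a legal XML element name.
    Quoted cells and embedded commas are handled by the csv module.
    """
    # The first record of the header text supplies the element names.
    field_names = next(csv.reader(io.StringIO(header_text)))
    doc = ET.Element("doc")
    for row in csv.reader(io.StringIO(data_text)):
        if not row:
            continue  # skip blank lines, e.g. a trailing newline
        line = ET.SubElement(doc, "line")
        # Pair the Nth header name with the Nth cell of the row.
        for name, value in zip(field_names, row):
            ET.SubElement(line, name.strip()).text = value
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    header = "Field1,Field2\n"
    data = 'Value11,Value12\nValue21,"Value22, with comma"\n'
    print(csv_to_xml(header, data))
```

In practice the two strings would be read from header.csv and Data.csv with the right encoding; they are passed as text here only to keep the example self-contained. The resulting XML could then be fed to a much simpler stylesheet if further transformation is needed.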