Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details in xsl
From: "Pantvaidya, Vishwajit" <vpantvai@xxxxxxxxxxxxx>
Date: Tue, 27 Jun 2006 15:11:10 -0700

Hi Michael,

This is what I got with the following regex:

([^"]*,|"[^"]*",)+(.*)

The ending (.*) was needed to match the last field which ends with neither a comma nor quote.

Input CSV (3 lines):

ID,ParentID,Group,User,Title,Description,GroupBelong,EffectiveDate,Effective Month,EffectiveDay,EffectiveYear,Months,EndDate,Name,AssumedName,Address,Type,Status,Amount,AmountAggregate
1,,,,A BC - A B. Cloud,Individual,VP,2/13/2006,February,13th,2006,36,2/12/2009,"A B C, Inc.",D E,"38th Street, MyCity, MyState 12345",TypeA,Active,"$442,000.00 ",$1.62
2,,,,ABC- Judge ABC,Internal,VP,3/1/2006,March,1st,2006,36,2/28/2009,"Charity Services (""CS"")",MyCity,"ABC Blvd., MyCity, MyState 12345",TypeB,Active,"$1,442,000.00 ",$1.35

Output XML:

<doc xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <row>
    <ID>"$442,000.00 ",</ID>
    <ParentID>$1.62 </ParentID>
    <Group/>
    <User/>
    <Title/>
    <Description/>
    <GroupBelong/>
    <EffectiveDate/>
    <EffectiveMonth/>
    <EffectiveDay/>
    <EffectiveYear/>
    <Months/>
    <EndDate/>
    <Name/>
    <AssumedName/>
    <Address/>
    <Type/>
    <Status/>
    <Amount/>
    <AmountAggregate/>
  </row>
  <row>
    <ID>2,,,,ABC- Judge ABC,Internal,VP,3/1/2006,March,1st,2006,36,2/28/2009,</ID>
    <ParentID>"Charity Services (""CS"")",MyCity,"ABC Blvd., MyCity, MyState 12345",TypeB,Active,"$1,442,000.00 ",$1.35 </ParentID>
    <Group/>
    <User/>
    <Title/>
    <Description/>
    <GroupBelong/>
    <EffectiveDate/>
    <EffectiveMonth/>
    <EffectiveDay/>
    <EffectiveYear/>
    <Months/>
    <EndDate/>
    <Name/>
    <AssumedName/>
    <Address/>
    <Type/>
    <Status/>
    <Amount/>
    <AmountAggregate/>
  </row>
  <row/>
</doc>

Thanks,

Vish.

>-----Original Message-----
>From: Pantvaidya, Vishwajit
>Sent: Tuesday, June 27, 2006 2:48 PM
>To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details
>in xsl
>
>>From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
>>Sent: Saturday, June 24, 2006 12:41 AM
>>> >
>>> >There's a lot of potential backtracking here: it might be better to
>>> >replace each "(.*)," with "[^,]*" or with "(.*?),".
>>>
>>> [Pantvaidya, Vishwajit] Does "[^,]*" work the same as "(.*),"
>>> - I understand that ^ is start of line metachar. How does the
>>> former match the alphabet chars?
>>
>>No, within square brackets, ^ means "not". So [^,]* matches a sequence of
>>any characters except comma.
>>
>>The problem with your expression is that (.*) matches as many characters as
>>it can. Then it sees ",", so it backtracks to find the last comma. Then it
>>sees the next (.*), and has to backtrack again; and so on.
>>>
>>> >
>>> >My own instinct would be to use something like:
>>> >
>>> >([^"]*,|"[^"]*",)*
>>> >
>>>
>>> [Pantvaidya, Vishwajit] Oxygen would not accept this regex as
>>> "it matches a zero-length string".
>>
>>Perhaps then you want to change the final "*" to a "+".
>>
>[Pantvaidya, Vishwajit] That is the first thing I tried when the * did not
>work - but even then it does not seem to be working.
>
>>> Anyway, how does this regex work - it does not seem to have
>>> anything that matches the alphabet chars.
>>
>>See above: [^"] matches everything except quotes.
>>
>>> And does the ,|" match comma or double quotes - because
>>> actually some field will have both.
>>
>>The first alternative, [^"]*, matches any field that ends with a comma, and
>>doesn't contain a quotation mark. The second alternative, "[^"]*", matches
>>any field that begins and ends with quotes (followed by a comma), and might
>>contain a comma between the quotes.
>>
>>It's very hard to find out what the exact rules for CSV files used by a
>>particular product are: for example, how it represents a field that contains
>>quotation marks as well as commas. (That's one of the great advantages of
>>XML: you can find a specification!) If you know the exact rules for your
>>particular flavour of CSV, you can adapt the regex to match (well, you can
>>if you study a bit more about regular expressions).
>>>
>>>
>>> Maybe this conversion is easier done with some Java code.
>>>
>>I'm sure it can be done using regular expressions but it looks as if you
>>need to do some learning in this area.
>>
>[Pantvaidya, Vishwajit] Thanks a lot for all the clarifications and help.
>Actually I did look at the regex documentation in the XSLT2 spec, but not
>very exhaustively - the info on back-references I found there made me feel
>that could be potentially useful here e.g. to tell the regex that if a
>starting quote is found, look for an ending one. But the more I look into
>it, the more it seems like I may not be able to use it.
>
>Thanks and regards,
>
>Vish.
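
For illustration only, here is a minimal Python sketch (not XSLT, and not code from this thread) of the two points discussed above. First, it shows why the pattern ([^"]*,|"[^"]*",)+(.*) fills only the first two elements of each row: a repeated group retains just its last repetition, so there are two captures per line no matter how many fields the line has. Second, it walks the line one field at a time. It assumes that a doubled quote inside a quoted field stands for a literal quote, which is a guess about this CSV flavour based on the "Charity Services (""CS"")" value in the sample. The names WHOLE_LINE, FIELD, split_csv_line and row_to_xml are hypothetical, and the field pattern would probably need adapting to the XSLT 2.0 regex dialect before it could be used with xsl:analyze-string.

```python
import re
import xml.etree.ElementTree as ET

# The pattern from the message. The repeated group keeps only its LAST match.
WHOLE_LINE = re.compile(r'([^"]*,|"[^"]*",)+(.*)')

# One field: a quoted field (doubled quotes allowed inside) or a bare field.
# Assumption: "" inside a quoted field stands for one literal quote character.
FIELD = re.compile(r'"((?:[^"]|"")*)"|([^,"]*)')


def split_csv_line(line):
    """Split one CSV line into fields, using one regex match per field."""
    fields = []
    pos = 0
    while True:
        # Always matches, because the bare alternative can match the empty string.
        m = FIELD.match(line, pos)
        if m.group(1) is not None:
            fields.append(m.group(1).replace('""', '"'))  # quoted field
        else:
            fields.append(m.group(2))  # bare (possibly empty) field
        pos = m.end()
        if pos == len(line):
            return fields
        if line[pos] != ",":
            raise ValueError(f"unexpected character at offset {pos}")
        pos += 1  # step over the separating comma


def row_to_xml(header, fields):
    """Build a <row> element, taking element names from the header line."""
    row = ET.Element("row")
    for name, value in zip(header, fields):
        # Assumption: removing whitespace is enough to make the name legal,
        # e.g. "Effective Month" becomes EffectiveMonth.
        child = ET.SubElement(row, "".join(name.split()))
        child.text = value
    return row


if __name__ == "__main__":
    header = split_csv_line("ID,Effective Month,Name,Amount,AmountAggregate")
    line = '2,March,"Charity Services (""CS"")","$1,442,000.00 ",$1.35'

    # Only two captures come back, however many fields the line contains.
    print(WHOLE_LINE.match(line).groups())

    # One element per header column.
    print(ET.tostring(row_to_xml(header, split_csv_line(line)), encoding="unicode"))
```

Matching one field per step avoids relying on what a repeated group captures, and it lets a malformed line (for example an unterminated quote) be reported instead of silently landing in the wrong element.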