Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details in xsl
From: "Pantvaidya, Vishwajit" <vpantvai@xxxxxxxxxxxxx>
Date: Tue, 27 Jun 2006 15:11:10 -0700
Hi Michael,
This is what I got with the following regex:
([^"]*,|"[^"]*",)+(.*)
The trailing (.*) was needed to match the last field, which ends with neither a
comma nor a quote.
Input CSV (3 records):
ID,ParentID,Group,User,Title,Description,GroupBelong,EffectiveDate,EffectiveMonth,EffectiveDay,EffectiveYear,Months,EndDate,Name,AssumedName,Address,Type,Status,Amount,AmountAggregate
1,,,,A BC - A B.
Cloud,Individual,VP,2/13/2006,February,13th,2006,36,2/12/2009,"A B C,
Inc.",D E,"38th Street, MyCity, MyState 12345",TypeA,Active,"$442,000.00
",$1.62
2,,,,ABC- Judge
ABC,Internal,VP,3/1/2006,March,1st,2006,36,2/28/2009,"Charity Services
(""CS"")",MyCity,"ABC Blvd., MyCity, MyState
12345",TypeB,Active,"$1,442,000.00 ",$1.35
Output XML:
<doc xmlns:xs="http://www.w3.org/2001/XMLSchema">
<row>
<ID>"$442,000.00 ",</ID>
<ParentID>$1.62 </ParentID>
<Group/>
<User/>
<Title/>
<Description/>
<GroupBelong/>
<EffectiveDate/>
<EffectiveMonth/>
<EffectiveDay/>
<EffectiveYear/>
<Months/>
<EndDate/>
<Name/>
<AssumedName/>
<Address/>
<Type/>
<Status/>
<Amount/>
<AmountAggregate/>
</row>
<row>
<ID>2,,,,ABC- Judge
ABC,Internal,VP,3/1/2006,March,1st,2006,36,2/28/2009,</ID>
<ParentID>"Charity Services (""CS"")",MyCity,"ABC Blvd., MyCity,
MyState 12345",TypeB,Active,"$1,442,000.00 ",$1.35 </ParentID>
<Group/>
<User/>
<Title/>
<Description/>
<GroupBelong/>
<EffectiveDate/>
<EffectiveMonth/>
<EffectiveDay/>
<EffectiveYear/>
<Months/>
<EndDate/>
<Name/>
<AssumedName/>
<Address/>
<Type/>
<Status/>
<Amount/>
<AmountAggregate/>
</row>
<row/>
</doc>
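The misplaced values look like the standard behaviour of a repeated capturing group: inside `(...)+` the group keeps only the text of its last repetition. As a result, group 1 ends up holding the second-to-last field and group 2 the last one, which matches what landed in ID and ParentID above. A quick check, using Python's `re` purely as a stand-in for the XPath 2.0 regex engine (the record is a shortened, single-line version of the first row, made up for illustration):

```python
import re

# The regex from this message; group 1 sits inside a "+" loop.
ROW = re.compile(r'([^"]*,|"[^"]*",)+(.*)')

record = '1,,"A B C, Inc.",D E,"$442,000.00 ",$1.62'
m = ROW.match(record)

# A repeated group reports only its final repetition, not every field.
last_repetition = m.group(1)
rest = m.group(2)
print(last_repetition)  # "$442,000.00 ",
print(rest)             # $1.62
```

So a single match per record, with one group per field, can't return an arbitrary number of fields. Iterating over successive matches, as `xsl:analyze-string` does, is the usual way around this.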
Thanks,
Vish.
>-----Original Message-----
>From: Pantvaidya, Vishwajit
>Sent: Tuesday, June 27, 2006 2:48 PM
>To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details in xsl
>
>>From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
>>Sent: Saturday, June 24, 2006 12:41 AM
>>> >
>>> >There's a lot of potential backtracking here: it might be better to
>>> >replace each "(.*)," with "[^,]*" or with "(.*?),".
>>>
>>> [Pantvaidya, Vishwajit] Does "[^,]*" work the same as "(.*),"
>>> - I understand that ^ is start of line metachar. How does the
>>> former match the alphabet chars?
>>
>>No, within square brackets, ^ means "not". So [^,]* matches a sequence of
>>any characters except comma.
>>
>>The problem with your expression is that (.*) matches as many characters as
>>it can. Then it sees ",", so it backtracks to find the last comma. Then it
>>sees the next (.*), and has to backtrack again; and so on.
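The three options Michael mentions stop at different commas. Python's `re` treats greedy `.*`, reluctant `.*?` and a negated class the same way an XPath 2.0 regex does for this case, so it works as a quick sandbox. The sample string is made up.

```python
import re

text = "a,b,c,d"

# Greedy: (.*) first runs to the end of the string, then gives characters
# back one at a time until the following comma can match (the LAST comma).
greedy = re.match(r"(.*),", text).group(1)
# Reluctant: (.*?) grows one character at a time and stops at the FIRST comma.
reluctant = re.match(r"(.*?),", text).group(1)
# Negated class: [^,]* can never step past a comma, so it stops at the
# first one without any backtracking.
negated = re.match(r"([^,]*),", text).group(1)

print(greedy, reluctant, negated)  # a,b,c a a
```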
>>>
>>> >
>>> >My own instinct would be to use something like:
>>> >
>>> >([^"]*,|"[^"]*",)*
>>> >
>>>
>>> [Pantvaidya, Vishwajit] Oxygen would not accept this regex as
>>> "it matches a zero-length string".
>>
>>Perhaps then you want to change the final "*" to a "+".
>>
>[Pantvaidya, Vishwajit] That is the first thing I tried when the * did not
>work, but even then it does not seem to be working.
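Here is the zero-length issue in miniature. XSLT 2.0 does not allow the regex used with `xsl:analyze-string` to match a zero-length string, which is presumably what Oxygen is reporting. A Python check (again only a stand-in for the XSLT regex engine) shows that the `*` form matches the empty string and the `+` form does not:

```python
import re

# The pattern suggested above, with "*", and the "+" variant.
star = re.compile(r'([^"]*,|"[^"]*",)*')
plus = re.compile(r'([^"]*,|"[^"]*",)+')

# "*" allows zero repetitions, so the empty string is a complete match;
# "+" needs at least one field followed by a comma.
star_matches_empty = star.fullmatch("") is not None
plus_matches_empty = plus.fullmatch("") is not None
print(star_matches_empty, plus_matches_empty)  # True False
```

Switching to `+` only removes that error. If the pattern is still used as one match over a whole record, its repeated group still keeps only the last field.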
>
>>> Anyway, how does this regex work - it does not seem to have
>>> anything that matches the alphabet chars.
>>
>>See above: [^"] matches everything except quotes.
>>
>>> And does the ,|" match comma or double quotes? Because actually
>>> some fields will have both.
>>
>>The first alternative, [^"]*, matches any field that ends with a comma and
>>doesn't contain a quotation mark. The second alternative, "[^"]*", matches
>>any field that begins and ends with quotes (followed by a comma), and might
>>contain a comma between the quotes.
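If the two alternatives are applied repeatedly to a record, one match per field (which is how `xsl:analyze-string` walks a string), they do pull quoted fields out intact. A quick Python test (Python `re` as a stand-in, sample record made up) also shows that `[^"]` matches commas too, so the first alternative can swallow several consecutive unquoted fields. Excluding the comma as well, `[^",]*,`, keeps it to a single field. That tightened variant is an illustration only and does not come from the thread.

```python
import re

record = '1,,"A B C, Inc.",D E,'  # every field, including the last, ends in a comma

# Alternation as given in the thread: an unquoted field may contain commas.
as_given = re.findall(r'[^"]*,|"[^"]*",', record)
# Variation: an unquoted field may contain neither quotes nor commas.
tightened = re.findall(r'[^",]*,|"[^"]*",', record)

print(as_given)   # ['1,,', '"A B C, Inc.",', 'D E,']
print(tightened)  # ['1,', ',', '"A B C, Inc.",', 'D E,']
```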
>>
>>It's very hard to find out what the exact rules for CSV files used by a
>>particular product are: for example, how it represents a field that
>>contains quotation marks as well as commas. (That's one of the great
>>advantages of XML: you can find a specification!) If you know the exact
>>rules for your particular flavour of CSV, you can adapt the regex to match
>>(well, you can if you study a bit more about regular expressions).
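The sample data suggests one particular flavour. Fields are optionally wrapped in double quotes, a literal quote inside a quoted field is written as `""`, and line breaks may occur inside quoted fields. Assuming those rules, a field-at-a-time tokenizer along the following lines should handle it. This is a sketch in Python. `FIELD` and `parse_csv` are hypothetical names, and `\Z` (end of input) has no direct XPath 2.0 equivalent, so an XSLT version would need adjusting.

```python
import re

# One CSV field plus whatever ends it (assumed dialect, see above).
#   group 1: a quoted field ("" stands for one quote; commas and line breaks
#            allowed inside) or an unquoted field (no commas, quotes, breaks)
#   group 2: the terminator: a comma, a line break, or the end of the input
FIELD = re.compile(r'("(?:[^"]|"")*"|[^,"\r\n]*)(,|\r?\n|\Z)')


def parse_csv(text):
    """Return a list of records, each a list of unquoted field strings."""
    records, record, pos = [], [], 0
    while pos < len(text):
        m = FIELD.match(text, pos)
        if m is None:
            raise ValueError(f"malformed CSV near offset {pos}")
        field, terminator = m.group(1), m.group(2)
        if field.startswith('"'):
            # Drop the enclosing quotes and turn "" back into ".
            field = field[1:-1].replace('""', '"')
        record.append(field)
        pos = m.end()
        if terminator != ",":
            # A line break or the end of the input closes the record.
            records.append(record)
            record = []
    if record:
        # The input ended with a comma, so the final field is empty.
        record.append("")
        records.append(record)
    return records


sample = (
    'ID,Name,Amount\n'
    '1,"A B C,\nInc.","$442,000.00 "\n'
    '2,"Charity Services\n(""CS"")",$1.35\n'
)
for row in parse_csv(sample):
    print(row)
```

Matching one field at a time means each field's quoting is decided locally. This avoids both the repeated-group problem and the long backtracking chains of a whole-record regex.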
>>>
>>>
>>> Maybe this conversion is easier done with some Java code.
>>>
>>I'm sure it can be done using regular expressions but it looks as if you
>>need to do some learning in this area.
>>
>[Pantvaidya, Vishwajit] Thanks a lot for all the clarifications and help.
>Actually I did look at the regex documentation in the XSLT2 spec, but not
>very exhaustively. The info on back-references I found there made me feel
>they could be useful here, e.g. to tell the regex that if a starting quote
>is found, it should look for an ending one. But the more I look into it,
>the more it seems like I may not be able to use them.
>
>Thanks and regards,
>
>Vish.