[no subject]

I will also look into what you have suggested.

Thanks,

Vish.

>-----Original Message-----
>From: Nathan Young -X (natyoung - Artizen at Cisco)
>[mailto:natyoung@xxxxxxxxx]
>Sent: Monday, June 26, 2006 11:02 AM
>To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details
>in xsl
>
>Hi.
>
>I don't know how much performance matters to you, but regular
>expressions are going to be a lot slower than the low-level CSV parsing
>routines you can get by using a Perl, Java or C library someone wrote to
>parse CSV. These are cleverly written and perform very well; a quick web
>search for your language will turn up useful links if you go this route.
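
As an illustration of the library route, here is a minimal sketch using
Python's standard csv module. The choice of Python and the sample data are
assumptions made for this example only; the thread itself names Perl, Java
and C.

```python
import csv
import io

# Hypothetical sample data: the quoted field contains a comma.
sample = 'name,city\n"Young, Nathan",San Jose\n'

# csv.reader handles the quoting rules, so no regular expression is needed.
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(row)
```

The comma inside the quoted field stays inside the field instead of
starting a new one.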
>
>If "good enough" is good enough for you performance-wise, regular
>expressions probably can work for you.  If you do pursue this I strongly
>recommend an application called "regex coach" for troubleshooting and
>learning regular expressions.  It really makes the effects of your
>expression visible to you and lets you quickly adjust and try
>variations.
>
>----->Nathan
>
>>
>> > >
>> > >There's a lot of potential backtracking here: it might be better to
>> > >replace each "(.*)," with "[^,]*" or with "(.*?),".
>> >
>> > [Pantvaidya, Vishwajit] Does "[^,]*" work the same as "(.*),"
>> > - I understand that ^ is start of line metachar. How does the
>> > former match the alphabet chars?
>>
>> No, within square brackets, ^ means "not". So [^,]* matches a sequence
>> of any characters except comma.
>>
>> The problem with your expression is that (.*) matches as many
>> characters as it can. Then it sees ",", so it backtracks to find the
>> last comma. Then it sees the next (.*), and has to backtrack again;
>> and so on.
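
The difference between the greedy, negated-class and lazy forms can be
seen in a small sketch. It uses Python's re module as an assumption; the
XSLT 2.0 regex dialect discussed in this thread may differ in detail, and
the sample line is made up.

```python
import re

line = "a,b,c,d"  # hypothetical input line

# Greedy: (.*) first swallows the whole line, then backtracks to the LAST comma.
greedy = re.match(r"(.*),(.*)", line).groups()

# Negated class: [^,]* cannot cross a comma, so it stops at the FIRST one
# without any backtracking.
negated = re.match(r"([^,]*),(.*)", line).groups()

# Lazy: (.*?) grows one character at a time until a comma follows.
lazy = re.match(r"(.*?),(.*)", line).groups()

print(greedy)   # ('a,b,c', 'd')
print(negated)  # ('a', 'b,c,d')
print(lazy)     # ('a', 'b,c,d')
```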
>> >
>> > >
>> > >My own instinct would be to use something like:
>> > >
>> > >([^"]*,|"[^"]*",)*
>> > >
>> >
>> > [Pantvaidya, Vishwajit] Oxygen would not accept this regex as
>> > "it matches a zero-length string".
>>
>> Perhaps then you want to change the final "*" to a "+".
>>
>> > Anyway, how does this regex work - it does not seem to have
>> > anything that matches the alphabet chars.
>>
>> See above: [^"] matches everything except quotes.
>>
>> > And does the ,|" match comma or double quotes - because
>> > actually some field will have both.
>>
>> The first alternative, [^"]*, matches any field that ends with a
>> comma, and doesn't contain a quotation mark. The second alternative,
>> "[^"]*", matches any field that begins and ends with quotes (followed
>> by a comma), and might contain a comma between the quotes.
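
A rough sketch of how the two alternatives could be used to pull fields
out of a line, again in Python as an assumption. It is an adaptation, not
the exact expression above: the unquoted alternative is narrowed to
[^",]* so that each match covers a single field, a comma is appended so
the last field also ends with one, and doubled or escaped quotes are not
handled. The name split_fields is hypothetical.

```python
import re

# Quoted field followed by a comma, or unquoted field followed by a comma.
FIELD = re.compile(r'"([^"]*)",|([^",]*),')

def split_fields(line):
    """Split one CSV line into fields (no support for escaped quotes)."""
    fields = []
    # Append a comma so that the final field is also comma-terminated.
    for m in FIELD.finditer(line + ","):
        quoted, plain = m.group(1), m.group(2)
        fields.append(quoted if quoted is not None else plain)
    return fields

print(split_fields('1,"Young, Nathan",Cisco'))
```

Input with unbalanced quotes is not detected here; a real converter would
need to check that the matches cover the whole line.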
>>
>> It's very hard to find out what the exact rules for CSV files used by
>> a particular product are: for example, how it represents a field that
>> contains quotation marks as well as commas. (That's one of the great
>> advantages of XML: you can find a specification!) If you know the exact
>> rules for your particular flavour of CSV, you can adapt the regex to
>> match (well, you can if you study a bit more about regular
>> expressions).
>> >
>> >
>> > Maybe this conversion is easier done with some Java code.
>> >
>> I'm sure it can be done using regular expressions but it looks as if
>> you need to do some learning in this area.
>>
>> Michael Kay
>> http://www.saxonica.com/
