Subject: RE: [xsl] Converting CSV to XML without hardcoding schema details in xsl
From: "Pantvaidya, Vishwajit" <vpantvai@xxxxxxxxxxxxx>
Date: Tue, 27 Jun 2006 14:48:17 -0700
>From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
>Sent: Saturday, June 24, 2006 12:41 AM
>
>> >There's a lot of potential backtracking here: it might be better to
>> >replace each "(.*)," with "[^,]*" or with "(.*?),".
>>
>> [Pantvaidya, Vishwajit] Does "[^,]*" work the same as "(.*),"
>> - I understand that ^ is start of line metachar. How does the
>> former match the alphabet chars?
>
>No, within square brackets, ^ means "not". So [^,]* matches a sequence of
>any characters except comma.
>
>The problem with your expression is that (.*) matches as many characters as
>it can. Then it sees ",", so it backtracks to find the last comma. Then it
>sees the next (.*), and has to backtrack again; and so on.
>
>> >My own instinct would be to use something like:
>> >
>> >([^"]*,|"[^"]*",)*
>> >
>>
>> [Pantvaidya, Vishwajit] Oxygen would not accept this regex as
>> "it matches a zero-length string".
>
>Perhaps then you want to change the final "*" to a "+".
>

[Pantvaidya, Vishwajit] That is the first thing I tried when the * did not
work - but even then it does not seem to be working.

>> Anyway, how does this regex work - it does not seem to have
>> anything that matches the alphabet chars.
>
>See above: [^"] matches everything except quotes.
>
>> And does the ,|" match comma or double quotes - because
>> actually some fields will have both.
>
>The first alternative, [^"]*, matches any field that ends with a comma, and
>doesn't contain a quotation mark. The second alternative, "[^"]*", matches
>any field that begins and ends with quotes (followed by a comma), and might
>contain a comma between the quotes.
>
>It's very hard to find out what the exact rules for CSV files used by a
>particular product are: for example, how it represents a field that contains
>quotation marks as well as commas. (That's one of the great advantages of
>XML: you can find a specification!) If you know the exact rules for your
>particular flavour of CSV, you can adapt the regex to match (well, you can
>if you study a bit more about regular expressions).
>
>> >> Maybe this conversion is easier done with some Java code.
>
>I'm sure it can be done using regular expressions but it looks as if you
>need to do some learning in this area.
>

[Pantvaidya, Vishwajit] Thanks a lot for all the clarifications and help.
Actually I did look at the regex documentation in the XSLT2 spec, but not
very exhaustively - the info on back-references I found there made me feel
that could be potentially useful here, e.g. to tell the regex that if a
starting quote is found, look for an ending one. But the more I look into
it, the more it seems like I may not be able to use it.

Thanks and regards,

Vish.
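
The points made in the quoted replies can be tried out in a small sketch. It is written in Python rather than XSLT 2.0 so that it runs standalone, and Python's `re` syntax is not identical to the XPath regex flavour. Several things here are assumptions and not from the thread: the function name `split_csv_line`, the sample data, appending a comma so the last field is comma-terminated, and tightening the unquoted alternative to `[^",]*` so that each match stops at the first comma. It also assumes a simple CSV flavour in which quoted fields never contain a quotation mark.

```python
import re

# Greedy (.*) runs to the last comma and backtracks from there;
# the negated class [^,]* stops at the first comma instead.
greedy = re.match(r'(.*),(.*)', 'a,b,c').groups()
negated = re.match(r'([^,]*),(.*)', 'a,b,c').groups()

# One comma-terminated field: either wrapped in double quotes
# (commas allowed inside) or unquoted (no quotes, no commas).
# Assumption: no escaped or doubled quotes inside a quoted field.
FIELD = re.compile(r'"([^"]*)",|([^",]*),')


def split_csv_line(line):
    """Split one CSV line into fields under the simple rules above."""
    text = line + ","  # hypothetical trick: terminate the last field too
    fields = []
    pos = 0
    while pos < len(text):
        m = FIELD.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse field at offset {pos}")
        # Group 1 is set for a quoted field, group 2 for an unquoted one.
        fields.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
    return fields


print(greedy)   # ('a,b', 'c')
print(negated)  # ('a', 'b,c')
print(split_csv_line('abc,"d,e",,f'))  # ['abc', 'd,e', '', 'f']
```

Matching one field at a time avoids the zero-length-match complaint, because every alternative consumes at least the trailing comma. A line with an unbalanced quote raises `ValueError` here; a real CSV flavour with escaped quotes would need a different pattern.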