Subject: RE: [xsl] csv to xml converter bug
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 10 Jul 2007 12:21:36 +0100

The construct

(?=X)

is allowed in some regex dialects; it means "match X with a zero-width
positive lookahead". This is basically an assertion that X must match at
the current position, without causing X to be swallowed. But it's not
allowed in the XPath regex dialect.

This construct (a zero-width negative lookahead) isn't allowed either:

(?!X)

This is the inverse: it asserts that X does not match at the current
position, without swallowing X.

I'm afraid I have no idea whether these constructs can be translated into
anything that the XPath regex dialect permits.

Gunther Schadow can say "told you it would be needed":

http://www.stylusstudio.com/xsllist/200412/post00810.html

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Andrew Welch [mailto:andrew.j.welch@xxxxxxxxx]
> Sent: 10 July 2007 11:29
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] csv to xml converter bug
>
> The csv-to-xml solution here:
> http://andrewjwelch.com/code/xslt/csv/csv-to-xml.html
>
> ...has a bug where
>
> ,,"foo,bar",,x,,
>
> generates the tokens:
>
> <token/>
> <token/>
> <token/>
> <token>"foo,bar"</token>
> <token/>
> <token/>
> <token>x</token>
> <token/>
> <token/>
>
> The x should be at position 5 but is at position 7 because
> the commas either side of the quoted values aren't being
> included with the value itself, and are generating extra
> tokens in the xsl:non-matching-substring block.
>
> I've tried various ways to modify the solution to fix the
> bug, but always ran into problems with other strings, such as:
>
> "foo,bar",,"foo,bar",x,,,"foo,bar"
>
> If you include leading or trailing commas with the quoted
> values then the empty value at position 2 here gets consumed.
> Maybe a better regex would help here, but I couldn't write
> one... (Or perhaps if the non-matching-substring block had
> access to some information about the matching-substring block...)
>
> I had a dig around the net and found a regex[1] that could be
> sufficient to just use with tokenize, but it causes the error:
>
> FORX0002: Error at character 2 in regular expression
> ",(?=([^\"]*\"[^\"]*\")*(?![^\"...":
> expected ())
>
> It works in "The Regex Coach", but not in XSLT (with
> Saxon 8.9.0.3b)
>
> The code is:
>
> <xsl:variable name="regex"
> as="xs:string">,(?=([^\"]*\"[^\"]*\")*(?![^\"]*\"))</xsl:variable>
>
> <xsl:function name="fn:getTokens" as="xs:string+">
>   <xsl:param name="str" as="xs:string"/>
>   <xsl:sequence select='for $t in tokenize($str, $regex)
>     return replace($t, "^,""|"",$|("")""", "$1")'/>
> </xsl:function>
>
> It's an unusual looking regex (to my novice eye) - any
> explanation as to what's going on would be great.
>
> thanks
> andrew
>
> [1] http://weblogs.asp.net/prieck/archive/2004/01/16/59457.aspx
> --
> http://andrewjwelch.com
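
To make the two lookahead assertions concrete, here is a minimal sketch in
Python, whose re module does accept (?=X) and (?!X). It is an illustration
only: it shows what the quoted regex does in a dialect that supports
lookahead, and says nothing about what the XPath dialect can express. The
name split_csv_line is made up for the example. The \" escapes are written
as plain quotes, and the inner group is made non-capturing so that re.split
does not return it. It assumes quoted fields contain no escaped quotes.

```python
import re

# (?=X), positive lookahead: "foo" matches only where "bar" follows,
# and "bar" itself is not consumed by the match.
m = re.search(r'foo(?=bar)', 'foobaz foobar')
positive = (m.start(), m.group())

# (?!X), negative lookahead: "foo" matches only where "bar" does NOT follow.
m = re.search(r'foo(?!bar)', 'foobar foobaz')
negative = (m.start(), m.group())

# The separator regex from the quoted message: a comma counts as a separator
# only if the rest of the line holds an even number of double quotes, i.e.
# zero or more complete "..." pairs and then no further quote at all.
# A comma inside a quoted value is followed by an odd number of quotes,
# so the lookahead fails there and the comma is left alone.
SEPARATOR = re.compile(r',(?=(?:[^"]*"[^"]*")*(?![^"]*"))')


def split_csv_line(line):
    """Split one CSV line on commas that lie outside double quotes.

    Hypothetical helper for illustration; the quotes are left on the values.
    """
    return SEPARATOR.split(line)


if __name__ == '__main__':
    print(positive)   # (7, 'foo')
    print(negative)   # (7, 'foo')
    print(split_csv_line(',,"foo,bar",,x,,'))
    # ['', '', '"foo,bar"', '', 'x', '', '']
```

With the first problem string from the quoted message, the x comes out as
the fifth of seven tokens, which is the position the original post expects.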