Subject: Re: [xsl] How to tokenize a comma-separated CSV record which has a field containing a string that has commas?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 4 Aug 2022 19:59:32 -0000

A good case for Invisible XML (topic of the week at Balisage), but I'll leave someone else to flesh it out.

Michael Kay
Saxonica

> On 4 Aug 2022, at 20:45, Martin Honnen martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 04.08.2022 20:53, Roger L Costello costello@xxxxxxxxx wrote:
>> Hi Folks,
>>
>> I'm stuck.
>>
>> I want to tokenize this:
>>
>> airport,AeroPublication/airports/airport,ARPT_IDENT,12,ARPT.TXT,ARPT,
>>
>> into these 7 tokens:
>>
>> 1. airport
>> 2. AeroPublication/airports/airport
>> 3. ARPT_IDENT
>> 4. 12
>> 5. ARPT.TXT
>> 6. ARPT
>> 7. '' /* empty string */
>>
>> And tokenize this:
>>
>> cycleDate,AeroPublication/airports/airport/cycleDate,CYCLE_DATE,59,ARPT.TXT,ARPT,"substring($ARPT_row/CYCLE__DATE, 3)"
>>
>> into these 7 tokens:
>>
>> 1. cycleDate
>> 2. AeroPublication/airports/airport/cycleDate
>> 3. CYCLE_DATE
>> 4. 59
>> 5. ARPT.TXT
>> 6. ARPT
>> 7. substring($ARPT_row/CYCLE__DATE, 3) /* bonus points if you can also remove the surrounding quote symbols */
>>
>> Clearly this isn't the solution:
>>
>> tokenize(., ',')
>>
>> as it erroneously breaks apart the last field (string containing commas).
>>
>> Suggestions?
>>
>
> If you want to do it with regular expression splitting or tokenizing on a delimiter, I think most articles suggest you need a lookahead, something not supported by pure XPath regular expressions but easily used in Saxon and Java, doing e.g.
>
> tokenize('cycleDate,AeroPublication/airports/airport/cycleDate,CYCLE_DATE,59,ARPT.TXT,ARPT,"substring($ARPT_row/CYCLE__DATE, 3)"', ',(?=(?:[^"]*"[^"]*")*[^"]*$)', ';j')
>
> Expression taken from https://www.baeldung.com/java-split-string-commas explaining "using positive lookahead <https://www.baeldung.com/java-regex-lookahead-lookbehind>, tells to split around a comma only if there are no double quotes or if there is an even number of double quotes ahead of it."
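
For anyone who wants to try the lookahead expression quoted above outside of Saxon, here is a minimal Java sketch. The class name `CsvSplit`, the method `splitRecord`, and the constant `COMMA_OUTSIDE_QUOTES` are made up for illustration and are not from the thread. It assumes the quotes in each record are balanced, and it makes no attempt to unescape doubled quotes inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvSplit {
    // Matches a comma only when an even number of double quotes follows it,
    // i.e. the comma is not inside a quoted field (assumes balanced quotes).
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits one record and strips one pair of
    // surrounding quotes from each field.
    static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty fields; String.split drops them by default.
        for (String field : record.split(COMMA_OUTSIDE_QUOTES, -1)) {
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String plain = "airport,AeroPublication/airports/airport,ARPT_IDENT,12,ARPT.TXT,ARPT,";
        String quoted = "cycleDate,AeroPublication/airports/airport/cycleDate,CYCLE_DATE,59,"
                + "ARPT.TXT,ARPT,\"substring($ARPT_row/CYCLE__DATE, 3)\"";
        for (String record : new String[] {plain, quoted}) {
            List<String> fields = splitRecord(record);
            for (int i = 0; i < fields.size(); i++) {
                System.out.println((i + 1) + ". '" + fields.get(i) + "'");
            }
            System.out.println();
        }
    }
}
```

The `-1` limit matters for the first record: without it, Java's `String.split` would silently discard the empty seventh field, whereas XPath's `tokenize` keeps it. The quote stripping covers the "bonus points" part of the question.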