Subject: Re: [xsl] How to tokenize a comma-separated CSV record which has a field containing a string that has commas?
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 4 Aug 2022 19:45:34 -0000
On 04.08.2022 20:53, Roger L Costello costello@xxxxxxxxx wrote:
> Hi Folks,
>
> I'm stuck.
>
> I want to tokenize this:
>
> airport,AeroPublication/airports/airport,ARPT_IDENT,12,ARPT.TXT,ARPT,
>
> into these 7 tokens:
>
> 1. airport
> 2. AeroPublication/airports/airport
> 3. ARPT_IDENT
> 4. 12
> 5. ARPT.TXT
> 6. ARPT
> 7. '' /* empty string */
>
> And tokenize this:
>
> cycleDate,AeroPublication/airports/airport/cycleDate,CYCLE_DATE,59,ARPT.TXT,ARPT,"substring($ARPT_row/CYCLE__DATE, 3)"
>
> into these 7 tokens:
>
> 1. cycleDate
> 2. AeroPublication/airports/airport/cycleDate
> 3. CYCLE_DATE
> 4. 59
> 5. ARPT.TXT
> 6. ARPT
> 7. substring($ARPT_row/CYCLE__DATE, 3) /* bonus points if you can also remove the surrounding quote symbols */
>
> Clearly this isn't the solution:
>
> tokenize(., ',')
>
> as it erroneously breaks apart the last field (string containing commas).
>
> Suggestions?
>

If you want to do it with regular expression splitting or tokenizing on a delimiter, I think most articles suggest you need a lookahead. Pure XPath regular expressions don't support lookahead, but it is easy to use in Saxon, where the ';j' flag selects Java regular expression syntax, e.g.

    tokenize('cycleDate,AeroPublication/airports/airport/cycleDate,CYCLE_DATE,59,ARPT.TXT,ARPT,"substring($ARPT_row/CYCLE__DATE, 3)"', ',(?=(?:[^"]*"[^"]*")*[^"]*$)', ';j')

The expression is taken from https://www.baeldung.com/java-split-string-commas, which explains that it, "using positive lookahead <https://www.baeldung.com/java-regex-lookahead-lookbehind>, tells to split around a comma only if there are no double quotes or if there is an even number of double quotes ahead of it."

Note that the tokens come back with the surrounding quote symbols still in place; removing them needs a separate step on each token.
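For anyone who wants to try the regular expression outside of Saxon, here is a minimal Java sketch of the same split. The class name CsvSplit and the method splitRecord are made up for illustration and are not part of any library. The sketch assumes each record is a single line with balanced double quotes, strips one pair of surrounding quotes per field, and does not unescape doubled quotes, so it is not a full CSV parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class CsvSplit {
    // A comma followed by an even number of double quotes up to the end of the input,
    // i.e. a comma that is not inside a quoted field.
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        // The limit -1 keeps trailing empty fields; without it String.split drops them.
        for (String field : record.split(COMMA_OUTSIDE_QUOTES, -1)) {
            // Remove one pair of surrounding quote symbols, if present.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "cycleDate,AeroPublication/airports/airport/cycleDate,"
                + "CYCLE_DATE,59,ARPT.TXT,ARPT,\"substring($ARPT_row/CYCLE__DATE, 3)\"";
        // Print each field on its own line.
        for (String field : splitRecord(record)) {
            System.out.println(field);
        }
    }
}
```

The limit argument matters for the first record in the question: it ends in a comma, and the empty seventh token is only returned when trailing empty strings are kept.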