Subject: Re: [xsl] How to tokenize a comma-separated CSV record which has a field containing a string that has commas?
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 4 Aug 2022 19:45:34 -0000

On 04.08.2022 20:53, Roger L Costello costello@xxxxxxxxx wrote:
> Hi Folks,
>
> I'm stuck.
>
> I want to tokenize this:
>
> airport,AeroPublication/airports/airport,ARPT_IDENT,12,ARPT.TXT,ARPT,
>
> into these 7 tokens:
>
> 1. airport
> 2. AeroPublication/airports/airport
> 3. ARPT_IDENT
> 4. 12
> 5. ARPT.TXT
> 6. ARPT
> 7. '' /* empty string */
>
> And tokenize this:
>
> cycleDate,AeroPublication/airports/airport/cycleDate,CYCLE_DATE,59,ARPT.TXT,ARPT,"substring($ARPT_row/CYCLE__DATE, 3)"
>
> into these 7 tokens:
>
> 1. cycleDate
> 2. AeroPublication/airports/airport/cycleDate
> 3. CYCLE_DATE
> 4. 59
> 5. ARPT.TXT
> 6. ARPT
> 7. substring($ARPT_row/CYCLE__DATE, 3) /* bonus points if you can also remove the surrounding quote symbols */
>
> Clearly this isn't the solution:
>
> tokenize(., ',')
>
> as it erroneously breaks apart the last field (string containing commas).
>
> Suggestions?
>
If you want to do it with regular-expression splitting or tokenizing on
a delimiter, I think most articles suggest you need a lookahead,
something not supported by pure XPath regular expressions but easily
used in Java, and in Saxon via the ';j' flag (which selects the Java
regex engine), e.g.

tokenize('cycleDate,AeroPublication/airports/airport/cycleDate,CYCLE_DATE,59,ARPT.TXT,ARPT,"substring($ARPT_row/CYCLE__DATE, 3)"', ',(?=(?:[^"]*"[^"]*")*[^"]*$)', ';j')

The expression is taken from https://www.baeldung.com/java-split-string-commas,
which explains that the pattern, "using positive lookahead
<https://www.baeldung.com/java-regex-lookahead-lookbehind>, tells to
split around a comma only if there are no double quotes or if there is
an even number of double quotes ahead of it."
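
For comparison, here is a rough sketch of the same regular expression in
plain Java, outside of Saxon. The class name CsvSplit and the method
splitRecord are made up for this illustration. It assumes fields never
contain escaped or doubled quote characters, and the quote stripping (the
"bonus points" part) is an extra step that the tokenize() call above does
not perform.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named here only for illustration.
class CsvSplit {
    // A comma followed by an even number of double quotes up to the end
    // of the input, i.e. a comma that is not inside a quoted field.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        // Limit -1 keeps trailing empty strings, so a record ending in
        // a comma still yields a final empty field.
        for (String field : record.split(COMMA_OUTSIDE_QUOTES, -1)) {
            // Remove one pair of surrounding quotes, if present.
            if (field.length() >= 2
                    && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String plain =
                "airport,AeroPublication/airports/airport,ARPT_IDENT,12,ARPT.TXT,ARPT,";
        String quoted =
                "cycleDate,AeroPublication/airports/airport/cycleDate,CYCLE_DATE,59,"
                + "ARPT.TXT,ARPT,\"substring($ARPT_row/CYCLE__DATE, 3)\"";
        for (String record : new String[] {plain, quoted}) {
            List<String> fields = splitRecord(record);
            System.out.println(fields.size() + " fields: " + fields);
        }
    }
}
```

Both sample records from the question come out as seven fields: the first
ends with an empty string, and the second keeps the comma inside
substring($ARPT_row/CYCLE__DATE, 3) while dropping the surrounding quotes.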