Re: [xsl] Split camel-case strings into words?

Subject: Re: [xsl] Split camel-case strings into words?
From: "Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 10 May 2023 19:51:15 -0000
Where would I find the definition of the lookahead and lookbehind syntax for
XPath regular expressions? I did not see it in the XSD regular expression
spec, so maybe I was looking in the wrong place?

Cheers,

E.

_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368

From: Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wednesday, May 10, 2023 at 2:45 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [xsl] Split camel-case strings into words?

Hi Eliot,

A positive lookbehind is (?<=PATTERN) and a positive lookahead is (?=PATTERN).
I had to escape the < as &lt;. For the lookbehind pattern I used \p{Ll}
which matches any Unicode lowercase letter, and for the lookahead pattern I
similarly used \p{Lu} which matches any Unicode uppercase letter. Because
lookbehinds and lookaheads do not consume any content, they match the point
between the letters (but not the letters themselves) for determining where
to tokenize.
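
For illustration only (this sketch is not from the original message): the
same zero-width idea can be tried outside XPath. The Python snippet below is
a minimal sketch, assuming ASCII-only input. It uses [a-z] and [A-Z] in place
of \p{Ll} and \p{Lu} because Python's built-in re module does not support
\p{...} classes, and the name split_camel is hypothetical. It assumes
Python 3.7 or later, where re.split() splits on empty matches.

```python
import re

# Zero-width pattern: the position that is preceded by a lowercase letter
# (lookbehind) and followed by an uppercase letter (lookahead).
# ASCII classes stand in for \p{Ll} and \p{Lu}; that is an assumption
# of this sketch, not part of the original XPath solution.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_camel(text):
    """Split a camel-case string at each lowercase-to-uppercase boundary."""
    # Neither assertion consumes a character, so no letters are lost.
    return BOUNDARY.split(text)


print(split_camel("MicrosoftExchangeOnline"))
# prints ['Microsoft', 'Exchange', 'Online']
```

Because the split point requires a lowercase letter on its left, a run of
capitals such as "XMLParser" is left in one piece.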


- Chris

From: Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, May 10, 2023 3:33 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Split camel-case strings into words?


This is the answer ChatGPT gave me:

In XQuery, you can use regular expressions and the tokenize() function to
split a camel case string into words. Here's an example query that does this:

    let $input := "MicrosoftExchangeOnline"
    let $words := tokenize($input, "(?=[A-Z])")
    return $words

In this query, the tokenize() function splits the input string $input into
words using a regular expression that matches any position in the string where
the next character is an uppercase letter. The regular expression (?=[A-Z])
uses a positive lookahead to match the position before an uppercase letter
without actually consuming the letter itself. This ensures that the split
occurs at the correct boundaries.
The resulting sequence $words contains the individual words as separate
strings. In this case, the value of $words would be ("Microsoft", "Exchange",
"Online").

But its regular expression is wrong (though close to Chris's solution). I
pasted it into the BaseX query panel, which reported the regular expression
as invalid, which it is (or rather, I trust BaseX to correctly report bad
regexes).

While I was waiting I reread the XSD spec's definition of regular expressions
and could not determine from that how to do what Chris showed.

I still don't know why Chris's or Martin's regex works, but at least they
provide explainable solutions.

I'm glad to know there's still a role for humans here.

Cheers,

E.