Subject: Re: [xsl] Split camel-case strings into words?
From: "Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 10 May 2023 19:51:15 -0000

Where would I find the definition of the lookahead and lookbehind syntax for XPath regular expressions? I did not see it in the XSD regular expression spec, so maybe I was looking in the wrong place?

Cheers,

E.

From: Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wednesday, May 10, 2023 at 2:45 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [xsl] Split camel-case strings into words?

Hi Eliot,

A positive lookbehind is (?<=PATTERN) and a positive lookahead is (?=PATTERN). I had to escape the < as &lt;.

For the lookbehind pattern I used \p{Ll}, which matches any Unicode lowercase letter, and for the lookahead pattern I similarly used \p{Lu}, which matches any Unicode uppercase letter.

Because lookbehinds and lookaheads do not consume any content, they match the point between the letters but not the letters themselves, and that point is where the string is tokenized.

- Chris

From: Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, May 10, 2023 3:33 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Split camel-case strings into words?

This is the answer ChatGPT gave me:

In XQuery, you can use regular expressions and the tokenize() function to split a camel case string into words.
Here's an example query that does this:

    let $input := "MicrosoftExchangeOnline"
    let $words := tokenize($input, "(?=[A-Z])")
    return $words

In this query, the tokenize() function splits the input string $input into words using a regular expression that matches any position in the string where the next character is an uppercase letter. The regular expression (?=[A-Z]) uses a positive lookahead to match the position before an uppercase letter without actually consuming the letter itself. This ensures that the split occurs at the correct boundaries.

The resulting sequence $words contains the individual words as separate strings. In this case, the value of $words would be ("Microsoft", "Exchange", "Online").

But its regular expression is wrong (though close to Chris's solution). I pasted it into the BaseX query panel and it reported the regular expression as being invalid, which it is (or rather, I trust BaseX to correctly report bad regexes).

While I was waiting I reread the XSD spec's definition of regular expressions and could not determine from that how to do what Chris showed.

I still don't know why Chris's or Martin's regex works, but at least they provide explainable solutions.

I'm glad to know there's still a role for humans here.

Cheers,

E.
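
For readers comparing the two patterns discussed above, here is a minimal sketch of how they behave in an engine that does accept lookaround. It is written in Python rather than XPath, so it says nothing about which XPath or XQuery processors accept this syntax. Two assumptions are made for illustration: the ASCII classes [a-z] and [A-Z] stand in for \p{Ll} and \p{Lu}, because Python's built-in re module has no Unicode property escapes, and the function names are hypothetical.

```python
import re

# Lookbehind plus lookahead: the zero-width point between a lowercase
# letter and an uppercase letter. ASCII stand-ins for \p{Ll} and \p{Lu}.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

# Lookahead only, as in the quoted ChatGPT answer: the zero-width point
# before any uppercase letter, including the very first character.
LOOKAHEAD_ONLY = re.compile(r"(?=[A-Z])")


def split_camel_case(text):
    """Split at lowercase-to-uppercase boundaries (hypothetical helper)."""
    # Splitting on zero-width matches requires Python 3.7 or later.
    return BOUNDARY.split(text)


def split_lookahead_only(text):
    """Split before every uppercase letter (hypothetical helper)."""
    return LOOKAHEAD_ONLY.split(text)


if __name__ == "__main__":
    print(split_camel_case("MicrosoftExchangeOnline"))
    # ['Microsoft', 'Exchange', 'Online']
    print(split_lookahead_only("MicrosoftExchangeOnline"))
    # ['', 'Microsoft', 'Exchange', 'Online']
    print(split_camel_case("XMLParser"))
    # ['XMLParser']
    print(split_lookahead_only("XMLParser"))
    # ['', 'X', 'M', 'L', 'Parser']
```

In this engine the lookahead-only pattern also matches at the start of the string and between consecutive capitals, which is why it yields an empty leading token and breaks up runs such as "XML". Requiring a lowercase letter behind the split point avoids both effects.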