Re: [xsl] Split camel-case strings into words?

Subject: Re: [xsl] Split camel-case strings into words?
From: "Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 10 May 2023 19:32:44 -0000
This is the answer ChatGPT gave me:

In XQuery, you can use regular expressions and the tokenize() function to
split a camel case string into words. Here's an example query that does this:

let $input := "MicrosoftExchangeOnline"
let $words := tokenize($input, "(?=[A-Z])")
return $words

In this query, the tokenize() function splits the input string $input into
words using a regular expression that matches any position in the string where
the next character is an uppercase letter. The regular expression (?=[A-Z])
uses a positive lookahead to match the position before an uppercase letter
without actually consuming the letter itself. This ensures that the split
occurs at the correct boundaries.
The resulting sequence $words contains the individual words as separate
strings. In this case, the value of $words would be ("Microsoft", "Exchange",
"Online").

But its regular expression is wrong (though close to Chris's solution). I
pasted it into the BaseX query panel and it reported the regular expression as
being invalid, which it is (or rather, I trust BaseX to correctly report bad
regexes).
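
For context: XPath/XQuery regular expressions are based on the XSD regex
syntax plus a few additions, and that syntax has no lookahead or lookbehind
groups, which is why a conforming processor rejects "(?=[A-Z])". A minimal
sketch of a lookaround-free alternative follows. It assumes the input contains
no spaces and should be split only where a lowercase letter is directly
followed by an uppercase one; the variable names and the space used as a
temporary separator are illustrative, not from the thread.

```xquery
let $input := "MicrosoftExchangeOnline"
(: Insert a space between each lowercase letter and the uppercase letter after it :)
let $spaced := replace($input, "(\p{Ll})(\p{Lu})", "$1 $2")
(: Split on the inserted separator :)
return tokenize($spaced, " ")
```

For the sample input this yields ("Microsoft", "Exchange", "Online"). If the
input might already contain spaces, a different separator character would be
needed.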

While I was waiting I reread the XSD spec's definition of regular expressions
and could not determine from that how to do what Chris showed.

I still don't know why Chris's or Martin's regex works, but at least they
provide explainable solutions.

I'm glad to know there's still a role for humans here.

Cheers,

E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer

From: Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wednesday, May 10, 2023 at 2:29 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [xsl] Split camel-case strings into words?

Using Martin's better Unicode character classes with the lookbehind/lookahead approach:

tokenize('MicrosoftExchangeOnline', '(?<=\p{Ll})(?=\p{Lu})', ';j')
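
A note on why this can work even though standard XPath regular expressions
have no lookaround: as far as I know, the ';j' suffix in the flags argument is
a processor extension rather than part of the standard. Saxon documents it as
switching to the native Java regex engine, which does support lookbehind and
lookahead. Whether a given processor and version accepts it should be checked.
The annotated sketch below spells out the same expression; the variable name
is illustrative.

```xquery
let $input := "MicrosoftExchangeOnline"
return tokenize(
  $input,
  (: (?<=\p{Ll}) requires the previous character to be a lowercase letter;
     (?=\p{Lu})  requires the next character to be an uppercase letter.
     Neither group consumes text, so the split point is the empty position
     between the two letters. :)
  '(?<=\p{Ll})(?=\p{Lu})',
  (: ';j' asks for Java regex syntax; assumed to be a processor extension :)
  ';j'
)
```

Without the ';j' flag, a conforming processor should report the pattern as
invalid, just as with the lookahead-only version.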


From: Martin Honnen martin.honnen@xxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, May 10, 2023 3:17 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Split camel-case strings into words?



On 5/10/2023 9:12 PM, Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx wrote:

In an XQuery context, what is the easiest way to split a camel-case string
into words?

So given "MicrosoftExchangeOnline" return ("Microsoft", "Exchange",
"Online").



I was going to ask ChatGPT but the servers are apparently overloaded with
people asking trivial questions.



Asking the XSL list instead of ChatGPT about XQuery?

Well, I think

analyze-string('MicrosoftExchangeOnline', '\p{Lu}\p{Ll}*')/*:match/string()

might do it.
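
Spelled out as a full query, the analyze-string() approach uses only standard
regex syntax and needs no lookaround. This is a sketch assuming an XQuery 3.0
or later processor; the variable names are illustrative.

```xquery
let $input := "MicrosoftExchangeOnline"
(: analyze-string returns an fn:analyze-string-result element whose children
   are fn:match and fn:non-match elements, in the order they occur :)
let $result := analyze-string($input, "\p{Lu}\p{Ll}*")
(: Keep only the matches: one uppercase letter followed by any lowercase run :)
return $result/*:match/string()
```

For the sample input this returns ("Microsoft", "Exchange", "Online"). Note
that anything the pattern does not match lands in fn:non-match and is dropped
here, so for an input such as "camelCase" only "Case" would be returned.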