Subject: Re: [xsl] I desire this function: substring-before(string, regex charset)
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 13 Apr 2025 16:16:22 -0000

> Here is a single XPath 4 expression (I am surprised scan-left isn't
> standard in XPath 3.1 ? !!!):

And here is a single XPath 3.1 expression; I had to define scan-left inline:

let $scan-left := function($input as item()*,
                           $init as item()*,
                           $action as function(item()*, item()) as item()*
                          ) as array(*)*
                  {
                    (0 to count($input))
                      ! [fold-left(subsequence($input, 1, .), $init, $action)]
                  },
    $input := 'THEN THE CURTAIN FELL',
    $inChars := string-to-codepoints($input) ! codepoints-to-string(.)
return
  distinct-values(
    $scan-left($inChars, $input,
               function($s, $char)
               {
                 if(contains('AEIOU', $char))
                   then substring-after($s, $char)
                   else $s
               }
              )
  )

Thanks,
Dimitre.

On Sun, Apr 13, 2025 at 9:04 AM Dimitre Novatchev dnovatchev@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Roger,
>
> > Below is my XSLT solution. It uses the replace idea that Liam provided a
> > few weeks back, which is neat.
> >
> > Whereas the SNOBOL solution takes only 2 lines of code, the XSLT
> > solution requires many lines of code.
> >
> > Is there a simpler, shorter solution?
>
> Here is a single XPath 4 expression (I am surprised scan-left isn't
> standard in XPath 3.1 ? !!!):
>
> let $input := 'THEN THE CURTAIN FELL',
>     $inChars := tokenize($input, '')
> return
>   distinct-values(scan-left($inChars, $input, function($s, $char)
>                             { if(contains('AEIOU', $char))
>                                 then substring-after($s, $char)
>                                 else $s
>                             }
>                            )
>                  )
>
> One could try it here:
> https://fiddle.basex.org/?share=%28%27query%21%27+let0M5%22THEN+THE+CURTAIN+FELL%22%2C3*0D5tokenize28%22%22G+return3*dist9ct-values%7Bscan-left2DJ8functionS*7%28+if%7BKa9s%7B%22AEIOU%221%7D+then+substr9g-afterSP+else0s.P%29.7G44+6%21%27%27%7Emode%21%27XQuery+%7BBaseX6Type%21%27xml%27%29*7+.34440+%241Jchar%7D2%7B%243%5Cn4PP5+%3A%3D+6%7D%27%7EKext7++8M%2C+9inD9CharsG%7D3J%2C0KcontM9putP**S2s1.%01SPMKJGD9876543210.*_
>
> Thanks,
> Dimitre.
>
> On Sun, Apr 13, 2025 at 5:42 AM Roger L Costello costello@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Folks,
>>
>> The XPath substring-before function returns "that part of the given input
>> string that occurs before the first occurrence of the string given in
>> $arg2." [definition from SAXON web page]
>>
>>     substring-before($arg1 as xs:string?, $arg2 as xs:string?) --> xs:string
>>
>> It's a shame that the value of $arg2 can't be a regex character set. If it
>> could, then
>>
>>     substring-before("THEN THE CURTAIN FELL", '[AEIOU]')
>>
>> would return TH.
>>
>> Even better, it would be nice if there were a third argument specifying
>> that you also want the character that was matched from the character set:
>>
>>     substring-before("THEN THE CURTAIN FELL", '[AEIOU]', 'plus matched charset character')
>>
>> would return THE.
>>
>> I believe such a function would be useful.
>>
>> SNOBOL has such a function.
>>
>> Let's see how such functionality could be used. I have this text:
>>
>>     THEN THE CURTAIN FELL
>>
>> Fetch the string preceding the first vowel, plus the vowel:
>>
>>     THE
>>
>> However, instead of fetching the string plus vowel, modify the text by
>> deleting the string plus vowel:
>>
>>     N THE CURTAIN FELL
>>
>> Repeat on the new, shortened text.
>>
>> Here is the text as it is repeatedly shortened:
>>
>>     THEN THE CURTAIN FELL
>>     N THE CURTAIN FELL
>>     CURTAIN FELL
>>     RTAIN FELL
>>     IN FELL
>>     N FELL
>>     LL
>>
>> General Problem Statement: There is a text string. There is a character
>> set. Strip off the string prior to the first occurrence of a character from
>> the character set, plus that character. Repeat until no character from the
>> set remains in the text.
>>
>> Below I show how to implement this in SNOBOL and then in XSLT. My XSLT
>> solution is large and complex. Is there a simpler, shorter solution?
>>
>> First, the SNOBOL solution:
>>
>> Assign the variable TEXT a string:
>>
>>       TEXT = "THEN THE CURTAIN FELL"
>>
>> BREAK is a built-in SNOBOL function.
>> It has one argument, which is a character set. BREAK returns a pattern
>> that matches a string up to, but not including, the first character from
>> the character set. E.g.,
>>
>>       BREAK("AEIOU")
>>
>> returns a pattern that matches characters up to but not including a
>> vowel. This pattern:
>>
>>       BREAK("AEIOU") LEN(1)
>>
>> matches characters up to a vowel, plus the vowel.
>>
>> Note: LEN(N) matches any string of exactly N characters. It is SNOBOL's
>> version of the regex .{N}
>>
>> The following statement applies the pattern to TEXT, replacing the string
>> plus vowel with the null string:
>>
>>       TEXT BREAK("AEIOU") LEN(1) =
>>
>> To incrementally strip away the string, put the statement inside a loop:
>>
>> LOOP  TEXT BREAK("AEIOU") LEN(1) =      :F(END)
>>       OUTPUT = TEXT                     :(LOOP)
>>
>> Here is the output from running the SNOBOL program:
>>
>>     THEN THE CURTAIN FELL
>>     N THE CURTAIN FELL
>>     CURTAIN FELL
>>     RTAIN FELL
>>     IN FELL
>>     N FELL
>>     LL
>>
>> Nice.
>>
>> Below is my XSLT solution. It uses the replace idea that Liam provided a
>> few weeks back, which is neat. Whereas the SNOBOL solution takes only 2
>> lines of code, the XSLT solution requires many lines of code. Is there a
>> simpler, shorter solution?
>>
>> Lesson Learned: when designing a new language, it might be useful for the
>> language to provide something like the SNOBOL BREAK function.
>>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>                 xmlns:f="function"
>>                 exclude-result-prefixes="#all"
>>                 version="3.0">
>>
>>     <xsl:function name="f:remove-up-to-vowel" as="xs:string*">
>>         <xsl:param name="TEXT" as="xs:string"/>
>>         <xsl:choose>
>>             <!-- end of string? -->
>>             <xsl:when test="$TEXT eq ''"/>
>>             <xsl:otherwise>
>>                 <!-- drop everything up to and including the first vowel;
>>                      the 's' flag lets '.' match newlines in multi-line text -->
>>                 <xsl:variable name="substring-after-vowel"
>>                               select="replace($TEXT,
>>                                       '^[^AEIOU]*[AEIOU](.*)$', '$1', 's')"
>>                               as="xs:string*"/>
>>                 <xsl:sequence select="$substring-after-vowel"/>
>>                 <xsl:choose>
>>                     <!-- no vowel left: stop recursing -->
>>                     <xsl:when
>>                         test="not(matches($substring-after-vowel,'[AEIOU]'))"/>
>>                     <xsl:otherwise>
>>                         <xsl:sequence
>>                             select="f:remove-up-to-vowel($substring-after-vowel)"/>
>>                     </xsl:otherwise>
>>                 </xsl:choose>
>>             </xsl:otherwise>
>>         </xsl:choose>
>>     </xsl:function>
>>
>>     <xsl:template match="/*">
>>         <xsl:variable name="result"
>>                       select="f:remove-up-to-vowel('THEN THE CURTAIN FELL')"
>>                       as="xs:string*"/>
>>         <xsl:for-each select="$result">
>>             <xsl:message>
>>                 <xsl:value-of select="."/>
>>             </xsl:message>
>>         </xsl:for-each>
>>     </xsl:template>
>>
>> </xsl:stylesheet>
>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/782854>
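For illustration, the function Roger wishes for can be approximated today as a user-defined XSLT 3.0 function. This is a sketch under stated assumptions, not a built-in: the name f:substring-before-charset and the boolean third parameter $include-match are hypothetical, and $charset is assumed to be a regex character class such as '[AEIOU]'. It relies on the same replace() idea as Roger's stylesheet, with a reluctant prefix group, and, like substring-before(), returns the empty string when nothing matches.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="function"
                exclude-result-prefixes="#all"
                version="3.0">

  <!-- Hypothetical helper, not a built-in function. Like substring-before(),
       but $charset is a regex (assumed to be a character class such as
       '[AEIOU]'). When $include-match is true, the matched character is kept.
       Returns '' when nothing in $input matches, as substring-before() does. -->
  <xsl:function name="f:substring-before-charset" as="xs:string">
    <xsl:param name="input" as="xs:string?"/>
    <xsl:param name="charset" as="xs:string"/>
    <xsl:param name="include-match" as="xs:boolean"/>
    <!-- Group 1: shortest prefix; group 2: first character matching $charset.
         The 's' flag lets '.' match newlines too. -->
    <xsl:sequence select="
      if (matches(string($input), $charset))
      then replace(string($input),
                   '^(.*?)(' || $charset || ').*$',
                   if ($include-match) then '$1$2' else '$1',
                   's')
      else ''"/>
  </xsl:function>

  <!-- Run with an initial template (e.g. Saxon's -it option);
       no input document is needed. -->
  <xsl:template name="xsl:initial-template">
    <xsl:message select="f:substring-before-charset('THEN THE CURTAIN FELL', '[AEIOU]', false())"/>
    <xsl:message select="f:substring-before-charset('THEN THE CURTAIN FELL', '[AEIOU]', true())"/>
  </xsl:template>

</xsl:stylesheet>
```

Run as shown, the two messages should be TH and THE. The matches() test comes first because replace() returns its input unchanged when the pattern does not match, which would otherwise leak the whole string instead of the empty string.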
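Along the same lines, the two-line SNOBOL loop can be sketched as one short recursive XSLT 3.0 function that, like BREAK, takes the character set as a parameter. The name f:break-loop is hypothetical, and $charset is again assumed to be a single regex character class; a pattern that can match the empty string (for example '[AEIOU]*') would make replace() raise an error.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="function"
                exclude-result-prefixes="#all"
                version="3.0">

  <!-- Hypothetical helper emulating the SNOBOL loop
         LOOP  TEXT BREAK("AEIOU") LEN(1) =   :F(END)
               OUTPUT = TEXT                  :(LOOP)
       $charset is assumed to be one regex character class, e.g. '[AEIOU]'. -->
  <xsl:function name="f:break-loop" as="xs:string*">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="charset" as="xs:string"/>
    <!-- No character from $charset left: stop, like the :F(END) branch. -->
    <xsl:if test="matches($text, $charset)">
      <!-- Delete the shortest prefix that ends in a $charset character,
           i.e. BREAK(charset) LEN(1); 's' lets '.' match newlines. -->
      <xsl:variable name="rest" as="xs:string"
                    select="replace($text, '^.*?' || $charset, '', 's')"/>
      <!-- Emit the shortened text, then repeat on it. -->
      <xsl:sequence select="$rest, f:break-loop($rest, $charset)"/>
    </xsl:if>
  </xsl:function>

  <xsl:template name="xsl:initial-template">
    <xsl:for-each select="f:break-loop('THEN THE CURTAIN FELL', '[AEIOU]')">
      <xsl:message select="."/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the sample text this should emit six strings, N THE CURTAIN FELL, CURTAIN FELL, RTAIN FELL, IN FELL, N FELL and LL, and then stop once no vowel remains. Strictly, the second of them is " CURTAIN FELL" with a leading space, because the deleted prefix N THE ends at the vowel E and leaves the following space in place; the SNOBOL statement behaves the same way.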