Re: [xsl] I desire this function: substring-before(string, regex charset)

Subject: Re: [xsl] I desire this function: substring-before(string, regex charset)
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 13 Apr 2025 16:16:22 -0000
> Here is a single XPath 4 expression (I am surprised scan-left isn't
> standard in XPath 3.1 ? !!!):

And here is a single XPath 3.1 expression - had to define scan-left inline:

let $scan-left := function($input as item()*, $init as item()*,
                           $action as function(item()*, item()) as item()*
                           ) as array(*)*
                  { (0 to count($input))
                           ! [fold-left(subsequence($input, 1, .), $init, $action)]
                  },
    $input := 'THEN THE CURTAIN FELL',
    $inChars := string-to-codepoints($input) ! codepoints-to-string(.)
 return
   distinct-values(
             $scan-left($inChars, $input, function($s, $char)
                                            { if(contains('AEIOU', $char))
                                              then substring-after($s, $char)
                                              else $s
                                            }
                        )
                      )
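
For comparison, the same list of shortened strings can be sketched in
XPath 3.1 without scan-left, by locating the position of every character
that belongs to the set and taking the substring that follows it. This is
only an illustrative sketch: the variable names $set and $hits are
hypothetical, and it assumes the character set is given as a plain string
of single characters rather than as a regex.

```xpath
let $input := 'THEN THE CURTAIN FELL',
    $set   := 'AEIOU',
    (: positions of every character of $input that occurs in $set :)
    $hits  := (1 to string-length($input))
                 [contains($set, substring($input, ., 1))]
return
   (: the original text, then the remainder after each matched character :)
   ( $input, $hits ! substring($input, . + 1) )
```

For the sample text this yields the seven strings shown in Roger's listing,
from 'THEN THE CURTAIN FELL' down to 'LL'.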

Thanks,
Dimitre.

On Sun, Apr 13, 2025 at 9:04 AM Dimitre Novatchev dnovatchev@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Roger,
>
> > Below is my XSLT solution. It uses the replace idea that Liam provided a
> few weeks back, which is neat.
> > Whereas the SNOBOL solution takes only 2 lines of code, the XSLT
> solution requires many lines of code.
> > Is there a simpler, shorter solution?
>
> Here is a single XPath 4 expression (I am surprised scan-left isn't
> standard in XPath 3.1 ? !!!):
>
>  let $input := 'THEN THE CURTAIN FELL',
>     $inChars := tokenize($input, '')
>  return
>    distinct-values(scan-left($inChars, $input, function($s, $char)
>                                          { if(contains('AEIOU', $char))
>                                            then substring-after($s, $char)
>                                            else $s
>                                           }
>                                       )
>                          )
>
> One could try it here:
>
> https://fiddle.basex.org/?share=%28%27query%21%27+let0M5%22THEN+THE+CURTAIN+FELL%22%2C3*0D5tokenize28%22%22G+return3*dist9ct-values%7Bscan-left2DJ8functionS*7%28+if%7BKa9s%7B%22AEIOU%221%7D+then+substr9g-afterSP+else0s.P%29.7G44+6%21%27%27%7Emode%21%27XQuery+%7BBaseX6Type%21%27xml%27%29*7+.34440+%241Jchar%7D2%7B%243%5Cn4PP5+%3A%3D+6%7D%27%7EKext7++8M%2C+9inD9CharsG%7D3J%2C0KcontM9putP**S2s1.%01SPMKJGD9876543210.*_
>
>
>
> Thanks,
> Dimitre.
>
> On Sun, Apr 13, 2025 at 5:42 AM Roger L Costello costello@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Folks,
>>
>> The XPath substring-before function returns "that part of the given input
>> string that occurs before the first occurrence of the string given in
>> $arg2." [definition from SAXON web page]
>>
>> substring-before($arg1 as xs:string?, $arg2 as xs:string?) --> xs:string
>>
>> It's a shame that the value of $arg2 can't be a regex character set, e.g.,
>>
>> substring-before("THEN THE CURTAIN FELL", '[AEIOU]')
>>
>> would return TH.
>>
>> Even better, it would be nice if there was a third argument which
>> specified that you also want the character that was matched from the
>> character set:
>>
>> substring-before("THEN THE CURTAIN FELL", '[AEIOU]', 'plus matched
>> charset character')
>>
>> would return THE.
>>
>> I believe such a function would be useful.
>>
>> SNOBOL has such a function.
>>
>> Let's see how such functionality could be used. I have this text:
>>
>> THEN THE CURTAIN FELL
>>
>> Fetch the string preceding the first vowel, plus the vowel:
>>
>> THE
>>
>> However, instead of fetching the string plus vowel, modify the text by
>> nullifying the string plus vowel:
>>
>> N THE CURTAIN FELL
>>
>> Repeat on the new, shortened text.
>>
>> Here is the text as it is repeatedly shortened:
>>
>> THEN THE CURTAIN FELL
>> N THE CURTAIN FELL
>>  CURTAIN FELL
>> RTAIN FELL
>> IN FELL
>> N FELL
>> LL
>>
>> General Problem Statement: There is a text string. There is a character
>> set. Strip off the string prior to the first occurrence of a character
>> from the character set, plus the character. Repeat until the end of text is
>> reached.
>>
>> Below I show how to implement this in SNOBOL and then in XSLT. My XSLT
>> solution is large and complex. Is there a simpler, shorter solution?
>>
>> First, the SNOBOL solution:
>>
>> Assign the variable TEXT a string:
>>
>> TEXT = "THEN THE CURTAIN FELL"
>>
>> BREAK is a built-in SNOBOL function. It has one argument, which is a
>> character set. BREAK returns a pattern that matches a string up to but not
>> including the character from the character set. E.g.,
>>
>> BREAK("AEIOU")
>>
>> returns a pattern that matches characters up to but not including a
>> vowel. This pattern:
>>
>> BREAK("AEIOU") LEN(1)
>>
>> matches characters up to a vowel, plus the vowel.
>>
>> Note: LEN(N) means, match any N-length character string. It is SNOBOL's
>> version of the regex .{N}
>>
>> The following statement applies the pattern to TEXT, replacing the string
>> plus vowel with null:
>>
>> TEXT BREAK("AEIOU") LEN(1) =
>>
>> To incrementally strip away the string, put the statement inside a loop:
>>
>> LOOP    TEXT BREAK("AEIOU") LEN(1) =                             :F(END)
>>
>>                 OUTPUT =  TEXT                                   :(LOOP)
>>
>> Here is the output from running the SNOBOL program:
>>
>> THEN THE CURTAIN FELL
>> N THE CURTAIN FELL
>>  CURTAIN FELL
>> RTAIN FELL
>> IN FELL
>> N FELL
>> LL
>>
>> Nice.
>>
>> Below is my XSLT solution. It uses the replace idea that Liam provided a
>> few weeks back, which is neat. Whereas the SNOBOL solution takes only 2
>> lines of code, the XSLT solution requires many lines of code. Is there a
>> simpler, shorter solution?
>>
>> Lesson Learned: when designing a new language, it might be useful for the
>> language to provide something like the SNOBOL BREAK function.
>>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>                 xmlns:f="function"
>>                 exclude-result-prefixes="#all"
>>                 version="3.0">
>>
>>     <xsl:function name="f:remove-up-to-vowel" as="xs:string*">
>>         <xsl:param name="TEXT" as="xs:string"/>
>>         <xsl:choose>
>>             <!-- end of string? -->
>>             <xsl:when test="$TEXT eq ''"/>
>>             <xsl:otherwise>
>>                 <xsl:variable name="substring-after-vowel"
>>                            select="replace($TEXT, '^[^AEIOU]*[AEIOU](.*)$', '$1')"
>>                            as="xs:string*"/>
>>                 <xsl:sequence select="$substring-after-vowel"/>
>>                 <xsl:choose>
>>                     <xsl:when test="not(matches($substring-after-vowel,'[AEIOU]'))"/>
>>                     <xsl:otherwise>
>>                         <xsl:sequence select="f:remove-up-to-vowel($substring-after-vowel)"/>
>>                     </xsl:otherwise>
>>                 </xsl:choose>
>>             </xsl:otherwise>
>>         </xsl:choose>
>>     </xsl:function>
>>
>>     <xsl:template match="/*">
>>         <xsl:variable name="result"
>>                    select="f:remove-up-to-vowel('THEN THE CURTAIN FELL')"
>>                   as="xs:string*"/>
>>         <xsl:for-each select="$result">
>>             <xsl:message>
>>                 <xsl:value-of select="."/>
>>             </xsl:message>
>>         </xsl:for-each>
>>     </xsl:template>
>>
>> </xsl:stylesheet>
>>
>>
>>
>