Subject: Re: [xsl] I desire this function: substring-before(string, regex charset)
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 13 Apr 2025 16:04:04 -0000
Hi Roger,
> Below is my XSLT solution. It uses the replace idea that Liam provided a
> few weeks back, which is neat.
> Whereas the SNOBOL solution takes only 2 lines of code, the XSLT solution
> requires many lines of code.
> Is there a simpler, shorter solution?
Here is a single XPath 4 expression (I am surprised that scan-left isn't
standard in XPath 3.1!):
let $input := 'THEN THE CURTAIN FELL',
    $inChars := tokenize($input, '')
return
  distinct-values(
    scan-left($inChars, $input, function($s, $char) {
      if (contains('AEIOU', $char))
      then substring-after($s, $char)
      else $s
    })
  )
One could try it here:
https://fiddle.basex.org/?share=%28%27query%21%27+let0M5%22THEN+THE+CURTAIN+FELL%22%2C3*0D5tokenize28%22%22G+return3*dist9ct-values%7Bscan-left2DJ8functionS*7%28+if%7BKa9s%7B%22AEIOU%221%7D+then+substr9g-afterSP+else0s.P%29.7G44+6%21%27%27%7Emode%21%27XQuery+%7BBaseX6Type%21%27xml%27%29*7+.34440+%241Jchar%7D2%7B%243%5Cn4PP5+%3A%3D+6%7D%27%7EKext7++8M%2C+9inD9CharsG%7D3J%2C0KcontM9putP**S2s1.%01SPMKJGD9876543210.*_
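
For processors limited to XPath 3.1, a rough equivalent can be sketched with
fold-left, which is standard there. This is only a sketch and not the
expression behind the fiddle link: it collects the intermediate strings in
the accumulator itself, and it splits the input with string-to-codepoints
rather than tokenize($input, ''), because a pattern that matches the
zero-length string raises FORX0003 in 3.1. The names $chars and $acc are
illustrative only.

(: assumption: plain XPath 3.1, no XPath 4 functions :)
let $input := 'THEN THE CURTAIN FELL',
    (: one single-character string per character of the input :)
    $chars := string-to-codepoints($input) ! codepoints-to-string(.)
return
  fold-left($chars, $input, function($acc, $char) {
    (: $acc holds every string produced so far; the last item is the current text :)
    if (contains('AEIOU', $char))
    then ($acc, substring-after($acc[last()], $char))
    else $acc
  })

Because a new string is appended only when a vowel is consumed, there is no
need for distinct-values here; the result should be seven strings, starting
with the unmodified input.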
Thanks,
Dimitre.
On Sun, Apr 13, 2025 at 5:42 AM Roger L Costello costello@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi Folks,
>
> The XPath substring-before function returns "that part of the given input
> string that occurs before the first occurrence of the string given in
> $arg2." [definition from SAXON web page]
>
> substring-before($arg1 as xs:string?, $arg2 as xs:string?) --> xs:string
>
> It's a shame that the value of $arg2 can't be a regex character set, e.g.,
>
> substring-before("THEN THE CURTAIN FELL", '[AEIOU]')
>
> returns TH.
>
> Even better, it would be nice if there was a third argument which
> specified that you also want the character that was matched from the
> character set:
>
> substring-before("THEN THE CURTAIN FELL", '[AEIOU]', 'plus matched charset
> character')
>
> returns THE.
>
> I believe such a function would be useful.
>
> SNOBOL has such a function.
>
> Let's see how such functionality could be used. I have this text:
>
> THEN THE CURTAIN FELL
>
> Fetch the string preceding the first vowel, plus the vowel:
>
> THE
>
> However, instead of fetching the string plus vowel, modify the text by
> nullifying the string plus vowel:
>
> N THE CURTAIN FELL
>
> Repeat on the new, shortened text.
>
> Here is the text as it is repeatedly shortened:
>
> THEN THE CURTAIN FELL
> N THE CURTAIN FELL
> CURTAIN FELL
> RTAIN FELL
> IN FELL
> N FELL
> LL
>
> General Problem Statement: There is a text string. There is a character
> set. Strip off the string prior to the first occurrence of a character from
> the character set, plus the character. Repeat until the end of text is
> reached.
>
> Below I show how to implement this in SNOBOL and then in XSLT. My XSLT
> solution is large and complex. Is there a simpler, shorter solution?
>
> First, the SNOBOL solution:
>
> Assign the variable TEXT a string:
>
> TEXT = "THEN THE CURTAIN FELL"
>
> BREAK is a built-in SNOBOL function. It has one argument, which is a
> character set. BREAK returns a pattern that matches a string up to but not
> including the character from the character set. E.g.,
>
> BREAK("AEIOU")
>
> returns a pattern that matches characters up to but not including a vowel.
> This pattern:
>
> BREAK("AEIOU") LEN(1)
>
> matches characters up to a vowel, plus the vowel.
>
> Note: LEN(N) means, match any N-length character string. It is SNOBOL's
> version of the regex .{N}
>
> The following statement applies the pattern to TEXT, replacing the string
> plus vowel with null:
>
> TEXT BREAK("AEIOU") LEN(1) =
>
> To incrementally strip away the string, put the statement inside a loop:
>
> LOOP TEXT BREAK("AEIOU") LEN(1) = :F(END)
>
> OUTPUT = TEXT :(LOOP)
>
> Here is the output from running the SNOBOL program:
>
> THEN THE CURTAIN FELL
> N THE CURTAIN FELL
> CURTAIN FELL
> RTAIN FELL
> IN FELL
> N FELL
> LL
>
> Nice.
>
> Below is my XSLT solution. It uses the replace idea that Liam provided a
> few weeks back, which is neat. Whereas the SNOBOL solution takes only 2
> lines of code, the XSLT solution requires many lines of code. Is there a
> simpler, shorter solution?
>
> Lesson Learned: when designing a new language, it might be useful for the
> language to provide something like the SNOBOL BREAK function.
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                 xmlns:f="function"
>                 exclude-result-prefixes="#all"
>                 version="3.0">
>
>   <xsl:function name="f:remove-up-to-vowel" as="xs:string*">
>     <xsl:param name="TEXT" as="xs:string"/>
>     <xsl:choose>
>       <!-- end of string? -->
>       <xsl:when test="$TEXT eq ''"/>
>       <xsl:otherwise>
>         <xsl:variable name="substring-after-vowel"
>                       select="replace($TEXT, '^[^AEIOU]*[AEIOU](.*)$', '$1')"
>                       as="xs:string*"/>
>         <xsl:sequence select="$substring-after-vowel"/>
>         <xsl:choose>
>           <xsl:when test="not(matches($substring-after-vowel,'[AEIOU]'))"/>
>           <xsl:otherwise>
>             <xsl:sequence select="f:remove-up-to-vowel($substring-after-vowel)"/>
>           </xsl:otherwise>
>         </xsl:choose>
>       </xsl:otherwise>
>     </xsl:choose>
>   </xsl:function>
>
>   <xsl:template match="/*">
>     <xsl:variable name="result"
>                   select="f:remove-up-to-vowel('THEN THE CURTAIN FELL')"
>                   as="xs:string*"/>
>     <xsl:for-each select="$result">
>       <xsl:message>
>         <xsl:value-of select="."/>
>       </xsl:message>
>     </xsl:for-each>
>   </xsl:template>
>
> </xsl:stylesheet>
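
On the XSLT side, the quoted function could probably be shortened as well.
The following is a sketch under assumptions, not a drop-in replacement: it
assumes the same xsl, xs and f namespace declarations as the stylesheet
above, and f:strip is a hypothetical name chosen for illustration.

<!-- hypothetical helper: emits each shortened text in turn, then recurses -->
<xsl:function name="f:strip" as="xs:string*">
  <xsl:param name="text" as="xs:string"/>
  <!-- stop as soon as no vowel is left -->
  <xsl:if test="matches($text, '[AEIOU]')">
    <!-- drop everything up to and including the first vowel -->
    <xsl:variable name="rest" as="xs:string"
                  select="replace($text, '^[^AEIOU]*[AEIOU]', '')"/>
    <xsl:sequence select="$rest, f:strip($rest)"/>
  </xsl:if>
</xsl:function>

Calling f:strip('THEN THE CURTAIN FELL') should return the same six
shortened strings as f:remove-up-to-vowel. One difference in behavior: for
an input that contains no vowel at all, this sketch returns the empty
sequence, whereas the quoted function returns the input unchanged.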