Subject: [xsl] I desire this function: substring-before(string, regex charset)
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 13 Apr 2025 12:42:29 -0000

Hi Folks,
The XPath substring-before function returns "that part of the given input
string that occurs before the first occurrence of the string given in $arg2."
[definition from SAXON web page]
substring-before($arg1 as xs:string?, $arg2 as xs:string?) --> xs:string
It's a shame that the value of $arg2 can't be a regex character set, e.g.,
substring-before("THEN THE CURTAIN FELL", '[AEIOU]')
returns TH.
Even better, it would be nice if there were a third argument specifying
that you also want the character that was matched from the character set:
substring-before("THEN THE CURTAIN FELL", '[AEIOU]', 'plus matched charset character')
returns THE.
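To pin down the intended behaviour, here is a minimal sketch in Python
(chosen only for illustration, not XPath). The name substring_before_charset
and the include_match flag are hypothetical; no such function exists in XPath
or SAXON. Returning "" when nothing matches is an assumption that mirrors
what substring-before does when $arg2 does not occur in $arg1.

```python
import re

def substring_before_charset(text, charset, include_match=False):
    """Return the part of text before the first character matched by charset.

    charset is a regex character class such as '[AEIOU]'.
    With include_match=True the matched character is appended.
    """
    m = re.search(charset, text)
    if m is None:
        return ""  # assumption: same as substring-before when there is no match
    end = m.end() if include_match else m.start()
    return text[:end]

print(substring_before_charset("THEN THE CURTAIN FELL", "[AEIOU]"))        # TH
print(substring_before_charset("THEN THE CURTAIN FELL", "[AEIOU]", True))  # THE
```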
I believe such a function would be useful.
SNOBOL has such a function.
Let's see how such functionality could be used. I have this text:
THEN THE CURTAIN FELL
Fetch the string preceding the first vowel, plus the vowel:
THE
However, instead of fetching the string plus vowel, modify the text by
nullifying the string plus vowel:
N THE CURTAIN FELL
Repeat on the new, shortened text.
Here is the text as it is repeatedly shortened:
THEN THE CURTAIN FELL
N THE CURTAIN FELL
CURTAIN FELL
RTAIN FELL
IN FELL
N FELL
LL
General Problem Statement: There is a text string. There is a character set.
Strip off the string prior to the first occurrence of a character from the
character set, plus the character. Repeat until the end of text is reached.
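The general problem can be sketched in a few lines of Python, purely as an
illustration of the loop (the function name strip_repeatedly is
hypothetical). It collects the text after each removal and stops when no
character from the set remains.

```python
import re

def strip_repeatedly(text, charset="[AEIOU]"):
    """Return the text after each successive removal of 'prefix + matched character'."""
    results = []
    while True:
        m = re.search(charset, text)
        if m is None:          # no character from the set remains: stop
            break
        text = text[m.end():]  # drop everything up to and including the match
        results.append(text)
    return results

for line in strip_repeatedly("THEN THE CURTAIN FELL"):
    print(line)
```

The first string collected is "N THE CURTAIN FELL" and the last is "LL".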
Below I show how to implement this in SNOBOL and then in XSLT. My XSLT
solution is large and complex. Is there a simpler, shorter solution?
First, the SNOBOL solution:
Assign the variable TEXT a string:
TEXT = "THEN THE CURTAIN FELL"
BREAK is a built-in SNOBOL function. It takes one argument, a character set,
and returns a pattern that matches a string up to, but not including, the
first character that belongs to the character set. E.g.,
BREAK("AEIOU")
returns a pattern that matches characters up to but not including the first vowel.
This pattern:
BREAK("AEIOU") LEN(1)
matches characters up to a vowel, plus the vowel.
Note: LEN(N) means, match any N-length character string. It is SNOBOL's
version of the regex .{N}
The following statement applies the pattern to TEXT, replacing the string plus
vowel with null:
TEXT BREAK("AEIOU") LEN(1) =
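For readers who think in regexes, one way to read that statement is sketched
below in Python. It assumes that BREAK("AEIOU") LEN(1) corresponds to the
regex [^AEIOU]*[AEIOU] applied at the start of the string; the names
BREAK_VOWEL_LEN1 and strip_once are hypothetical.

```python
import re

# Assumed regex reading of BREAK("AEIOU") LEN(1): a run of non-vowels, then one vowel.
BREAK_VOWEL_LEN1 = re.compile(r"[^AEIOU]*[AEIOU]")

def strip_once(text):
    """Delete the leading 'non-vowels + vowel' prefix, or return None if there is no vowel."""
    m = BREAK_VOWEL_LEN1.match(text)  # match() is anchored at the start of text
    if m is None:
        return None                   # corresponds to the SNOBOL statement failing
    return text[m.end():]

print(strip_once("THEN THE CURTAIN FELL"))  # N THE CURTAIN FELL
print(strip_once("LL"))                     # None
```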
To incrementally strip away the string, put the statement inside a loop:
LOOP    TEXT BREAK("AEIOU") LEN(1) =    :F(END)
        OUTPUT = TEXT                   :(LOOP)
END
Here is the output from running the SNOBOL program, preceded by the original
text (the loop itself prints only the shortened strings):
THEN THE CURTAIN FELL
N THE CURTAIN FELL
CURTAIN FELL
RTAIN FELL
IN FELL
N FELL
LL
Nice.
Below is my XSLT solution. It uses the replace idea that Liam provided a few
weeks back, which is neat. Whereas the SNOBOL solution takes only 2 lines of
code, the XSLT solution requires many lines of code. Is there a simpler,
shorter solution?
Lesson Learned: when designing a new language, it might be useful for the
language to provide something like the SNOBOL BREAK function.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="function"
                exclude-result-prefixes="#all"
                version="3.0">

    <xsl:function name="f:remove-up-to-vowel" as="xs:string*">
        <xsl:param name="TEXT" as="xs:string"/>
        <xsl:choose>
            <!-- end of string? -->
            <xsl:when test="$TEXT eq ''"/>
            <xsl:otherwise>
                <!-- text after the first vowel (unchanged if $TEXT has no vowel) -->
                <xsl:variable name="substring-after-vowel"
                              select="replace($TEXT, '^[^AEIOU]*[AEIOU](.*)$', '$1')"
                              as="xs:string"/>
                <xsl:sequence select="$substring-after-vowel"/>
                <xsl:choose>
                    <!-- no vowel left: stop recursing -->
                    <xsl:when
                        test="not(matches($substring-after-vowel,'[AEIOU]'))"/>
                    <xsl:otherwise>
                        <xsl:sequence
                            select="f:remove-up-to-vowel($substring-after-vowel)"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>

    <xsl:template match="/*">
        <xsl:variable name="result"
                      select="f:remove-up-to-vowel('THEN THE CURTAIN FELL')"
                      as="xs:string*"/>
        <!-- write each shortened string as a separate message -->
        <xsl:for-each select="$result">
            <xsl:message>
                <xsl:value-of select="."/>
            </xsl:message>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>