Subject: [xsl] I desire this function: substring-before(string, regex charset)
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 13 Apr 2025 12:42:29 -0000

Hi Folks,

The XPath substring-before function returns "that part of the given input string that occurs before the first occurrence of the string given in $arg2." [definition from SAXON web page]

    substring-before($arg1 as xs:string?, $arg2 as xs:string?) --> xs:string

It's a shame that the value of $arg2 can't be a regex character set, e.g.,

    substring-before("THEN THE CURTAIN FELL", '[AEIOU]')

returns TH.

Even better, it would be nice if there were a third argument which specified that you also want the character that was matched from the character set:

    substring-before("THEN THE CURTAIN FELL", '[AEIOU]', 'plus matched charset character')

returns THE.

I believe such a function would be useful. SNOBOL has such a function.

Let's see how such functionality could be used. I have this text:

    THEN THE CURTAIN FELL

Fetch the string preceding the first vowel, plus the vowel:

    THE

However, instead of fetching the string plus vowel, modify the text by nullifying the string plus vowel:

    N THE CURTAIN FELL

Repeat on the new, shortened text. Here is the text as it is repeatedly shortened:

    THEN THE CURTAIN FELL
    N THE CURTAIN FELL
     CURTAIN FELL
    RTAIN FELL
    IN FELL
    N FELL
    LL

General Problem Statement: There is a text string. There is a character set. Strip off the string prior to the first occurrence of a character from the character set, plus the character. Repeat until the end of text is reached.

Below I show how to implement this in SNOBOL and then in XSLT. My XSLT solution is large and complex. Is there a simpler, shorter solution?

First, the SNOBOL solution:

Assign the variable TEXT a string:

    TEXT = "THEN THE CURTAIN FELL"

BREAK is a built-in SNOBOL function. It has one argument, which is a character set. BREAK returns a pattern that matches a string up to but not including a character from the character set. E.g., BREAK("AEIOU") returns a pattern that matches characters up to but not including a vowel.

This pattern:

    BREAK("AEIOU") LEN(1)

matches characters up to a vowel, plus the vowel.

Note: LEN(N) means, match any N-length character string. It is SNOBOL's version of the regex .{N}

The following statement applies the pattern to TEXT, replacing the string plus vowel with null:

    TEXT BREAK("AEIOU") LEN(1) =

To incrementally strip away the string, put the statement inside a loop:

    LOOP    TEXT BREAK("AEIOU") LEN(1) =    :F(END)
            OUTPUT = TEXT                   :(LOOP)

Here is the output from running the SNOBOL program:

    THEN THE CURTAIN FELL
    N THE CURTAIN FELL
     CURTAIN FELL
    RTAIN FELL
    IN FELL
    N FELL
    LL

Nice.

Below is my XSLT solution. It uses the replace idea that Liam provided a few weeks back, which is neat. Whereas the SNOBOL solution takes only 2 lines of code, the XSLT solution requires many lines of code. Is there a simpler, shorter solution?

Lesson Learned: when designing a new language, it might be useful for the language to provide something like the SNOBOL BREAK function.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    xmlns:f="function"
                    exclude-result-prefixes="#all"
                    version="3.0">

        <xsl:function name="f:remove-up-to-vowel" as="xs:string*">
            <xsl:param name="TEXT" as="xs:string"/>
            <xsl:choose>
                <!-- end of string? -->
                <xsl:when test="$TEXT eq ''"/>
                <xsl:otherwise>
                    <xsl:variable name="substring-after-vowel"
                                  select="replace($TEXT, '^[^AEIOU]*[AEIOU](.*)$', '$1')"
                                  as="xs:string*"/>
                    <xsl:sequence select="$substring-after-vowel"/>
                    <xsl:choose>
                        <xsl:when test="not(matches($substring-after-vowel,'[AEIOU]'))"/>
                        <xsl:otherwise>
                            <xsl:sequence select="f:remove-up-to-vowel($substring-after-vowel)"/>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:function>

        <xsl:template match="/*">
            <xsl:variable name="result"
                          select="f:remove-up-to-vowel('THEN THE CURTAIN FELL')"
                          as="xs:string*"/>
            <xsl:for-each select="$result">
                <xsl:message>
                    <xsl:value-of select="."/>
                </xsl:message>
            </xsl:for-each>
        </xsl:template>

    </xsl:stylesheet>

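For comparison, one possibly shorter formulation is sketched here. It is an illustration only. The names f:substring-before-charset and f:strip-repeatedly and the boolean third parameter are hypothetical, not existing XPath functions. The sketch also assumes the character set is passed as a regex character class such as '[AEIOU]'. It relies on a reluctant quantifier (.*?) in replace() to stop at the first matching character, and on the 's' flag so that "." also matches newlines.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="function"
                exclude-result-prefixes="#all"
                version="3.0">

    <!-- Hypothetical helper (not a built-in): the part of $text before the
         first character matching $charset, optionally including that
         character. Returns '' when nothing matches, as substring-before does.
         Assumes $charset is a regex character class such as '[AEIOU]'. -->
    <xsl:function name="f:substring-before-charset" as="xs:string">
        <xsl:param name="text" as="xs:string"/>
        <xsl:param name="charset" as="xs:string"/>
        <xsl:param name="include-match" as="xs:boolean"/>
        <xsl:sequence select="
            if (not(matches($text, $charset))) then ''
            else if ($include-match)
            then replace($text, '^(.*?(' || $charset || ')).*$', '$1', 's')
            else replace($text, '^(.*?)(' || $charset || ').*$', '$1', 's')"/>
    </xsl:function>

    <!-- Hypothetical helper: the successive remainders of $text after
         repeatedly deleting everything up to and including the first
         character matching $charset. Recursion stops once no character
         from the set remains (the analogue of SNOBOL's :F(END)). -->
    <xsl:function name="f:strip-repeatedly" as="xs:string*">
        <xsl:param name="text" as="xs:string"/>
        <xsl:param name="charset" as="xs:string"/>
        <xsl:sequence select="
            if (matches($text, $charset))
            then
                let $rest := replace($text, '^.*?(' || $charset || ')', '', 's')
                return ($rest, f:strip-repeatedly($rest, $charset))
            else ()"/>
    </xsl:function>

    <!-- Same driver as in the stylesheet above: one message per remainder. -->
    <xsl:template match="/*">
        <xsl:for-each select="f:strip-repeatedly('THEN THE CURTAIN FELL', '[AEIOU]')">
            <xsl:message select="."/>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

Under these assumptions, f:substring-before-charset('THEN THE CURTAIN FELL', '[AEIOU]', true()) returns THE (and TH with false()), and f:strip-repeatedly yields the six shortened strings from "N THE CURTAIN FELL" through "LL". Guarding with matches() matters because replace() returns its input unchanged when the pattern does not match.
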