Subject: Re: [xsl] I desire this function: substring-before(string, regex charset)
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 13 Apr 2025 16:04:04 -0000
Hi Roger,

> Below is my XSLT solution. It uses the replace idea that Liam provided a few weeks back, which is neat.
> Whereas the SNOBOL solution takes only 2 lines of code, the XSLT solution requires many lines of code.
> Is there a simpler, shorter solution?

Here is a single XPath 4 expression (I am surprised scan-left isn't standard in XPath 3.1!):

let $input := 'THEN THE CURTAIN FELL',
    $inChars := tokenize($input, '')
return
  distinct-values(
    scan-left($inChars, $input,
              function($s, $char)
              {
                if(contains('AEIOU', $char))
                  then substring-after($s, $char)
                  else $s
              }
    )
  )

One could try it here:
https://fiddle.basex.org/?share=%28%27query%21%27+let0M5%22THEN+THE+CURTAIN+FELL%22%2C3*0D5tokenize28%22%22G+return3*dist9ct-values%7Bscan-left2DJ8functionS*7%28+if%7BKa9s%7B%22AEIOU%221%7D+then+substr9g-afterSP+else0s.P%29.7G44+6%21%27%27%7Emode%21%27XQuery+%7BBaseX6Type%21%27xml%27%29*7+.34440+%241Jchar%7D2%7B%243%5Cn4PP5+%3A%3D+6%7D%27%7EKext7++8M%2C+9inD9CharsG%7D3J%2C0KcontM9putP**S2s1.%01SPMKJGD9876543210.*_

Thanks,
Dimitre.

On Sun, Apr 13, 2025 at 5:42 AM Roger L Costello costello@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Folks,
>
> The XPath substring-before function returns "that part of the given input string that occurs before the first occurrence of the string given in $arg2." [definition from SAXON web page]
>
>     substring-before($arg1 as xs:string?, $arg2 as xs:string?) --> xs:string
>
> It's a shame that the value of $arg2 can't be a regex character set, e.g.,
>
>     substring-before("THEN THE CURTAIN FELL", '[AEIOU]')
>
> returns TH.
>
> Even better, it would be nice if there was a third argument which specified that you also want the character that was matched from the character set:
>
>     substring-before("THEN THE CURTAIN FELL", '[AEIOU]', 'plus matched charset character')
>
> returns THE.
>
> I believe such a function would be useful.
>
> SNOBOL has such a function.
>
> Let's see how such functionality could be used.
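For comparison outside XSLT, the proposed two- and three-argument behaviour can be sketched in Python. This is only a minimal sketch under assumptions: `substring_before_charset` is a hypothetical name (no such XPath or library function exists), `charset` is assumed to be a regex character class such as '[AEIOU]', and, like the standard substring-before, it returns the empty string when no character of the set occurs.

```python
import re


def substring_before_charset(text, charset, include_match=False):
    """Hypothetical helper, not an XPath or library function.

    Returns the part of `text` before the first character matched by the
    regex character class `charset` (e.g. '[AEIOU]'). With
    include_match=True the matched character is appended, like the
    proposed third argument. Returns '' if nothing matches, mirroring
    the behaviour of the standard substring-before.
    """
    m = re.search(charset, text)
    if m is None:
        return ""
    return text[: m.end()] if include_match else text[: m.start()]


print(substring_before_charset("THEN THE CURTAIN FELL", "[AEIOU]"))
print(substring_before_charset("THEN THE CURTAIN FELL", "[AEIOU]", include_match=True))
```

The two calls print TH and THE, the results given in the message for the two- and three-argument forms.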
> I have this text:
>
>     THEN THE CURTAIN FELL
>
> Fetch the string preceding the first vowel, plus the vowel:
>
>     THE
>
> However, instead of fetching the string plus vowel, modify the text by nullifying the string plus vowel:
>
>     N THE CURTAIN FELL
>
> Repeat on the new, shortened text.
>
> Here is the text as it is repeatedly shortened:
>
>     THEN THE CURTAIN FELL
>     N THE CURTAIN FELL
>     CURTAIN FELL
>     RTAIN FELL
>     IN FELL
>     N FELL
>     LL
>
> General Problem Statement: There is a text string. There is a character set. Strip off the string prior to the first occurrence of a character from the character set, plus the character. Repeat until the end of text is reached.
>
> Below I show how to implement this in SNOBOL and then in XSLT. My XSLT solution is large and complex. Is there a simpler, shorter solution?
>
> First, the SNOBOL solution:
>
> Assign the variable TEXT a string:
>
>           TEXT = "THEN THE CURTAIN FELL"
>
> BREAK is a built-in SNOBOL function. It has one argument, which is a character set. BREAK returns a pattern that matches a string up to but not including the character from the character set. E.g.,
>
>           BREAK("AEIOU")
>
> returns a pattern that matches characters up to but not including a vowel. This pattern:
>
>           BREAK("AEIOU") LEN(1)
>
> matches characters up to a vowel, plus the vowel.
>
> Note: LEN(N) means, match any N-length character string. It is SNOBOL's version of the regex .{N}
>
> The following statement applies the pattern to TEXT, replacing the string plus vowel with null:
>
>           TEXT BREAK("AEIOU") LEN(1) =
>
> To incrementally strip away the string, put the statement inside a loop:
>
>     LOOP  TEXT BREAK("AEIOU") LEN(1) =    :F(END)
>           OUTPUT = TEXT                   :(LOOP)
>
> Here is the output from running the SNOBOL program:
>
>     THEN THE CURTAIN FELL
>     N THE CURTAIN FELL
>     CURTAIN FELL
>     RTAIN FELL
>     IN FELL
>     N FELL
>     LL
>
> Nice.
>
> Below is my XSLT solution. It uses the replace idea that Liam provided a few weeks back, which is neat.
> Whereas the SNOBOL solution takes only 2 lines of code, the XSLT solution requires many lines of code. Is there a simpler, shorter solution?
>
> Lesson Learned: when designing a new language, it might be useful for the language to provide something like the SNOBOL BREAK function.
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                 xmlns:f="function"
>                 exclude-result-prefixes="#all"
>                 version="3.0">
>
>     <xsl:function name="f:remove-up-to-vowel" as="xs:string*">
>         <xsl:param name="TEXT" as="xs:string"/>
>         <xsl:choose>
>             <!-- end of string? -->
>             <xsl:when test="$TEXT eq ''"/>
>             <xsl:otherwise>
>                 <xsl:variable name="substring-after-vowel"
>                               select="replace($TEXT, '^[^AEIOU]*[AEIOU](.*)$', '$1')"
>                               as="xs:string*"/>
>                 <xsl:sequence select="$substring-after-vowel"/>
>                 <xsl:choose>
>                     <xsl:when test="not(matches($substring-after-vowel,'[AEIOU]'))"/>
>                     <xsl:otherwise>
>                         <xsl:sequence select="f:remove-up-to-vowel($substring-after-vowel)"/>
>                     </xsl:otherwise>
>                 </xsl:choose>
>             </xsl:otherwise>
>         </xsl:choose>
>     </xsl:function>
>
>     <xsl:template match="/*">
>         <xsl:variable name="result"
>                       select="f:remove-up-to-vowel('THEN THE CURTAIN FELL')"
>                       as="xs:string*"/>
>         <xsl:for-each select="$result">
>             <xsl:message>
>                 <xsl:value-of select="."/>
>             </xsl:message>
>         </xsl:for-each>
>     </xsl:template>
>
> </xsl:stylesheet>
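Both the SNOBOL loop and the scan-left expression can be mirrored in Python for comparison. This is a minimal sketch under stated assumptions, not a translation of either program: the names `break_len_loop` and `scan_left_version` are hypothetical, the character set is passed as a plain string of characters (like BREAK's argument) rather than as a regex, and `itertools.accumulate` with `initial=` (Python 3.8+) stands in for XPath 4's scan-left, since it likewise yields the starting value followed by every intermediate result of the left fold.

```python
from itertools import accumulate

VOWELS = "AEIOU"


def break_len_loop(text, charset=VOWELS):
    """Hypothetical helper mimicking the SNOBOL loop
         LOOP  TEXT BREAK(charset) LEN(1) =   :F(END)
               OUTPUT = TEXT                  :(LOOP)
    It also records the starting text, as the output listing in the post does."""
    results = [text]
    while True:
        # BREAK(charset): position of the first character that is in charset
        pos = next((i for i, ch in enumerate(text) if ch in charset), None)
        if pos is None:
            # BREAK fails when no such character exists -> :F(END)
            return results
        # LEN(1) also consumes that character; the whole match is replaced with null
        text = text[pos + 1:]
        results.append(text)


def scan_left_version(text, charset=VOWELS):
    """Hypothetical Python analogue of the scan-left expression in the reply."""
    states = accumulate(
        text,  # one fold step per character of the ORIGINAL string
        # substring-after($s, $char) for members of charset, otherwise keep $s;
        # str.partition returns '' as its third part when ch is absent, like substring-after
        lambda s, ch: s.partition(ch)[2] if ch in charset else s,
        initial=text,
    )
    # distinct-values(): drop repeated states, keeping first-occurrence order
    return list(dict.fromkeys(states))


for line in break_len_loop("THEN THE CURTAIN FELL"):
    print(repr(line))
```

Both functions return the seven strings of the listing in the post, with one visible difference: the third string is ' CURTAIN FELL' with a leading space, because removing "N THE" leaves the space that preceded CURTAIN (the archived listing shows it without one).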