Re: [xsl] Algorithm for splitting a string at a space

From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 23 Nov 2015 19:48:05 -0000
On Mon, Nov 23, 2015 at 07:04:32PM -0000, Rick Quatro rick@xxxxxxxxxxxxxx wrote:
> I have a series of strings that I need to split if they are longer than a
> particular length, say 30 characters. But I need to make the split at the
> previous space. Here is an example string:
> 
> This is a long line that I want to split at a space.
> 
> The 30th character is in the middle of a word, so I need to do the split at
> the previous space. I am using XSLT/XPath 2.0. I am having trouble
> developing a good algorithm for this. Any pointers would be appreciated.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
    exclude-result-prefixes="xs xd" version="2.0">
    <xsl:output method="text" />
    <xsl:variable name="input">I am a large string which needs to be broken at the last space on or before character thirty-one</xsl:variable>
    <!-- match="/" only needs some source document to fire; the string itself comes from $input -->
    <xsl:template match="/">
        <xsl:variable name="cutLength" select="30" />
        <xsl:variable name="tokens" select="tokenize($input, '\p{Zs}')" />
        <!-- \p{Zs} because someone might have provided an unusual space -->
        <xsl:variable as="element(bucket)" name="candidates">
            <!-- we can't use one sequence for this and 2.0 hasn't got maps or arrays -->
            <bucket>
                <!-- candidate N = the first N tokens rejoined with single spaces -->
                <xsl:for-each select="1 to count($tokens)">
                    <candidate>
                        <xsl:value-of select="string-join($tokens[position() le current()], '&#x0020;')" />
                    </candidate>
                </xsl:for-each>
            </bucket>
        </xsl:variable>
        <!-- the longest candidate that still fits within $cutLength -->
        <xsl:value-of select="$candidates/candidate[string-length() le $cutLength][last()]" />
    </xsl:template>
</xsl:stylesheet>

returns 

"I am a large string which"

It's not as compact as David Carlisle's regexp solution, and on a really, really big input line it asks a lot of the optimizer: every candidate rejoins all the tokens before it, so the work grows roughly with the square of the number of words. The pattern does generalize fairly well to making substrings from rules rather than from character positions.
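For comparison, a regular-expression version of the same cut might look like the sketch below. It is not necessarily David Carlisle's solution, which isn't quoted here; it assumes the same $input string and 30-character limit. A greedy .{0,30} followed by \p{Zs} backs off until it lands on the last space at or before character 31, and the guard returns strings that already fit unchanged. For the sample input it should give the same "I am a large string which".

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text" />
    <xsl:variable name="input">I am a large string which needs to be broken at the last space on or before character thirty-one</xsl:variable>
    <xsl:variable name="cutLength" select="30" />
    <!-- Sketch only: builds the pattern ^(.{0,30})\p{Zs}.*$ and keeps group 1 -->
    <xsl:template match="/">
        <xsl:value-of select="if (string-length($input) le $cutLength)
                              then $input
                              else replace($input, concat('^(.{0,', $cutLength, '})\p{Zs}.*$'), '$1', 's')" />
    </xsl:template>
</xsl:stylesheet>

One behavioural difference: if the first word alone is longer than the limit, the pattern never matches and replace() returns the whole string, whereas the candidate approach above returns nothing.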

-- Graydon
