Re: [xsl] split string to whole words based on length

Subject: Re: [xsl] split string to whole words based on length
From: "andrew welch" <andrew.j.welch@xxxxxxxxx>
Date: Thu, 27 Apr 2006 11:59:55 +0100
On 4/27/06, David Carlisle <davidc@xxxxxxxxx> wrote:
>
> Assuming that you do just want to concatenate these, then the reason the
> last group was being split was that I was looking for a trailing comma,
> so if you add a comma to the end of the list so it looks like
>
> <xsl:variable name="str"
select="'aaaaaaaaaaaaaaaaaaaa,aaaaaaaaaaaaaaaaaaaaa,aaaaaaaaaaaaaaaaaaaaa,aaa
aaaaaaaaaaaaaaaaaaaaaaaaa,aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaa,aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,aaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,aaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,'" as="xs:string"
xmlns:xs="http://www.w3.org/2001/XMLSchema"/>
> <xsl:variable name="length" select="255" as="xs:integer"
>   xmlns:xs="http://www.w3.org/2001/XMLSchema"/>
>
>
> Then the following works (just added [string()] to get rid of the empty
> token that (now) comes after the trailing comma)
>
> <xsl:for-each
>   select="tokenize(replace($str,concat('(.{0,',$length,'}),'),'$1!'),'!')[string()]">
>     <words><xsl:value-of select="."/></words>
> </xsl:for-each>
>
>
> Of course if your input is really a set of word elements and not a
> comma separated list there is no point in joining then up and splitting
> them, really.

Hmm, my bad on this one.  The input starts life as one word per
element, but gets joined into a single element using string-join()
with a comma delimiter.  A request then came in to limit the string
length to a maximum of 255 and start a new element for anything
longer.  Thinking there would be a simple 2.0 solution I concocted an
example and used length 10, but when implementing the suggested
solutions it was failing for a larger input sample and length 255.  I
made up another input sample (which I would string-join()) and posted
that back but forgot to mention the 255 limit :-/

Anyway, to cut a long story short, as you say I can work on the
original elements to limit it to 255, which is easier and more
straightforward than if the input is one long string.
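
For the record, a rough sketch of what I mean by working on the
original elements (untested as posted; the template name "chunk", its
"todo" and "current" parameters and the <result> wrapper are names
made up for illustration).  It walks the words in order and starts a
new <words> element whenever adding the next word plus a comma would
exceed $length:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">

<xsl:output indent="yes"/>

<xsl:variable name="length" select="255" as="xs:integer"/>

<xsl:template match="/">
  <!-- <result> is a hypothetical wrapper so the output has one root -->
  <result>
    <xsl:call-template name="chunk">
      <xsl:with-param name="todo" select="//word/string()"/>
    </xsl:call-template>
  </result>
</xsl:template>

<!-- todo: words still to place; current: the chunk being built -->
<xsl:template name="chunk">
  <xsl:param name="todo" as="xs:string*" select="()"/>
  <xsl:param name="current" as="xs:string?" select="()"/>
  <xsl:choose>
    <!-- nothing left: flush whatever has been collected -->
    <xsl:when test="empty($todo)">
      <xsl:if test="exists($current)">
        <words><xsl:value-of select="$current"/></words>
      </xsl:if>
    </xsl:when>
    <!-- no chunk open: start one with the next word
         (a single word longer than $length is kept whole, not cut) -->
    <xsl:when test="empty($current)">
      <xsl:call-template name="chunk">
        <xsl:with-param name="todo" select="subsequence($todo, 2)"/>
        <xsl:with-param name="current" select="$todo[1]"/>
      </xsl:call-template>
    </xsl:when>
    <!-- next word still fits, counting the joining comma -->
    <xsl:when test="string-length($current) + 1 + string-length($todo[1]) le $length">
      <xsl:call-template name="chunk">
        <xsl:with-param name="todo" select="subsequence($todo, 2)"/>
        <xsl:with-param name="current" select="concat($current, ',', $todo[1])"/>
      </xsl:call-template>
    </xsl:when>
    <!-- does not fit: emit the chunk and start again with the same word -->
    <xsl:otherwise>
      <words><xsl:value-of select="$current"/></words>
      <xsl:call-template name="chunk">
        <xsl:with-param name="todo" select="$todo"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>
```

Because each chunk is filled as far as it will go before the next one
is opened, this should never produce more elements than necessary.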

Having said all that (which makes it now just an exercise) this input:

<words>
<word>aaaaaaaaaaaaaaaaaaaa</word>
<word>aaaaaaaaaaaaaaaaaaaaa</word>
<word>aaaaaaaaaaaaaaaaaaaaa</word>
<word>aaaaaaaaaaaaaaaaaaaaaaaaaaaa</word>
<word>aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</word>
<word>aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</word>
<word>aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</word>
<word>aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</word>
</words>

which gets string-join()'d into one long string, doesn't (I believe)
produce the required output with your latest offering.  It creates
three output elements when only two are needed.

I've kept the input as separate elements and string-join() them in the
stylesheet purely for manageability.  The goal is still to split a
comma delimited list, rather than construct one from the elements
(which I've now done).

Here's a stylesheet containing your three suggestions:

<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="2.0">

<xsl:variable name="str" select="string-join(//word, ',')" as="xs:string"/>
<xsl:variable name="length" select="255" as="xs:integer"/>

<xsl:output indent="yes"/>

<xsl:template match="/">
	<xsl:call-template name="z"/>
</xsl:template>

<xsl:template name="x">
	<xsl:analyze-string select="$str" regex=".{{0,{$length}}},">
	 <xsl:matching-substring>
	   <words><xsl:value-of select="substring(.,1,string-length(.)-1)"/></words>
	 </xsl:matching-substring>
	 <xsl:non-matching-substring>
	   <words><xsl:value-of select="."/></words>
	 </xsl:non-matching-substring>
	</xsl:analyze-string>
</xsl:template>

<xsl:template name="y">
	<xsl:for-each
select="tokenize(replace($str,concat('(.{0,',$length,'}),'),'$1!'),'!')">
	   <words><xsl:value-of select="."/></words>
	</xsl:for-each>
</xsl:template>

<xsl:template name="z">
<xsl:for-each
select="tokenize(replace($str,concat('(.{0,',$length,'}),'),'$1!'),'!')[string()]">
   <words><xsl:value-of select="."/></words>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
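
A guess at why template z gives three elements here (a sketch only, I
may be misreading things): $str as built above by string-join() has no
trailing comma, so the regex has nothing to match after the final word
and that word falls out as a token of its own.  Assuming that is the
cause, appending the comma inside the stylesheet, as you suggested for
the literal string, should bring the result back to two elements.  The
<result> wrapper is something added here purely so the output has a
single root:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">

<xsl:output indent="yes"/>

<!-- Assumption: the list needs a trailing comma for the regex to
     treat the last word like every other word -->
<xsl:variable name="str"
  select="concat(string-join(//word, ','), ',')" as="xs:string"/>
<xsl:variable name="length" select="255" as="xs:integer"/>

<xsl:template match="/">
  <!-- <result> is a hypothetical wrapper element -->
  <result>
    <!-- Replace the comma that ends each run of up to $length
         characters with '!', split on '!', and drop the empty token
         left after the final '!'.  Assumes no word contains '!'. -->
    <xsl:for-each
      select="tokenize(replace($str, concat('(.{0,', $length, '}),'), '$1!'), '!')[string()]">
      <words><xsl:value-of select="."/></words>
    </xsl:for-each>
  </result>
</xsl:template>

</xsl:stylesheet>
```

Each <words> element then holds whole words joined by commas with the
final comma removed, so its content stays within $length characters as
long as no single word is longer than that.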

As I say, it's all academic now.

cheers
andrew
