Re: [xsl] How to tokenize a string that contains space-delimited tokens and a quoted string that must not be tokenized?

From: "David Carlisle d.p.carlisle@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 28 Nov 2022 14:48:37 -0000
This works so long as you don't need to handle escaped quotes such as " aaaa \" bbb" or other quoting forms:

<xsl:stylesheet version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:variable name="s">if machine = "Intel 386 or later processors and compatible processors" then ground</xsl:variable>

 <xsl:template name="m">
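  <!-- first alternative: a double-quoted string (group 1, quotes dropped);
       second alternative: a run of non-space characters (group 2) -->
  <xsl:analyze-string select="$s" regex='"([^"]*)"|([^ ]+)'>
   <xsl:matching-substring>
    <!-- only one group is non-empty per match, so emitting both is safe -->
    <xsl:value-of select="'&#10;',regex-group(1),regex-group(2)" separator=""/>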
   </xsl:matching-substring>
  </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>


$ saxon10 -it:m rc2.xsl
<?xml version="1.0" encoding="UTF-8"?>
if
machine
=
Intel 386 or later processors and compatible processors
then
ground
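
If a real sequence of strings is wanted rather than text output, the same regex could be wrapped in a stylesheet function. This is only a sketch: the name f:tokens and the namespace urn:example:tokens are made up for illustration, and the input string is passed as a literal instead of the variable used above.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:tokens"
    exclude-result-prefixes="xs f">

 <xsl:output method="text"/>

 <!-- f:tokens is a hypothetical helper, not a built-in function -->
 <xsl:function name="f:tokens" as="xs:string*">
  <xsl:param name="in" as="xs:string"/>
  <xsl:analyze-string select="$in" regex='"([^"]*)"|([^ ]+)'>
   <xsl:matching-substring>
    <!-- at most one of the two groups is non-empty for any match,
         so concatenating them yields the token without its quotes -->
    <xsl:sequence select="regex-group(1) || regex-group(2)"/>
   </xsl:matching-substring>
  </xsl:analyze-string>
 </xsl:function>

 <xsl:template name="m">
  <!-- one token per line, only to make the result visible -->
  <xsl:value-of separator="&#10;"
    select="f:tokens('if machine = &quot;Intel 386 or later processors and compatible processors&quot; then ground')"/>
 </xsl:template>
</xsl:stylesheet>
```

Because the function is declared as="xs:string*", callers can use count(), indexing, or for-each on the result instead of re-parsing text.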


David
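
On the escaped-quote caveat: if backslash-escaped quotes do need handling, one possible extension is sketched below. It assumes that a backslash escapes the single character after it and that the backslash should be dropped from the token; neither rule comes from the original question, and the sample string is invented.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- invented sample input containing an escaped quote -->
 <xsl:variable name="s">say "aaaa \" bbb" now</xsl:variable>

 <xsl:template name="m">
  <!-- inside quotes, accept either an escaped character (\x) or any
       character that is neither a quote nor a backslash;
       group 1 = quoted content, group 3 = unquoted token -->
  <xsl:analyze-string select="$s" regex='"((\\.|[^"\\])*)"|([^ ]+)'>
   <xsl:matching-substring>
    <!-- replace() strips the escaping backslash from the quoted content -->
    <xsl:value-of select="'&#10;', replace(regex-group(1), '\\(.)', '$1'), regex-group(3)"
                  separator=""/>
   </xsl:matching-substring>
  </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>
```

The intent is that aaaa " bbb comes out as a single token with the backslash removed. Other quoting conventions (doubled quotes, single quotes) would need their own alternatives in the regex.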


On Mon, 28 Nov 2022 at 14:21, Roger L Costello costello@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Folks,
>
> I want to tokenize this string:
>
> if machine = "Intel 386 or later processors and compatible processors"
> then ground
>
> into this sequence of tokens:
>
> if
> machine
> =
> Intel 386 or later processors and compatible processors
> then
> ground
>
> Unfortunately, this:
>
> tokenize(.,'\s+')
>
> does not do the desired tokenization, as it also breaks up:
>
> "Intel 386 or later processors and compatible processors"
>
> into pieces.
>
> Nor does this do the desired tokenization:
>
> tokenize(.,'(\s+)|(")')
>
> Is there a simple way in XSLT/XPath to tokenize the string into the
> desired sequence of tokens?
>
> /Roger
