Subject: Re: [xsl] tokenize() and regex-group ?
From: Matthieu Ricaud-Dussarget <matthieu.ricaud@xxxxxxxxx>
Date: Tue, 17 Jul 2012 16:54:13 +0200
Regards, Matthieu.
You need to use xsl:analyze-string. I don't understand the difficulties in using this inside a recursive template. xsl:analyze-string can do everything that tokenize can do; you could implement tokenize as
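<!-- "fn" must be bound to a namespace of your own here; the standard function namespace is reserved -->
<xsl:function name="fn:tokenize" as="xs:string*">
  <xsl:param name="in" as="xs:string"/>
  <xsl:param name="regex" as="xs:string"/>
  <xsl:analyze-string select="$in" regex="{$regex}">
    <!-- separators: output nothing -->
    <xsl:matching-substring/>
    <!-- text between separators: one token each -->
    <xsl:non-matching-substring>
      <xsl:sequence select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>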
Start by replacing your call to tokenize with a call to that function, then add whatever functionality you need.
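As a rough sketch of the "add whatever functionality you need" step, the function above could be varied so that separators are kept rather than dropped. The function name my:tokenize-keep, the namespace URI, and the element names tok and sep are all made up for illustration; this assumes an XSLT 2.0 processor and a regex that never matches the empty string.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: like the tokenize replacement above, but each
       piece is wrapped in an element so that words (tok) and separators
       (sep) can be told apart. Pieces are returned in input order. -->
  <xsl:function name="my:tokenize-keep" as="element()*">
    <xsl:param name="in" as="xs:string"/>
    <xsl:param name="regex" as="xs:string"/>
    <xsl:analyze-string select="$in" regex="{$regex}">
      <xsl:matching-substring>
        <sep><xsl:value-of select="."/></sep>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <tok><xsl:value-of select="."/></tok>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

</xsl:stylesheet>
```

Because every character of the input ends up in exactly one tok or sep element, string-join(my:tokenize-keep($text, $regex), '') should give back the original string, while my:tokenize-keep($text, $regex)[self::tok] gives the words alone.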
Michael Kay Saxonica
On 17/07/2012 14:02, Matthieu Ricaud-Dussarget wrote:
Hi all,
I'm tokenizing some text within a recursive template. The goal is to generate links to some "definitions" inside the doc.
Let's say my text is: "my foo bar"
=> the 1st level of recursion searches for "bar" as a defined anchor in the doc
if it is not found, I increase a $lookBacklevel param:
=> the 2nd level of recursion searches for "foo bar"
and so on, until it finds a matching definition, or throws an error if there is none.
=> when a definition is found, the text is output with a link:
<p>... my <link idref="#anchorFooBar">foo bar</link> ...</p>
To do so, I tokenized the text on spaces:
<xsl:variable name="tokenText" select="tokenize($text,' ')" as="xs:string*"/>
and then built 2 strings depending on the recursion param $lookBacklevel:
<xsl:variable name="textBegin" select="string-join($tokenText[position() lt ($tokenNum - $lookBacklevel + 1)],' ')"/>
<xsl:variable name="textEnd" select="string-join($tokenText[position() ge ($tokenNum - $lookBacklevel + 1)],' ')"/>
I then search for a matching definition :
<xsl:variable name="matchingAncres" select="$ancres[normalize-space($textEnd)!=''][igs:match-ancre(.,$textEnd)]" as="element()*"/>
(matching rules are defined in a specific function)
The problem is that the tokenize() separator is too specific: it is only a space, and sometimes words are separated by other characters, such as:
- a non-breaking space (U+00A0)
- an opening parenthesis "("
- French quotation marks "«"
- ...
I could use a regex like "[\s(«]" as the 2nd argument of tokenize(), but I would then not be able to reconstruct the string.
So is there a way to get the separator that was matched by the regex of tokenize(),
just like regex-group() does when using <xsl:analyze-string>?
I think the answer is "no", but maybe I'm missing a trick to achieve this?
I could maybe use <xsl:analyze-string>, but this is not so easy because of the recursive template: the regex would depend on the $lookBacklevel param. I'm not sure I can find the right pattern...
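One possible way around a level-dependent regex, sketched here under assumptions rather than taken from the thread: split the text once with <xsl:analyze-string> into a temporary tree of words and separators, then pick the split point by word position, so that only a position changes with $lookBacklevel. The element names tok and sep, the sample text, and the named template main are illustrative only. Note that \s in XPath regexes covers just space, tab, newline and carriage return, so the non-breaking space is listed explicitly.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- Sample values, for illustration only -->
    <xsl:variable name="text" select="'my (foo bar'"/>
    <xsl:variable name="lookBacklevel" select="2" as="xs:integer"/>

    <!-- Temporary tree: words and separators become sibling elements -->
    <xsl:variable name="parts">
      <xsl:analyze-string select="$text" regex="[\s&#160;(&#171;]+">
        <xsl:matching-substring>
          <sep><xsl:value-of select="."/></sep>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <tok><xsl:value-of select="."/></tok>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>

    <!-- First word of the trailing run of $lookBacklevel words -->
    <xsl:variable name="first" as="element(tok)?"
        select="($parts/tok)[last() - $lookBacklevel + 1]"/>

    <!-- Everything before that word, original separators included -->
    <xsl:variable name="textBegin" as="xs:string"
        select="string-join($first/preceding-sibling::*, '')"/>

    <!-- That word and everything after it -->
    <xsl:variable name="textEnd" as="xs:string"
        select="string-join(($first, $first/following-sibling::*), '')"/>

    <!-- Brackets make leading and trailing separators visible -->
    <xsl:value-of select="concat('[', $textBegin, '] [', $textEnd, ']')"/>
  </xsl:template>

</xsl:stylesheet>
```

With the sample values this should print [my (] [foo bar]: the " (" separator stays attached to the first string, and concatenating the two strings gives back the original text. If $lookBacklevel exceeds the number of words, $first is empty and both strings come out empty, which the recursive template would have to treat as the "not found" case.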
Regards,
Matthieu.
-- Matthieu Ricaud 05 45 37 08 90 IGS-CP, service livres numériques