Re: [xsl] xsl:analyze-string problem

Subject: Re: [xsl] xsl:analyze-string problem
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Thu, 08 Feb 2007 18:16:21 +0100
Yves Forkl wrote:
Hi XSLT 2.0 wizards,

while the syntax and semantics of xsl:analyze-string have become clear to me, I am now in search of an idiom involving it that could help me solve this problem. (Or maybe of an alternative...)

In the input I find elements like these:

1) <e> def ghi</e>
2) <e> abc 22 def 3 ghi 1. </e>
3) <e> 2. </e>
4) <e> 3. def 35 78 ghi </e>

The possible contents fit into exactly 4 classes:

1) just some words and/or numbers
2) like 1), but followed by a number and a period
3) just a number and a period
4) like 3), but followed by some words and/or numbers

I understand that a number and period can only appear once, at the end or at the beginning. Other numbers never have a period immediately following them. And there is nothing between the number and the spaces.


In each case, spaces may or may not appear at beginning and end of the content and must be preserved (no matter to which group they get attached).

The problem consists of replacing the original "e" element by creating new elements according to these rules:

A) A number followed by a period goes into an "ordinal" element.

This will become the xsl:matching-substring part.


B) Words and numbers go into a "text" element.

This is in the xsl:non-matching-substring.


C) In cases 1) and 4), where words and numbers appear at the end, the content of the current "e" element must be concatenated with all adjacent "e" elements of type 1) and 2) before putting it all into the "text" element.

I assume that by concatenation you mean text concatenation, and not something else, like concatenating sibling nodes?


By contrast, in cases 2) and 3) which are ended by a number and a period the contents of the following "e" instance should never be appended.

If I understand it well, this should result in <text>...</text> blocks that each contain one <ordinal> element at the beginning or the end.



What is not clear to me is:


- whether the regex actually suffices to match the rules

I'm not sure either, but I'd have chosen a simpler rule.



- if it is a good idea to use xsl:for-each there

I think not, though I do find it original.




- how to assure concatenation of all the "e" instances' contents in cases 1) and 4) without processing them repeatedly - i.e.: how can I restrict the call to xsl:apply-templates to cases 2) and 3)?

I think you should make it much easier on yourself. Here's an approach you can try:


<xsl:template match="/">
  <xsl:variable name="parsed">
    <xsl:apply-templates select="$data/e" />
  </xsl:variable>
  <xsl:copy-of select="$parsed" />
</xsl:template>

<xsl:template match="e">
  <!-- match beginning/ending with ordinal in $1 or $2 -->
  <xsl:analyze-string select="." regex="^(\s*\d+\.)|(\d+\.\s*)$">
    <xsl:matching-substring>
      <ordinal
        start="{('yes')[regex-group(1)]}"
        end="{('yes')[regex-group(2)]}">
        <xsl:value-of select="." />
      </ordinal>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <text><xsl:value-of select="." /></text>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>



With $data set to your data, this will output the following:


<text> def ghi</text>
<text> abc 22 def 3 ghi </text>
<ordinal start="" end="yes">1. </ordinal>
<ordinal start="yes" end=""> 2.</ordinal>
<text> </text>
<ordinal start="yes" end=""> 3.</ordinal>
<text> def 35 78 ghi </text>


This is a temporary result tree. You can process it further at the point where I placed the xsl:copy-of instruction, for example by applying templates to it. If the 'start' attribute is 'yes', the ordinal was at the beginning of an 'e' element. If the 'end' attribute is 'yes', the ordinal appeared at the end of the string in an 'e' element. As you can see, the spaces are preserved.
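
The 'start' and 'end' attributes rely on a small XPath 2.0 idiom: a string used as a predicate is reduced to its effective boolean value, so ('yes')[...] yields 'yes' for a non-empty string and the empty sequence (hence an empty attribute) otherwise. The stylesheet below is only an illustration of that behaviour. The "main" template and the "demo" wrapper are hypothetical names, and the literals '' and '1. ' stand in for what regex-group() would return.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical entry point: run with "main" as the initial template. -->
  <xsl:template name="main">
    <demo>
      <!-- Empty string in the predicate: its effective boolean value is
           false, the sequence becomes empty, and the attribute is "". -->
      <ordinal start="{('yes')['']}"/>
      <!-- Non-empty string: the predicate is true, so 'yes' is kept. -->
      <ordinal start="{('yes')['1. ']}"/>
      <!-- The same test written out with if/then/else. -->
      <ordinal start="{if ('1. ') then 'yes' else ''}"/>
    </demo>
  </xsl:template>

</xsl:stylesheet>
```

The first 'ordinal' should come out with start="" and the other two with start="yes". The if/then/else form is longer but may be easier to read if the predicate trick looks too terse.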


All you need to do is gather the preceding siblings that you need, based on your concatenation rules.
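
One possible way to do that gathering, as a rough sketch only: instead of walking the preceding-sibling axis, the second pass below uses xsl:for-each-group with group-adjacent to merge runs of neighbouring 'text' elements in the temporary tree. It assumes that merging adjacent 'text' pieces is what rule C asks for, that the 'e' elements are read from the source document (select="//e") rather than from $data, and that a "result" wrapper element is acceptable. It does not treat the whitespace-only 'text' left over after a case 3) element specially, so that case needs extra care.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no"/>

  <xsl:template match="/">
    <!-- Pass 1: split every "e" into <text> and <ordinal> pieces.
         Assumption: "e" elements come from the source document. -->
    <xsl:variable name="parsed">
      <xsl:apply-templates select="//e"/>
    </xsl:variable>
    <!-- "result" is a hypothetical wrapper element. -->
    <result>
      <!-- Pass 2: group neighbouring elements that share a name. -->
      <xsl:for-each-group select="$parsed/*" group-adjacent="name()">
        <xsl:choose>
          <xsl:when test="self::text">
            <!-- Join a run of <text> pieces with no separator,
                 so the original spaces are kept as they are. -->
            <text><xsl:value-of select="string-join(current-group(), '')"/></text>
          </xsl:when>
          <xsl:otherwise>
            <!-- Ordinals are copied one by one, never merged. -->
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </result>
  </xsl:template>

  <xsl:template match="e">
    <!-- Same first pass as above: ordinal at the start or at the end. -->
    <xsl:analyze-string select="." regex="^(\s*\d+\.)|(\d+\.\s*)$">
      <xsl:matching-substring>
        <ordinal
          start="{('yes')[regex-group(1)]}"
          end="{('yes')[regex-group(2)]}">
          <xsl:value-of select="."/>
        </ordinal>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <text><xsl:value-of select="."/></text>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

For the four sample elements, the only difference from the first pass should be that the first two 'text' elements are merged into one. string-join is used rather than a plain xsl:value-of over the group because the latter would insert a space between the items by default.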

Good luck coding!

Cheers,
-- Abel
