RE: normalize as part of a 'select-distinct' in a for-each?

> I modified David Carlisle's example (FAQ 2.4) to use 
> normalize() since whitespace distinctions are not desired.
> However, when I add normalize(), the stylesheet stops 
> returning the expected "XYZ" and instead gives "XXXXYZZ" 
> What am I doing wrong here?

I can explain why you're getting these results, but I don't have a way to
solve your problem. What you are doing wrong is passing a node-set of up to
9 text nodes to normalize(), which only ever looks at the first one.

http://www.w3.org/TR/xpath#axes: "the following axis contains all nodes in
the same document as the context node that are after the context node in
document order, excluding any descendants and excluding attribute nodes and
namespace nodes."
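To make that definition concrete, here is a rough Python sketch of the following axis using only the standard library. The sample document is hypothetical (the original XML isn't shown in this thread), the helper name following() is mine, and ElementTree only models element nodes, so text, attribute, and namespace nodes are outside what this covers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; not the original poster's XML.
SAMPLE = "<a><b><c>X</c><c>Y</c></b><b><c>X</c><d><c>Z</c></d></b></a>"

def following(root, context):
    """Elements after `context` in document order, minus its descendants."""
    in_order = list(root.iter())                  # iter() walks in document order
    descendants = set(context.iter()) - {context}
    position = in_order.index(context)
    return [n for n in in_order[position + 1:] if n not in descendants]

root = ET.fromstring(SAMPLE)
first_c = root.find(".//c")

# Rough equivalent of following::c/text() evaluated at the first <c>.
following_c_text = [n.text for n in following(root, first_c) if n.tag == "c"]
print(following_c_text)  # ['Y', 'X', 'Z']
```

Calling following() on the first "b" element instead would skip its two "c" children, since descendants are excluded.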

First, why do you get 'XYZ' without the attempt at normalization?

	//c[not(text()=following::c/text())]

//c will test all the "c" element nodes in document order. Only those for
which [...] is true will be selected. The sort order you specified will be
applied to these selected nodes for purposes of iterating through your
xsl:for-each.

For each node being tested, text() is a node-set with just one member: the
'X', 'Y', or 'Z' text node child, as expected. following::c/text() is a
node-set with every text node child of every "c" element node from that
point in the document onward (not counting descendants of the node being
tested).

http://www.w3.org/TR/xpath#booleans: "If both objects to be compared are
node-sets, then the comparison will be true if and only if there is a node
in the first node-set and a node in the second node-set such that the result
of performing the comparison on the string-values of the two nodes is true"

http://www.w3.org/TR/xpath#section-Text-Nodes: "The string-value of a text
node is the character data"

So then, going through the //c elements, is the "text()" node-set equal
to the "following::c/text()" node-set? The answer, in the fourth column, is
true (i.e., yes, they are equal) if the item in the second column **can be
found in** the third.

//c:     	text():	following::c/text():               	result:
<c>X</c>	'X'    	'Y','X','Z','Z','Z','X','Z','X','X'	true
<c>Y</c>	'Y'    	'X','Z','Z','Z','X','Z','X','X'    	false
<c>X</c>	'X'    	'Z','Z','Z','X','Z','X','X'        	true
<c>Z</c>	'Z'    	'Z','Z','X','Z','X','X'            	true
<c>Z</c>	'Z'    	'Z','X','Z','X','X'                	true
<c>Z</c>	'Z'    	'X','Z','X','X'                    	true
<c>X</c>	'X'    	'Z','X','X'                       	true
<c>Z</c>	'Z'    	'X','X'                            	false
<c>X</c>	'X'    	'X'                                	true
<c>X</c>	'X'    	(empty)                            	false

Therefore, //c[not(text()=following::c/text())] selects the //c elements for
which the comparison is false, i.e. the last occurrence of each distinct
value:
	<c>Y</c>
	<c>Z</c>
	<c>X</c>
...which you then sorted in ascending order, outputting their string values
to produce 'XYZ'.
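The first table can be reproduced with a small Python simulation of the node-set comparison rule quoted above. This is only a sketch: it models each text node as a plain string, and it assumes no "c" element is nested inside another, so that following::c/text() is simply "everything later in the list". The names C_VALUES and nodeset_equals are mine.

```python
# String-values of the ten <c> elements, in document order (from the table).
C_VALUES = ["X", "Y", "X", "Z", "Z", "Z", "X", "Z", "X", "X"]

def nodeset_equals(left, right):
    """Node-set = node-set: true if ANY pair of string-values is equal."""
    return any(a == b for a in left for b in right)

selected = [
    value
    for i, value in enumerate(C_VALUES)
    # text() is the single value; following::c/text() is everything after it.
    if not nodeset_equals([value], C_VALUES[i + 1:])
]

print(selected)                   # ['Y', 'Z', 'X']
print("".join(sorted(selected)))  # XYZ
```

An empty right-hand node-set always compares false, which is why the last element in the document is always selected.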

Second, why did you get 'XXXXYZZ' when you applied normalize() to the
node-sets in the second and third columns?

http://www.w3.org/TR/xpath#section-String-Functions: "The normalize function
returns the argument string with white space normalized ..." [and] "A
node-set is converted to a string by returning the string-value of the node
in the node-set that is first in document order. If the node-set is empty,
an empty string is returned."

//c:     	text():	following::c/text():	result:
<c>X</c>	'X'    	'Y' (and others)    	false
<c>Y</c>	'Y'    	'X' (and others)    	false
<c>X</c>	'X'    	'Z' (and others)    	false
<c>Z</c>	'Z'    	'Z' (and others)    	true
<c>Z</c>	'Z'    	'Z' (and others)    	true
<c>Z</c>	'Z'    	'X' (and others)    	false
<c>X</c>	'X'    	'Z' (and others)    	false
<c>Z</c>	'Z'    	'X' (and others)    	false
<c>X</c>	'X'    	'X'                 	true
<c>X</c>	'X'    	(empty)             	false

Thus, //c[not(normalize(text())=normalize(following::c/text()))] selects:
	<c>X</c>
	<c>Y</c>
	<c>X</c>
	<c>Z</c>
	<c>X</c>
	<c>Z</c>
	<c>X</c>
...which, when sorted and so on, produces 'XXXXYZZ'.
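The second table can be simulated the same way, this time applying the node-set-to-string rule before comparing. Again a sketch under assumptions: text nodes are plain strings, no "c" is nested in another, and normalize() is modeled as "trim the ends and collapse runs of whitespace to one space" (Python's split() treats a few more characters as whitespace than XML does). The helper names are mine.

```python
# String-values of the ten <c> elements, in document order (from the table).
C_VALUES = ["X", "Y", "X", "Z", "Z", "Z", "X", "Z", "X", "X"]

def to_string(nodeset):
    """Node-set to string: string-value of the first node, or '' if empty."""
    return nodeset[0] if nodeset else ""

def normalize(s):
    """Assumed whitespace normalization: trim, then collapse runs to one space."""
    return " ".join(s.split())

selected = [
    value
    for i, value in enumerate(C_VALUES)
    # Both arguments are reduced to ONE string before the comparison happens.
    if normalize(to_string([value])) != normalize(to_string(C_VALUES[i + 1:]))
]

print(selected)                   # ['X', 'Y', 'X', 'Z', 'X', 'Z', 'X']
print("".join(sorted(selected)))  # XXXXYZZ
```

The comparison has quietly become "is this value equal to the very next c?", so only runs of adjacent duplicates get filtered.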


The solution is a little beyond me, though. I'd assume that you'd have to do
it with recursive template calls that mimic the XPath evaluation above, but
with normalize() thrown in. It wouldn't be efficient at all. Why don't you
just normalize your source data first :)
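For what it's worth, here is a Python sketch of what "normalize first" buys you. The RAW values are hypothetical (made up to include stray whitespace, not taken from your document), and normalize() is the same assumed trim-and-collapse rule; the point is only that once every value is cleaned up front, the plain any-pair comparison does the right thing again.

```python
# Hypothetical string-values with stray whitespace; not the poster's data.
RAW = [" X", "Y", "X ", "Z", " Z ", "Z", "X", "Z\n", "X", "X"]

def normalize(s):
    """Assumed whitespace normalization: trim, then collapse runs to one space."""
    return " ".join(s.split())

def distinct_last(values):
    """Keep each value that equals none of the values after it."""
    return [v for i, v in enumerate(values) if v not in values[i + 1:]]

# Normalize every value once, up front, then select distinct values.
cleaned = [normalize(v) for v in RAW]
print("".join(sorted(distinct_last(cleaned))))  # XYZ

# Without the cleanup, whitespace variants count as different values.
print(distinct_last(RAW))
```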


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list