RE: normalize as part of a 'select-distinct' in a for-each?

Subject: RE: normalize as part of a 'select-distinct' in a for-each?
From: "Clark C. Evans" <clark.evans@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 3 Oct 1999 20:43:19 -0400 (EDT)
Mike,  This is one of those "classic" explanations.  Nice.
                     
Questions:

#1 Would it make sense to amend the definition of
"normalize" -- overloading it for node-set so that it applies
iteratively over its members?  It's funny, but I had tried 
this syntax before moving on to the one with correct syntax:

 //c[not(normalize(text())=following::c/normalize(text()))]

#2 The xpath definition of equality of node-sets is 
very surprising to me.  I would not have guessed that
"A = B" means "(A intersect B) is not-empty" ...
Perhaps a member-of operator would be more clear? 

//c[ not( text() member-of following::c/text() ) ]
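
To make the rule in #2 concrete, here is a rough Python model (not XPath itself) in which a node-set is simply a list of string-values. The name nodeset_equals is made up for this sketch; it only illustrates the "some pair matches" reading of "=" quoted from the spec below.

```python
def nodeset_equals(a, b):
    """Model XPath '=' on two node-sets given as lists of string-values.

    The result is true if SOME value in a equals SOME value in b.
    """
    return any(x == y for x in a for y in b)


# 'X' occurs among the following values, so the comparison is true.
print(nodeset_equals(["X"], ["Y", "X", "Z"]))  # True

# An empty node-set never compares equal to anything.
print(nodeset_equals(["X"], []))               # False
```

Under this reading, "A = B" already behaves like the proposed member-of test whenever A holds a single node.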

#3 Obscurity aside, a "select-distinct" would be 
very useful; although I understand the difficulty
since these node-sets are iterators on the underlying
nodes and not like a database query.  
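
As a sketch of what such a "select-distinct" would have to compute, here is a small Python model, assuming the values are plain strings and that normalizing means trimming and collapsing XML whitespace. The function names and the sample data (including the stray spaces) are invented for illustration.

```python
import re


def normalize(s):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def select_distinct(values):
    """Keep each normalized value that does not occur again later in the list."""
    norm = [normalize(v) for v in values]
    return [n for i, n in enumerate(norm) if n not in norm[i + 1:]]


# Hypothetical input: the X/Y/Z data with some stray whitespace added.
values = ["X", " Y", "X ", "Z", "Z", " Z ", "X", "Z", "X", "X"]
print("".join(sorted(select_distinct(values))))  # XYZ
```

Like the not(... = following::c/...) idiom, this keeps the last occurrence of each value; the difference is that every member is normalized before comparing.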

...

In any case, taking your advice, I'm going to modify the 
stylesheet to "validate" first, checking for exceptional cases:

<xsl:for-each select="//c[not(text()=normalize(text()))]" >
  <error>
    <c><xsl:value-of select="." /></c> has embedded whitespace!
  </error>
  <xsl:variable name="halt" select="true()" />
</xsl:for-each>
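
The check itself can be modelled outside XSLT. A minimal Python sketch, assuming normalize() trims leading and trailing whitespace and collapses inner runs to a single space; the helper names and sample values are hypothetical.

```python
import re


def normalize(s):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def needs_normalizing(values):
    """Return the values that differ from their normalized form."""
    return [v for v in values if v != normalize(v)]


print(needs_normalizing(["X", " Y", "a  b", "a b"]))  # [' Y', 'a  b']
```

A value such as "a b" passes, since a single inner space survives normalization; only leading, trailing, or repeated whitespace is reported.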

Thank you tons!

Clark


On Sun, 3 Oct 1999, Mike Brown wrote:

> > I modified David Carlisle's example (FAQ 2.4) to use 
> > normalize() since whitespace distinctions are not desired.
> > However, when I add normalize(), the stylesheet stops 
> > returning the expected "XYZ" and instead gives "XXXXYZZ" 
> > What am I doing wrong here?
> 
> I can explain why you're getting these results, but I don't have a way to
> solve your problem. What you are doing wrong is trying to normalize up to 9
> text nodes at a time.
> 
> http://www.w3.org/TR/xpath#axes: "the following axis contains all nodes in
> the same document as the context node that are after the context node in
> document order, excluding any descendants and excluding attribute nodes and
> namespace nodes."
> 
> First, why do you get 'XYZ' without the attempt at normalization?
> 
> 	//c[not(text()=following::c/text())]
> 
> //c will test all the "c" element nodes in document order. Only those for
> which [...] is true will be selected. The sort order you specified will be
> applied to these selected nodes for purposes of iterating through your
> xsl:for-each.
> 
> For each node being tested, text() is a node-set with just one member: the
> 'X', 'Y', or 'Z' text node child, as expected. following::c/text() is a
> node-set with every text node child of every "c" element node from that
> point in the document onward, (not counting descendants of the node being
> tested).
> 
> http://www.w3.org/TR/xpath#booleans: "If both objects to be compared are
> node-sets, then the comparison will be true if and only if there is a node
> in the first node-set and a node in the second node-set such that the result
> of performing the comparison on the string-values of the two nodes is true"
> 
> http://www.w3.org/TR/xpath#section-Text-Nodes: "The string-value of a text
> node is the character data"
> 
> So then, going through the //c elements, is the "text()" node-set equal
> to the "following::c/text()" node-set? The answer, in the fourth column, is
> true (i.e., yes, they are equal) if the item in the second column **can be
> found in** the third.
> 
> //c:     	text():	following::c/text():               	result:
> <c>X</c>	'X'    	'Y','X','Z','Z','Z','X','Z','X','X'	true
> <c>Y</c>	'Y'    	'X','Z','Z','Z','X','Z','X','X'    	false
> <c>X</c>	'X'    	'Z','Z','Z','X','Z','X','X'        	true
> <c>Z</c>	'Z'    	'Z','Z','X','Z','X','X'            	true
> <c>Z</c>	'Z'    	'Z','X','Z','X','X'                	true
> <c>Z</c>	'Z'    	'X','Z','X','X'                    	true
> <c>X</c>	'X'    	'Z','X','X'                        	true
> <c>Z</c>	'Z'    	'X','X'                            	false
> <c>X</c>	'X'    	'X'                                	true
> <c>X</c>	'X'    	(empty)                            	false
> 
> Therefore, //c[not(text()=following::c/text())] will select the //c items
> for which the comparison is false, which just happened to be these elements:
> 	<c>Y</c>
> 	<c>Z</c>
> 	<c>X</c>
> ...which you then sorted in ascending order, and whose string-values you
> output to produce 'XYZ'.
> 
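
The table above can be reproduced with a small Python model, treating each node-set as a list of string-values. This is only a sketch of the comparison rule, not an XPath engine; nodeset_equals is a made-up name, and the ten values are taken from the first column of the table.

```python
# String-values of the ten <c> elements, in document order.
c_values = ["X", "Y", "X", "Z", "Z", "Z", "X", "Z", "X", "X"]


def nodeset_equals(a, b):
    """True if some value in a equals some value in b."""
    return any(x == y for x in a for y in b)


# Keep each element whose text matches none of the following <c> texts.
selected = [
    v for i, v in enumerate(c_values)
    if not nodeset_equals([v], c_values[i + 1:])
]

print(selected)                   # ['Y', 'Z', 'X']
print("".join(sorted(selected)))  # XYZ
```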
> Second, why did you get 'XXXXYZZ' when you applied normalize() to the
> node-sets in the second and third columns?
> 
> http://www.w3.org/TR/xpath#section-String-Functions: "The normalize function
> returns the argument string with white space normalized ..." [and] "A
> node-set is converted to a string by returning the string-value of the node
> in the node-set that is first in document order. If the node-set is empty,
> an empty string is returned."
> 
> //c:     	text():	following::c/text():	result:
> <c>X</c>	'X'    	'Y' (and others)    	false
> <c>Y</c>	'Y'    	'X' (and others)    	false
> <c>X</c>	'X'    	'Z' (and others)    	false
> <c>Z</c>	'Z'    	'Z' (and others)    	true
> <c>Z</c>	'Z'    	'Z' (and others)    	true
> <c>Z</c>	'Z'    	'X' (and others)    	false
> <c>X</c>	'X'    	'Z' (and others)    	false
> <c>Z</c>	'Z'    	'X' (and others)    	false
> <c>X</c>	'X'    	'X'                 	true
> <c>X</c>	'X'    	(empty)             	false
> 
> Thus, //c[not(normalize(text())=normalize(following::c/text()))] selects:
> 	<c>X</c>
> 	<c>Y</c>
> 	<c>X</c>
> 	<c>Z</c>
> 	<c>X</c>
> 	<c>Z</c>
> 	<c>X</c>
> ...which, when sorted and so on produces 'XXXXYZZ'.
> 
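
The same kind of Python model shows the effect of the node-set-to-string conversion: only the first following value takes part in the comparison. The helper to_string is a hypothetical stand-in for that conversion, and whitespace normalization is left out because these sample values contain no whitespace.

```python
# String-values of the ten <c> elements, in document order.
c_values = ["X", "Y", "X", "Z", "Z", "Z", "X", "Z", "X", "X"]


def to_string(nodeset):
    """String-value of the first node in document order, or '' if empty."""
    return nodeset[0] if nodeset else ""


# Each element is compared only with the FIRST following <c> text.
selected = [
    v for i, v in enumerate(c_values)
    if to_string([v]) != to_string(c_values[i + 1:])
]

print(selected)                   # ['X', 'Y', 'X', 'Z', 'X', 'Z', 'X']
print("".join(sorted(selected)))  # XXXXYZZ
```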
> 
> The solution is a little beyond me, though. I'd assume that you'd have to do
> it with recursive template calls that mimic the XPath evaluation above, but
> with normalize() thrown in. It wouldn't be efficient at all. Why don't you
> just normalize your source data first :)
> 
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 



