Subject: RE: normalize as part of a 'select-distinct' in a for-each?
From: "Clark C. Evans" <clark.evans@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 3 Oct 1999 20:43:19 -0400 (EDT)
Mike,

This is one of those "classic" explanations. Nice. Questions:

#1 Would it make sense to amend the definition of "normalize", overloading it for node-sets so that it applies iteratively over their members? It's funny, but I had tried this syntax before moving on to the one with correct syntax:

    //c[not(normalize(text())=following::c/normalize(text()))]

#2 The XPath definition of equality of node-sets is very surprising to me. I would not have guessed that "A = B" means "(A intersect B) is not empty"... Perhaps a member-of operator would be clearer?

    //c[ not( text() member-of following::c/text() ) ]

#3 Obscurity aside, a "select-distinct" would be very useful, although I understand the difficulty, since these node-sets are iterators over the underlying nodes and not like a database query.

...

In any case, taking your advice, I'm going to modify the stylesheet to "validate" first, checking for exceptional cases:

    <xsl:for-each select="//c[not(text()=normalize(text()))]" >
      <error>
        <c><xsl:value-of select="." /></c> has embedded whitespace!
      </error>
      <xsl:variable name="halt" select="true()" />
    </xsl:for-each>

Thank you tons!

Clark

On Sun, 3 Oct 1999, Mike Brown wrote:
> > I modified David Carlisle's example (FAQ 2.4) to use
> > normalize() since whitespace distinctions are not desired.
> > However, when I add normalize(), the stylesheet stops
> > returning the expected "XYZ" and instead gives "XXXXYZZ".
> > What am I doing wrong here?
>
> I can explain why you're getting these results, but I don't have a way to
> solve your problem. What you are doing wrong is trying to normalize up to 9
> text nodes at a time.
>
> http://www.w3.org/TR/xpath#axes: "the following axis contains all nodes in
> the same document as the context node that are after the context node in
> document order, excluding any descendants and excluding attribute nodes and
> namespace nodes."
>
> First, why do you get 'XYZ' without the attempt at normalization?
>
>     //c[not(text()=following::c/text())]
>
> //c will test all the "c" element nodes in document order. Only those for
> which [...] is true will be selected. The sort order you specified will be
> applied to these selected nodes for purposes of iterating through your
> xsl:for-each.
>
> For each node being tested, text() is a node-set with just one member: the
> 'X', 'Y', or 'Z' text node child, as expected. following::c/text() is a
> node-set with every text node child of every "c" element node from that
> point in the document onward (not counting descendants of the node being
> tested).
>
> http://www.w3.org/TR/xpath#booleans: "If both objects to be compared are
> node-sets, then the comparison will be true if and only if there is a node
> in the first node-set and a node in the second node-set such that the result
> of performing the comparison on the string-values of the two nodes is true"
>
> http://www.w3.org/TR/xpath#section-Text-Nodes: "The string-value of a text
> node is the character data"
>
> So then, going through the //c elements, is the "text()" node-set equal
> to the "following::c/text()" node-set? The answer, in the fourth column, is
> true (i.e., yes, they are equal) if the item in the second column **can be
> found in** the third.
>
> //c:       text():  following::c/text():                  result:
> <c>X</c>   'X'      'Y','X','Z','Z','Z','X','Z','X','X'   true
> <c>Y</c>   'Y'      'X','Z','Z','Z','X','Z','X','X'       false
> <c>X</c>   'X'      'Z','Z','Z','X','Z','X','X'           true
> <c>Z</c>   'Z'      'Z','Z','X','Z','X','X'               true
> <c>Z</c>   'Z'      'Z','X','Z','X','X'                   true
> <c>Z</c>   'Z'      'X','Z','X','X'                       true
> <c>X</c>   'X'      'Z','X','X'                           true
> <c>Z</c>   'Z'      'X','X'                               false
> <c>X</c>   'X'      'X'                                   true
> <c>X</c>   'X'      (empty)                               false
>
> Therefore, //c[not(text()=following::c/text())] selects the //c items
> for which the result is false, which are exactly the last occurrence of
> each distinct value:
> <c>Y</c>
> <c>Z</c>
> <c>X</c>
> ...which you then sorted in ascending order, taking their string values
> to produce 'XYZ'.
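As a rough illustration (Python rather than XSLT, and not something proposed in the thread), the sketch below models the comparison rule Mike quotes. It assumes each c element has exactly one text node, so a c is represented by its string-value and a node-set by a list of string-values in document order; `=` between two node-sets is then an existential test over pairs of string-values. The sample sequence X Y X Z Z Z X Z X X is read off Mike's first table, and the function names are hypothetical.

```python
# Minimal model of XPath 1.0 node-set = node-set comparison.
# Assumption: each <c> has one text node, so a node-set of text nodes
# is modeled as a list of their string-values in document order.
from typing import List


def nodesets_equal(a: List[str], b: List[str]) -> bool:
    """True iff some node in a and some node in b have equal
    string-values (an existential test, not set equality)."""
    return any(x == y for x in a for y in b)


def select_distinct(doc: List[str]) -> List[str]:
    """Model of //c[not(text()=following::c/text())]: keep a <c> only
    if no later <c> has the same string-value, i.e. keep the last
    occurrence of each distinct value."""
    result = []
    for i, value in enumerate(doc):
        text = [value]           # text(): a single text node
        following = doc[i + 1:]  # following::c/text()
        if not nodesets_equal(text, following):
            result.append(value)
    return result


doc = ["X", "Y", "X", "Z", "Z", "Z", "X", "Z", "X", "X"]
selected = select_distinct(doc)
print(selected)                   # ['Y', 'Z', 'X']
print("".join(sorted(selected)))  # XYZ
```

Note that comparing against an empty node-set is always false (`any` over nothing), which is why the final X in document order is selected.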
>
> Second, why did you get 'XXXXYZZ' when you applied normalize() to the
> node-sets in the second and third columns?
>
> http://www.w3.org/TR/xpath#section-String-Functions: "The normalize function
> returns the argument string with white space normalized ..." [and] "A
> node-set is converted to a string by returning the string-value of the node
> in the node-set that is first in document order. If the node-set is empty,
> an empty string is returned."
>
> So normalize(following::c/text()) looks only at the first following text
> node, and each "c" is compared with the next "c" alone:
>
> //c:       text():  following::c/text():   result:
> <c>X</c>   'X'      'Y' (and others)       false
> <c>Y</c>   'Y'      'X' (and others)       false
> <c>X</c>   'X'      'Z' (and others)       false
> <c>Z</c>   'Z'      'Z' (and others)       true
> <c>Z</c>   'Z'      'Z' (and others)       true
> <c>Z</c>   'Z'      'X' (and others)       false
> <c>X</c>   'X'      'Z' (and others)       false
> <c>Z</c>   'Z'      'X' (and others)       false
> <c>X</c>   'X'      'X'                    true
> <c>X</c>   'X'      (empty)                false
>
> Thus, //c[not(normalize(text())=normalize(following::c/text()))] selects:
> <c>X</c>
> <c>Y</c>
> <c>X</c>
> <c>Z</c>
> <c>X</c>
> <c>Z</c>
> <c>X</c>
> ...which, when sorted and so on, produces 'XXXXYZZ'.
>
> The solution is a little beyond me, though. I'd assume that you'd have to do
> it with recursive template calls that mimic the XPath evaluation above, but
> with normalize() thrown in. It wouldn't be efficient at all. Why don't you
> just normalize your source data first? :)
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
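A second rough Python sketch (again illustrative only, not XSLT) contrasts the two readings of normalize() in this thread. The first applies normalize() to a whole node-set, which under the quoted conversion rule reduces following::c/text() to its first node. The second normalizes each member before the existential comparison, roughly what Clark's #1 asks for and what normalizing the source data first would achieve. The helper names and the whitespace-padded `padded` input are hypothetical. `normalize` models XPath whitespace handling, in which only space, tab, CR and LF count as whitespace.

```python
# Model of two readings of normalize() on node-sets (illustrative only).
# Assumption: each <c> has one text node, modeled by its string-value;
# a node-set is a list of string-values in document order.
import re
from typing import List


def normalize(s: str) -> str:
    """Collapse runs of XPath whitespace (space, tab, CR, LF) to one
    space, then strip leading and trailing spaces."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def nodeset_to_string(nodes: List[str]) -> str:
    """A node-set used as a string: string-value of the first node in
    document order, or '' if the node-set is empty."""
    return nodes[0] if nodes else ""


def select_first_node_normalized(doc: List[str]) -> List[str]:
    """Model of //c[not(normalize(text())=normalize(following::c/text()))].
    Both sides are now strings, so only the NEXT <c> is compared."""
    result = []
    for i, value in enumerate(doc):
        left = normalize(nodeset_to_string([value]))
        right = normalize(nodeset_to_string(doc[i + 1:]))
        if not left == right:
            result.append(value)
    return result


def select_distinct_normalized(doc: List[str]) -> List[str]:
    """Hypothetical per-member reading (not an XPath 1.0 feature):
    normalize every <c>, then keep a value only if it does not occur
    among the later normalized values (the existential test)."""
    norm = [normalize(v) for v in doc]
    return [v for i, v in enumerate(norm) if v not in norm[i + 1:]]


doc = ["X", "Y", "X", "Z", "Z", "Z", "X", "Z", "X", "X"]
print("".join(sorted(select_first_node_normalized(doc))))  # XXXXYZZ

# Hypothetical input with stray whitespace around the same values.
padded = ["X", " Y", "X ", "Z", "  Z", "Z", "X", "Z\n", "X", " X "]
print("".join(sorted(select_distinct_normalized(padded))))  # XYZ
```

The per-member version is a plain quadratic scan. It only shows what result was wanted, not how an XSLT processor of the time would compute it.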