Subject: RE: [xsl] Higher-Order Functions in XPath 2.0 From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx> Date: Wed, 16 Jan 2002 15:27:45 -0000 |
Great stuff, Dimitre, please lob it over to www-xpath-comments. > -----Original Message----- > From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx > [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of Dimitre > Novatchev > Sent: 16 January 2002 10:07 > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx > Subject: [xsl] Higher-Order Functions in XPath 2.0 > > > Hi, > > Bellow is a draft proposal for implementing higher-order functions in > XPath 2.0. > > I'd greatly appreciate receiving any comments about the > correctness and > validity of this text, as well as for possible improvements. > > Cheers, > Dimitre Novatchev. > ------------------------------------------------------------------- > > Higher-Order Functions in XPath 2.0 > > Contents > -------- > Part I. Problems with XPath 2.0 > > 1. Using the "for" expression for incremental processing > 2. Difficulties in returning aggregate (nodes) value > 3. Combining two sequences in producing a sequence > 4. Text processing beyond the limits of regular expressions > 5. XPath language complexity has grown considerably > 6. Inflexibility, where equality and comparison-returning > functions are needed > 7. Little or no reusability is possible for "for" expressions > > Part II. Higher-Order Functions Solutions > > III. Conclusion > > IV. Recommendation > > > > > Part I. Problems with XPath 2.0 > ------------------------------- > > A general problem with XPath 2.0, as specified by the current working > draft, is that some tasks exist, which cannot be solved within the > language, and another group of tasks can be accomplished in XPath 2.0, > but in a rather inefficient way. In all of these cases programmers can > solve the task by using recursive XSLT templates and this is > a powerful > method, but at the same writing and testing different recursive > templates for every particular case is a time-consuming and > error-prone > process and often the result has low reusability. > > Listed bellow are some examples of such problems: > > 1. Using the "for" expression for incremental processing > The "for" expression in XPath 2.0 cannot be used when the results > of processing every item of a sequence depend on the processing of the > previous items. Or in some cases it can be used, but in an obviously > inefficient way: > > - Find the product of all numbers in a sequence. > - Reverse a sequence > - Concat all string items of a sequence. > - Reverse a string or a sequence of items. > > A typical use-case, for which the XPath 2.0 solution is difficult and > inefficient, is the following: > > Given a sequence of "book" nodes, calculate and display the amount of > money received from the sales of each book (price * sales), but also > obtain and display a running total, as each book node from > the sequence > is processed. To achieve this in XPath, one would write: > > for $i in (1 to count($items)) > return ($items[$i], > sum( for $j in (sublist($items, 1, $i)) > return (@price * @sales) ) > > In the above expression, if N is the number of items in $items, sum() > is performed N * (N + 1) / 2 times. > > While the above may seem to be just a textbook example (and really can > be found in Mike Kays book), there are real-world examples, where a > running total must be calculated and even several results must be > accumulated in parallel. I am deeply obliged to Mark Nahabedian > (naha@xxxxxxxxxx ) for allowing me to quote his work which has to deal > with exactly this problem -- a complete example can be found at: > > http://www.ai.mit.edu/people/naha/itrack/about.html > > > 2. Difficulties in returning aggregate (nodes) value from a sequence, > especially when returning those nodes depends in a non-trivial way on > the other nodes of the sequence: > > - Obtain all nodes with "maximum value" from a sequence, especially > in the case when the node "value" is computed by a very complex > expression. > - Obtain the nodes with "distinct values" from a sequence, > especially in the case when the node "value" is computed by a very > complex way. > - There's no general way to "filter" elements of a > sequence based on > a predicate. > > The reason is that a predicate (function) cannot be passed as a > parameter to a general "filter" function. As a result programmers will > write multitude of similar filtering expressions, without > being able to > re-use them. Such repetitions are time-consuming, error-prone, and > generally result in un-maintainable and non-reusable code. > > Examples of problems in this group: > > - Return the sum of squares of the numbers in a sequence. > - Return all items in a sequence, for which f(item) has minimal > value > - For some function f() test whether all the values of f(item) on a > sequence are equal (> 0, etc.) > - For some function f() test whether all values f(item) on a > sequence are in increasing order. > > Although a solution can be found in XPath, it will be difficult and > inefficient. > > Also, for every different function f() another version of the same > solution will have to be produced, because functions cannot be passed > as parameters. > > 3. Combining two sequences in producing a sequence: > Given (a1, a2, ..., aN) and (b1, b2, ..., bN) compute: > > (a1 + b1, a2 + b2, ..., aN + bN) > > (a1 * b1, a2 * b2, ..., aN * bN) > > (a1 and b1, a2 and b2, ..., aN and bN) > > (a1 or b1, a2 or b2, ..., aN or bN) > etc. > > > 4. Text processing beyond the limits of regular expressions is not > possible. > > A real world problem was pointed out by David Carlisle -- he needs in > his work to match strings, surrounded by (unknown in advance number > of) balanced parenthesis. > > For any such problem, it would be nice to have a general, table-driven > parser() function. However this is not possible, because the parser() > function will need to be passed as parameter a lex() function that it > must call for obtaining the terminal symbols from the input text. > > 5. XPath language complexity has grown considerably and the language > cannot continue to expand indefinitely: > > Already there are hardly any spare characters left for operators. > Often there are (more than one) different ways of performing similar > tasks. > > In contrast, a language that supports higher-order functions can be > kept simple, small and elegant, while at the same time providing > powerful means to produce any necessary new functionality. > > Thus the "standard" language features (e.g. operators and functions) > can be kept to a minimum, while the language makes possible > desired new > functionality to be easily produced and accumulated into a library of > general and reusable useful functions. > > Without support for higher-order functions such libraries are very > limited in scope > and usefulness. > > 6. Inflexibility, where equality and comparison-returning > functions are > needed to be passed to: > - sort > - distinct-values > - grouping, etc. > > 7. Little or no reusability is possible for "for" expressions > > In the expression bellow: > > for $i in (1 to count($items)) > return expression > > expression cannot be reused (simply copied and pasted) and > will have to > be modified every time it is used with differently named > range variable > so that references to the range variable are renamed. > > In contrast, with higher-order functions support one can have a map() > function, so that in > > map(f, $sequence) > > the code of f() will never have to be modified. > > > Part II. Higher-Order Functions Solutions > ----------------------------------------- > > Provided higher-order functions were available, the problems listed > above have easy and natural solutions. > > To demonstrate the compactness and high degree of > readability, the code > bellow is written in Haskell. Haskell is used only for convenience, in > no way should it be inferred that the same syntax is recommended for > XPath 2.0. Some basic conventions from this language: > > f x y = x * y > > This defines a function f(x,y) = x * y > > [1, 2, 3] > > This is a list of elements 1, 2, and 3. The same (for our purposes) as > (1, 2, 3) in XPath 2.0. > > [] > > This denotes the empty list -- the same as () in XPath 2.0. > > x -- denotes the name of a single element/function. > > xs -- any name ending in 's' denotes a list of elements. > > Any operator can be used also as a function, when put in brackets. > Thus: > > (+) 1 2 = 1 + 2 = 3 > > The (:) operator is used to prepend an element to the start of a list: > > x : xs > > defines a list with head x and tail xs. > > The flip() function takes as argument any function with two arguments, > and produces as result a the same function, which takes these two > arguments in reverse order. > > flip f x y = f y x > > Primitive recursion over a list can be defined as follows: > > foldl f z [] = z > foldl f z (x:xs) = foldl f (f z x) xs > > The function "foldl" takes 3 arguments -- a function f(), > which takes 2 > arguments, a value z, and a list. > > This is one of the most general functions over lists. It traverses the > list from left to write, applying f() on each element and the > currently > accumulated result. > > There is a dual function (foldr), which behaves in a similar way, but > traverses the list from right to left: > > foldr f z [] = z > foldr f z (x:xs) = f x (foldr f z xs) > > > As can be easily seen: > > foldl (+) 0 xs > > is the sum of all elements in a list xs. Therefore we could write: > > sum xs = foldl (+) 0 xs > > Analogously: > > product xs = foldl (*) 1 xs > > And this one liner is the solution to one of the problems in Part I, > section 1. > > We can ommit the last operand(s) from an equation, in case it is the > same and we still get a valid function definition. Therefore, > the above > function definitions could be simplified even further and written as: > > sum = foldl (+) 0 > > product = foldl (*) 1 > > > Reversing the elements of a list (solution of another problem in Part > I, section 1.) is simply defined as: > > reverse = foldl (flip (:)) [] > > > Concatenating all elements of a list (solution of the next problem) is > simply: > > concat = foldr (++) [] > > where (++) is the concatenation operator for lists. > > > Combining two lists with equal length into one can be performed using > the zipWith() function: > > zipWith f (a:as) (b:bs) = f a b : zipWith f as bs > zipWith _ _ _ = [] > > The function f() is applied on every pair of elements at position N > from the two lists, and the result forms the element at position N in > the result list. > > ZipWith() solves directly all the problems from Part I, section 3: > > - (a1 + b1, a2 + b2, ..., aN + bN) is just: > > zipWith (+) as bs > > - (a1 * b1, a2 * b2, ..., aN * bN) is just: > > zipWith (*) as bs > > - (a1 and b1, a2 and b2, ..., aN and bN) is just: > > zipWith and as bs > > - (a1 or b1, a2 or b2, ..., aN or bN) is just: > > zipWith or as bs > > > > A very useful function is scanl: > > scanl f q xs = q : (case xs of > [] -> [] > x:xs -> scanl f (f q x) xs) > > It is similar to foldl, but creates a list, every element of which > contains the intermediate accumulated result. The first element of the > result-list is q. > > In case the list is guaranteed to be non-empty, then the following > function can be defined: > > scanl1 f (x:xs) = scanl f x xs > > It behaves like scanl(), but doesnt use a zero argument. > > As can be easily seen, > > scanl1 `op` xs > > produces a list of the intermediate accumulated results of performing > op() on the list xs. For example: > > scanl1 (+) [1, 2, 3] = [1, 3, 6] = [1, 1+2, 3 + 3] > > > scanl1() combined with zipWith() solve directly the problem of > calculating the running total from Part I, section 1: > > scanl1 (+) (zipWith (*) [1,2,3] [2,2,2]) > > returns: > [2, 6, 12] > > A direct solution to the filtering problems defined in Part I, section > 2, is provided by the function filter(): > > filter p xs = [ x | x <- xs, p x ] > > it takes a function p() defined on the type of elements of its second > argument - a list xs, and returns a list of those elements of xs, for > which p(x) = true. > > Using it, we can write: > > - Return all items in a sequence, for which f(item) has minimal value > > minvals f xs = filter (= fmin) ys > > where fmin = minimum ys, > ys = map f xs > > > - For some function f() test whether all the values of f(item) on a > sequence are equal (> 0, etc.) > > allFPositive f xs = foldl and ys > > where ys = map f xs > > - For some function f() test whether all values f(item) on a sequence > are in increasing order. > > allFIncreasing f xs = foldl and ys > > where ys = zipWith (<) (init xs) (tail xs) > > > In the last solution we used the init() function, which is the dual of > tail() - from a list xs it produces another, containing all > elements of > the first list , but the last one: > > init (x:xs) = x : init xs > > > > Finally, heres an example how to keep a language small and simple: > > Whenever a programmer needs a mapping operator, she could produce it > immediately herself, without having to ask a working group for > including it in the standard language, as follows: > > 1. She defines the function map(): > > map f xs = foldl ( (:) . f ) [ ] xs > > 2. Because she needs to apply the mapping operator repeatedly, for > convenience she defines a multiMap() function: > > multiMap xs fs = foldr map xs fs > > multiMap [1, 2, 3] [(1+), (2*), (5+)] > > produces: > > [13,15,17] , that is [(1 + 5)*2 + 1, (2 + 5)*2 + 1, (3 + > 5)*2 + 1] > > 3. The multiMap() function as defined above applies the functions > starting from the last one in the list. The programmer wants them > applied starting from the first function in the list. She re-defines > multiMap by changing foldr to foldl as follows: > > multiMap xs fs = foldl (flip map) xs fs > > > Now > multiMap [1, 2, 3] [(1+), (2*), (5+)] > > produces: > > [9, 11, 13] > > that is: [(1 + 1)*2 + 5, (2 + 1)*2 + 5, (3 + 1)*2 + 5] > > 4. She is ready to use the new function. For example, she specifies a > series of SVG coordinate transformations: > > multiMap ($coordinates, > ((. *2), > (if (position() mod 2) then . + 50 else .) > .. > ) > ) > > > > III. Conclusion > --------------- > > XPath 2.0 as specified in the current working draft has the > problems described in Part I. A language with support for higher-order > function is free of these problems. > > IV. Recommendation > ------------------ > > Based on the above conclusion, I recommend that higher-order > functions support be implemented in Xpath 2.0. > > > > __________________________________________________ > Do You Yahoo!? > Send FREE video emails in Yahoo! Mail! > http://promo.yahoo.com/videomail/ > > XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list > > XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
[xsl] Higher-Order Functions in XPa, Dimitre Novatchev | Thread | Re: [xsl] Higher-Order Functions in, Jeni Tennison |
RE: [xsl] XSLT match with regex wha, Thomas Winkler | Date | Re: [xsl] how to make data fit in a, yan bai |
Month |