RE: [xsl] Higher-Order Functions in XPath 2.0

Subject: RE: [xsl] Higher-Order Functions in XPath 2.0
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Wed, 16 Jan 2002 15:27:45 -0000
Great stuff, Dimitre, please lob it over to www-xpath-comments.

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of Dimitre
> Novatchev
> Sent: 16 January 2002 10:07
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Higher-Order Functions in XPath 2.0
>
>
> Hi,
>
> Bellow is a draft proposal for implementing higher-order functions in
> XPath 2.0.
>
> I'd greatly appreciate receiving any comments about the
> correctness and
> validity of this text, as well as for possible improvements.
>
> Cheers,
> Dimitre Novatchev.
> -------------------------------------------------------------------
>
> Higher-Order Functions in XPath 2.0
>
> Contents
> --------
> Part I. Problems with XPath 2.0
>
>        1. Using the "for" expression for incremental processing
>        2. Difficulties in returning aggregate (nodes) value
>        3. Combining two sequences in producing a sequence
>        4. Text processing beyond the limits of regular expressions
>        5. XPath language complexity has grown considerably
>        6. Inflexibility, where equality and comparison-returning
>           functions are needed
>        7. Little or no reusability is possible for "for" expressions
>
> Part II. Higher-Order Functions Solutions
>
> III. Conclusion
>
> IV. Recommendation
>
>
>
>
> Part I. Problems with XPath 2.0
> -------------------------------
>
> A general problem with XPath 2.0, as specified by the current working
> draft, is that some tasks exist, which cannot be solved within the
> language, and another group of tasks can be accomplished in XPath 2.0,
> but in a rather inefficient way. In all of these cases programmers can
> solve the task by using recursive XSLT templates and this is
> a powerful
> method, but at the same writing and testing different recursive
> templates for every particular case is a time-consuming and
> error-prone
> process and often the result has low reusability.
>
> Listed bellow are some examples of such problems:
>
> 1. Using the "for" expression for incremental processing
>    The "for" expression in XPath 2.0  cannot be used when the results
> of processing every item of a sequence depend on the processing of the
> previous items. Or in some cases it can be used, but in an obviously
> inefficient way:
>
>    - Find the product of all numbers in a sequence.
>    - Reverse a sequence
>    - Concat all string items of a sequence.
>    - Reverse a string or a sequence of items.
>
> A typical use-case, for which the XPath 2.0 solution is difficult and
> inefficient, is the following:
>
> Given a sequence of "book" nodes, calculate and display the amount of
> money received from the sales of each book (price * sales), but also
> obtain and display a running total, as each book node from
> the sequence
> is processed. To achieve this in XPath, one would write:
>
> for $i in (1 to count($items))
>    return ($items[$i],
>            sum( for $j in (sublist($items, 1, $i))
>                    return (@price * @sales) )
>
> In the above expression, if N is the number of items in $items, sum()
> is performed N * (N + 1) / 2 times.
>
> While the above may seem to be just a textbook example (and really can
> be found in Mike Kays book), there are real-world examples, where a
> running total must be calculated and even several results must be
> accumulated in parallel. I am deeply obliged to Mark Nahabedian
> (naha@xxxxxxxxxx ) for allowing me to quote his work which has to deal
> with exactly this problem -- a complete example can be found at:
>
> http://www.ai.mit.edu/people/naha/itrack/about.html
>
>
> 2. Difficulties in returning aggregate (nodes) value from a sequence,
> especially when returning those nodes depends in a non-trivial way on
> the other nodes of the sequence:
>
>    - Obtain all nodes with "maximum value" from a sequence, especially
> in the case when the node "value" is computed by a very complex
> expression.
>    - Obtain the nodes with "distinct values" from a sequence,
> especially in the case when the node "value" is computed by a very
> complex way.
>    - There's no general way to "filter" elements of a
> sequence based on
> a predicate.
>
> The reason is that a predicate (function) cannot be passed as a
> parameter to a general "filter" function. As a result programmers will
> write multitude of similar filtering expressions, without
> being able to
> re-use them. Such repetitions are time-consuming, error-prone, and
> generally result in un-maintainable and non-reusable code.
>
> Examples of problems in this group:
>
>    - Return the sum of squares of the numbers in a sequence.
>    - Return all items in a sequence, for which f(item) has minimal
> value
>    - For some function f() test whether all the values of f(item) on a
> sequence  are equal (> 0, etc.)
>    - For some function f() test whether all values f(item) on a
> sequence are in increasing order.
>
> Although a solution can be found in XPath, it will be difficult and
> inefficient.
>
> Also, for every different function f() another version of the same
> solution will have to be produced, because functions cannot be passed
> as parameters.
>
> 3. Combining two sequences in producing a sequence:
> Given (a1, a2, ..., aN) and (b1, b2, ..., bN) compute:
>
>      (a1 + b1, a2 + b2, ..., aN + bN)
>
>      (a1 * b1, a2 * b2, ..., aN * bN)
>
>      (a1 and b1, a2 and b2, ..., aN and bN)
>
>      (a1 or b1, a2 or b2, ..., aN or bN)
> etc.
>
>
> 4. Text processing beyond the limits of regular expressions is not
> possible.
>
> A real world problem was pointed out by David Carlisle -- he needs in
> his work to match strings, surrounded by  (unknown in advance number
> of) balanced parenthesis.
>
> For any such problem, it would be nice to have a general, table-driven
> parser() function. However this is not possible, because the parser()
> function will need to be passed as parameter a lex() function that it
> must call for obtaining the terminal symbols from the input text.
>
> 5. XPath language complexity has grown considerably and the language
> cannot continue to expand indefinitely:
>
> Already there are hardly any spare characters left for operators.
> Often there are (more than one) different ways of performing similar
> tasks.
>
> In contrast, a language that supports higher-order functions can be
> kept simple, small and elegant, while at the same time providing
> powerful means to produce any necessary new functionality.
>
> Thus the "standard" language features (e.g. operators and functions)
> can be kept to a minimum, while the language makes possible
> desired new
> functionality to be easily produced and accumulated into a library of
> general and reusable useful functions.
>
> Without support for higher-order functions such libraries are very
> limited in scope
> and usefulness.
>
> 6. Inflexibility, where equality and comparison-returning
> functions are
> needed to be passed to:
>    - sort
>    - distinct-values
>    - grouping, etc.
>
> 7. Little or no reusability is possible for "for" expressions
>
> In the expression bellow:
>
>  for $i in (1 to count($items))
>          return expression
>
> expression cannot be reused (simply copied and pasted) and
> will have to
> be modified every time it is used with differently named
> range variable
> so that references to the range variable are renamed.
>
> In contrast, with higher-order functions support one can have a map()
> function, so that in
>
> map(f, $sequence)
>
> the code of f() will never have to be modified.
>
>
> Part II. Higher-Order Functions Solutions
> -----------------------------------------
>
> Provided higher-order functions were available, the problems listed
> above have easy and natural solutions.
>
> To demonstrate the compactness and high degree of
> readability, the code
> bellow is written in Haskell. Haskell is used only for convenience, in
> no way should it be inferred that the same syntax is recommended for
> XPath 2.0. Some basic conventions from this language:
>
>  f x y = x * y
>
> This defines a function f(x,y) = x * y
>
>  [1, 2, 3]
>
> This is a list of elements 1, 2, and 3. The same (for our purposes) as
> (1, 2, 3) in XPath 2.0.
>
>  []
>
> This denotes the empty list -- the same as () in XPath 2.0.
>
>  x   -- denotes the name of a single element/function.
>
>  xs -- any name ending in 's' denotes a list of elements.
>
> Any operator can be used also as a function, when put in brackets.
> Thus:
>
> (+) 1 2 = 1 + 2 = 3
>
> The (:) operator is used to prepend an element to the start of a list:
>
>  x : xs
>
> defines a list with head x and tail xs.
>
> The flip() function takes as argument any function with two arguments,
> and produces as result a the same function, which takes these two
> arguments in reverse order.
>
>  flip f x y = f y x
>
> Primitive recursion over a list can be defined as follows:
>
>  foldl f z []      = z
>  foldl f z (x:xs)  = foldl f (f z x) xs
>
> The function "foldl" takes 3 arguments -- a function f(),
> which takes 2
> arguments, a value z, and a list.
>
> This is one of the most general functions over lists. It traverses the
> list from left to write, applying f() on each element and the
> currently
> accumulated result.
>
> There is a dual function (foldr), which behaves in a similar way, but
> traverses the list from right to left:
>
>  foldr f z []      = z
>  foldr f z (x:xs)  = f x (foldr f z xs)
>
>
> As can be easily seen:
>
>  foldl (+) 0 xs
>
> is the sum of all elements in a list xs. Therefore we could write:
>
>  sum xs = foldl (+) 0 xs
>
> Analogously:
>
>  product xs = foldl (*) 1 xs
>
> And this one liner is the solution to one of the problems in Part I,
> section 1.
>
> We can ommit the last operand(s) from an equation, in case it is the
> same and we still get a valid function definition. Therefore,
> the above
> function definitions could be simplified even further and written as:
>
>  sum   = foldl (+) 0
>
>  product  = foldl (*) 1
>
>
> Reversing the elements of a list (solution of another problem in Part
> I, section 1.) is simply defined as:
>
>  reverse    = foldl (flip (:)) []
>
>
> Concatenating all elements of a list (solution of the next problem) is
> simply:
>
>  concat            = foldr (++) []
>
> where (++) is the concatenation operator for lists.
>
>
> Combining two lists with equal length into one can be performed using
> the zipWith() function:
>
>  zipWith f (a:as) (b:bs)   = f a b : zipWith f as bs
>  zipWith _ _      _        = []
>
> The function f() is applied on every pair of elements at position N
> from the two lists, and the result forms the element at position N in
> the result list.
>
>  ZipWith() solves directly all the problems from Part I, section 3:
>
>    - (a1 + b1, a2 + b2, ..., aN + bN) is just:
>
>       zipWith (+) as bs
>
>    - (a1 * b1, a2 * b2, ..., aN * bN) is just:
>
>       zipWith (*) as bs
>
>    - (a1 and b1, a2 and b2, ..., aN and bN) is just:
>
>       zipWith  and as bs
>
>    - (a1 or b1, a2 or b2, ..., aN or bN) is just:
>
>       zipWith or as bs
>
>
>
> A very useful function is scanl:
>
>  scanl f q xs   = q : (case xs of
> 		         []   -> []
> 			 x:xs -> scanl f (f q x) xs)
>
> It is similar to foldl, but creates a list, every element of which
> contains the intermediate accumulated result. The first element of the
> result-list is q.
>
> In case the list is guaranteed to be non-empty, then the following
> function can be defined:
>
>  scanl1 f (x:xs)   = scanl f x xs
>
> It behaves like scanl(), but doesnt use a zero argument.
>
> As can be easily seen,
>
>  scanl1 `op` xs
>
> produces a list of the intermediate accumulated results of performing
> op() on the list xs. For example:
>
>  scanl1 (+) [1, 2, 3] = [1, 3, 6] = [1, 1+2, 3 + 3]
>
>
> scanl1() combined with zipWith() solve directly the problem of
> calculating the running total from Part I, section 1:
>
>  scanl1 (+) (zipWith (*) [1,2,3] [2,2,2])
>
> returns:
>         [2, 6, 12]
>
> A direct solution to the filtering problems defined in Part I, section
> 2, is provided by the function filter():
>
>  filter p xs       = [ x | x <- xs, p x ]
>
> it takes a function p() defined on the type of elements of its second
> argument - a list xs, and returns a list of those elements of xs, for
> which p(x) = true.
>
> Using it, we can write:
>
>  - Return all items in a sequence, for which f(item) has minimal value
>
>     minvals  f  xs     =   filter (= fmin)  ys
>
>                               where fmin =  minimum  ys,
>                                     ys   =  map f  xs
>
>
>  - For some function f() test whether all the values of f(item) on a
> sequence  are equal (> 0, etc.)
>
>    allFPositive  f  xs  =  foldl   and   ys
>
>                              where  ys    =  map  f  xs
>
>  - For some function f() test whether all values f(item) on a sequence
> are in increasing order.
>
>   allFIncreasing  f  xs   =  foldl   and   ys
>
>              where   ys   =  zipWith  (<)  (init  xs)  (tail  xs)
>
>
> In the last solution we used the init() function, which is the dual of
> tail() - from a list xs it produces another, containing all
> elements of
> the first list , but the last one:
>
>  init (x:xs)       = x : init xs
>
>
>
> Finally, heres an example how to keep a language small and simple:
>
> Whenever a programmer needs a mapping operator, she could produce it
> immediately herself, without having to ask a working group for
> including it in the standard language, as follows:
>
> 1. She defines the function map():
>
>  map  f   xs  =  foldl  ( (:) . f  )   [ ]  xs
>
> 2. Because she needs to apply the mapping operator repeatedly, for
> convenience she defines a multiMap() function:
>
>  multiMap   xs   fs    =    foldr   map   xs   fs
>
>  multiMap   [1, 2, 3]   [(1+), (2*), (5+)]
>
> produces:
>
>  [13,15,17]   ,  that is  [(1 + 5)*2 + 1, (2 + 5)*2 + 1, (3 +
> 5)*2 + 1]
>
> 3. The multiMap() function as defined above applies the functions
> starting from the last one in the list. The programmer wants them
> applied starting from the first function in the list. She re-defines
> multiMap  by changing foldr to foldl as follows:
>
>   multiMap   xs   fs    =    foldl   (flip  map)   xs   fs
>
>
> Now
>     multiMap   [1, 2, 3]   [(1+),  (2*),  (5+)]
>
> produces:
>
>     [9, 11, 13]
>
>  that is:  [(1 + 1)*2 + 5, (2 + 1)*2 + 5, (3 + 1)*2 + 5]
>
> 4. She is ready to use the new function. For example, she specifies a
> series of SVG coordinate transformations:
>
>   multiMap  ($coordinates,
>                     ((. *2),
>                      (if (position() mod 2) then . + 50 else .)
>                       ..
>                     )
>              )
>
>
>
> III. Conclusion
> ---------------
>
>  XPath 2.0 as specified in the current working draft has the
> problems described in Part I. A language with support for higher-order
> function is free of these problems.
>
> IV. Recommendation
> ------------------
>
>  Based on the above conclusion, I recommend that higher-order
> functions support be implemented in Xpath 2.0.
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Send FREE video emails in Yahoo! Mail!
> http://promo.yahoo.com/videomail/
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread