Re: [xsl] Re: . in for

Subject: Re: [xsl] Re: . in for
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Sun, 6 Jan 2002 07:34:04 -0800 (PST)
--- Jeni Tennison <jeni@xxxxxxxxxxxxxxxx> wrote:
> Hi Dimitre,
> 
> > 1. 'mapping operator' must mean "a map function, which is used in an
> > infix notation".
> >
> > So instead of writing:
> >
> >  map fun list
> >
> > we now write:
> >
> >  list `map` fun      (let's ignore that this must be the other way around).
> >
> > However, the above defined mapping operator stil operates with
> > ***any function*** defined on lists.
> >
> > At the same time the mapping operator as you define it will operate
> > only on valid XPath expressions. This is quite limited compared to
> > the full mapping operator, therefore it would be best if the name of
> > the 'limited mapping operator' properly reflected this difference in
> > scope.
> 
> Absolutely. I suggested a simple mapping operator as a replacement for
> the more complex for expression that is currently defined in XPath
> 2.0. I fully recognise the fact that a simple mapping operator gains
> in simplicity, but loses in completeness - it cannot achieve
> everything that the for expression can achieve (it can't do joins).
> I'm sure that you're right that it also cannot achieve everything that
> a full mapping operator would achieve (perhaps you can give some
> examples?).

Anything that cannot be achieved by a single XPath expression, but can be produced
by a template. I hope even in XPath 2.0 / XSLT 2.0 this set is still non-empty?

> 
> (As I don't have your expertise, I cannot tell whether the for
> expression as currently defined can do everything a full mapping
> operator can do - perhaps you could comment on that?)
> 
> My thinking in suggesting a simple mapping operator was that it would
> meet over 80% of the practical requirements for a mapping operator
> that is used in XPath; the remainder can be handled in XSLT.
> 
> > 2. The example you provide of "piping" is actually a functional
> > composition of two map functions. It can be re-written like this:
> >
> >
> >    (map (if (position() mod 2) then . + 50 else .)) ¡Ñ  (map (. * 2))
> $coordinates 
> >
> > Here I use the special symbol &#8857; for the functional composition operator.
> >
> > So, the piping can be expressed as:
> >
> >      (map f1)
> >   ¡Ñ (map f2)
> >  ............               
> >   ¡Ñ (map fN)   $sequence
> >
> > It can easily be proven that the above is equal to:
> >
> > map (f1 ¡Ñ f2 ...  ¡Ñ fN) $sequence
> >
> > While your mapping operator will perform a series of mappings, each
> > producing an intermediate sequence and may require too much memory,
> > the last function applies the map function only once. The
> > composition of all functions is applied on every element of
> > $sequence and the resulting sequence is produced. No additional
> > memory for intermediate sequences is necessary.
> 
> That's a *very* interesting point :) As I understand it, you're saying
> that it would be more efficient if the two-step mapping was carried
> out in one step, and showing that you can compose the two steps:
> 
>  $coordinates -> (. * 2)
>               -> if (position() mod 2) then . + 50 else .
> 
> into one:
> 
>  $coordinates -> if (position() mod 2) then (. * 2) + 50 else (. * 2)
> 

Something like that, but not exactly in this way, rather:

Map (       (if (position() mod 2) then . + 50 else .)
      <<.>> (. * 2),
     $coordinates 
     )

(note the new representation for the composition operator)

> Presumably the same observation applies to the equivalent for
> expression?
> 
>   for $c in (for $d in $coordinates return $d * 2)
>   return if (position() mod 2) then $c + 50 else $c
> 
> This raises two questions/observations for me.
> 
> The first question arises because of the fact that in XPath 2.0
> sequences cannot contain sequences. This means that the result of a
> mapping operation does not necessarily have the same cardinality as
> the source of the mapping operation. As an example, say that the
> coordinates that we're dealing with are actually x,y,z coordinates,
> but we're only interested in the x,y coordinates. We could get rid of
> the z coordinates in one step, as follows:
> 
>  $coordinates -> if (position() mod 3) then . else ()
>               -> (. * 2)
>               -> if (position() mod 2) then . + 50 else .
> 
> Which I think in your syntax is:
> 
>   (map (if (position() mod 2) then . + 50 else .)) ¡Ñ
>   (map (. * 2)) ¡Ñ
>   (map (if (position() mod 3) then . else ())) $coordinates
> 

No, the above still performs a series of three map operations. What I proposed is to
perform ***just one*** map operation:

map(       (if (position() mod 2) then . + 50 else .)
     <<.>> (. * 2)
     <<.>> (if (position() mod 3) then . else ()), 
     $coordinates
    )

so all the expressions that are compose-d will be applied in reverse order to every
item of $coordinates.

> Using the same algorithm as before to compose these steps, I think the
> expression is:
> 
>   if (position() mod 2)
>   then ((if (position() mod 3) then . else ()) * 2) + 50
>   else ((if position() mod 3) then . else ()) * 2)
> 
> But this doesn't work - it adds 50 to the even coordinates from the
> complete x,y,z coordinate list and operates on them. So if you had a
> line defined as:
> 
>   (0, 0, 0, 50, 50, 50)
> 
> then you would get (empty sequences inserted to demonstrate the
> mapping):
> 
>   (50, 0, (), 100, 150, ())  =>  (50, 0, 100, 150)
> 
> whereas the expected result from the individual steps was:
> 
>   (50, 0, 150, 100)
> 
> Now I might have gone wrong in understanding how functions are
> composed, which is why I ask the question - can you do function
> composition in XPath 2.0, given that the cardinality of the sequence
> can change when a map operation is used and that the position of an
> item in the sequence should be accessible within the function?
> 

  Function composition can be performed, but if a single map() is performed then
position() will only return the original position of each item of the sequence.

In case you have designed your algorithm so that positions must be evaluated on
intermediate sequences, this already means that ***your algorithm*** requires series
of map()s to be performed.

But you could write:

(if (position() mod 3 = 2) then . + 50 else .)

instead of

(if (position() mod 2) then . + 50 else .)

and then the single-map algorithm will produce the correct result.

> 
> My second question is, whether the above issue is manageable or not,
> whether implementations could be clever enough to spot where the use
> of simple mapping operators is equivalent to functional composition,
> in order to optimise these expressions? Perhaps an implementation
> could recognise that given:
> 
>   $coordinates -> (. * 2)
>                -> if (position() mod 2) then . + 50 else .
> 
> it could compose the two steps on the right hand side of the first ->
> into a single operation, and use that to give added efficiency?
> 

In the general case if an XSLT processor optimizes your previous example, replacing
it by a single map with the composition of the expressions, it will yield the wrong
result as your detailed example above shows.

Anyway, if one designs his algorithm having the single mapping in mind, such an
error will not occur -- it only happens when trying to mechanically convert a
composition of map-s into a single map, and only if the mapping functions depend on
the cardinality of their inputs.

A more correct answer is that a function like position() must not be allowed as an
argument of a map function/operator. By definition, such a function must have as its
***single*** input only an item of the sequence. position() goes beyond the single
item and actually uses the whole sequence as its input.

> I also wonder if this can apply to steps in path expressions as well,
> and whether any XSLT processors currently take advantage of that. You
> could imagine that as well as keeping a record of the nodes reachable
> along a particular axis for each node, they might also keep
> information about the composition of those axes - when they see:
> 
>  row[following-sibling::row/@cust = @cust]
> 
> they might optimise by providing a quick access to each row's
> following sibling rows' 'cust' attributes when they initially
> construct the in-memory representation of the node tree. Mike, if
> you're listening, does Saxon do that? Have you considered it?
> 
> > This shows that it is better to have a map() function and a
> > composition operator for expressions (in case XPath 2.0 will not
> > fully support higher-order functions).
> 
> I think that you also need to have functions/expressions as objects in
> their own right to get a map() function to work as a function, don't
> you? 

Yes, but you can even now in XSLT 1.0 use map() implemented with generic templates
:o)

> As I see it, without functions/expressions as objects, we're
> stuck with using operators or expressions and unfortunately can't
> express the map() function in functional syntax.
> 

And then people will continue to use generic templates - based functional style.

> > An operator for functional composition has an added benefit that it
> > can be used with ***any functions***, not only with functions that
> > operate on sequences. Thus, it can be used to express an arbitrary
> > sequence of processing (piping) applied on a value of any datatype.
> 
> I agree - the ability to apply a series of operations to values,
> expressing those operations as individual steps rather than a single
> complex operation is really very powerful.
> 
> For example, start with an element, get its string value (default to
> 'FOO' if it's empty), normalize it, change it to upper case, and check
> whether the result equals "FOO". As a composed function this would be:
> 
>   upper-case(normalize-space(if-empty(., 'FOO'))) = 'FOO'
> 
> Expressed as a series of steps this would be:
> 
>   if-empty(., 'FOO') -> normalize-space(.)
>                      -> upper-case(.)
>                      -> (. = 'FOO')
> 
> Fortunately, with the simple mapping operator this works, because in
> XPath 2.0 individual values are treated exactly the same as
> single-item sequences.

Where did you read it? I read in 2.1.2 (Type conversions) the following:

"1. If the required type is anything other than a simple type or a single node, an
error is raised."

> 
> For interest, the equivalent for expression is as follows:
> 
>   for $a in (for $b in (for $c in if-empty(., 'FOO')
>                         return normalize-space($c))
>              return upper-case($b))
>   return ($a = 'FOO')
> 
> 
> I *think* that if you view the series of operations on the right hand
> side of the simple mapping operator as being functionally composed,
> then you have the functional composition operator that you're looking
> for?

If it is allowed to perform on a single item a function defined on a sequence, then
yes -- because the mapping operator as defined by you is just a functional
composition of map-s and a map will (if the assumption that this is allowed is
right) operate on a simple value or a node exactly like its argument-function.

Cheers,
Dimitre Novatchev.




__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread