Re: [xsl] Divide and Conquer implementations of str-foldl and str-map (Was: Re: big recursive function)

Subject: Re: [xsl] Divide and Conquer implementations of str-foldl and str-map (Was: Re: big recursive function)
From: Tom Myers <tommy@xxxxxxxxxxxxxx>
Date: Fri, 30 Nov 2001 15:05:46 -0500
I'm way behind, as always, but still interested....
On 28 Nov 2001 08:23:13 -0800 (PST), Dimitre Novatchev <dnovatchev@xxxxxxxxx> answered

> > > I haven't seen yet different way to process characters inside a node 
> > > text (except making an extension function but it's cheating :-). It looks 
> > > like the right way to do things from an XSLT point of view. 
> > 
> > using a divide and conquer method (see posts of dimitre's) should change 
> > the recursion depth from being O(n) to O(log n) (at the cost of giving 
> > up tail recursion) so on a system that doesn't implement tail recursion 
> > elimination this should be better (and may well be faster anyway).

>Here's what I think of as a generic way to solve the majority of problems with deep 
>recursion when processing long lists of characters (strings). ... ....
>Another, more understandable definition of map is:
>map :: (a -> b) -> [a] -> [b] 
>map _ [] = [] 
>map f (x:xs) = (f(x) : map f xs)
>For example,
>map (+1) [1,2,3] = [2,3,4]

I kinda like array-notation variations like
   map f S [i] = f(S[i])
myself, but I admit it's not a big deal, except that it makes clear that the length of
the result is always and necessarily the same as the length of the original, which is
relevant to this case. At least, I think so.

>Now, to solve the problem of indenting every new line, we simply map every character 
>to itself, except for the NL character, which we map to NL + "' '" (new line plus 
>4 spaces):

Notice when you do that, that there is an implicit accumulation of results: we take
a string S of length N, and we produce N strings (each of length 1 or length 5) and
then we produce a string of length >= N as a result of cumulative concatenation. Okay,
in template constructions everything gets cumulatively concatenated, and all algebraic
needs are satisfied for parallelism (concat is associative, identity "") so it's still
not a big deal, but when we define map :: (a -> b) -> [a] -> [b], it might be 
important; here your indentNL template is char -> string, i.e. char -> [char], 
so the map is (char -> [char]) -> [char] -> [[char]], unless I'm abusing the
notation again,  and in the end we do need to get back from [[char]] to [char], in
effect with a fold1 of concat which is folded into the XSLT implementation. Yes?

>... .... ...
>Of course, a more efficient solution will be a special DVC algorithm, using the XSLT 
>functions contains(), substring-before() and substring-after().
>The advantages of foldl and map is that they've been already implemented and can be 
>readily used to solve multitude of problems, without requiring difficult and 
>error-prone additional programming.

I'm wondering about two possibilities. One would be versions of fold1 and map that
would be used in pre-processing, in effect as macros, so that the XSLT programmer
could take something like your suggested
>     <xsl:template name="indentNL">
>       <xsl:param name="arg1"/>
>       <xsl:value-of select="$arg1"/>
>       <xsl:if test="$arg1='&#10;'">
>         <xsl:value-of select="'    '"/>
>       </xsl:if>
>     </xsl:template>

as one very small file, apply a "str-dcmap.xsl" transformation to it, and get a special
DVC algorithm out of it which could then be manually improved. In effect, you'd have a
headstart on the "difficult and error-prone additional programming", but you wouldn't
be asking map or fold1 to be more efficient. It should be easy to run within emacs or vi
or even pfe on windows, in effect building a piece of a program development environment.
But then again, maybe it's like generating assembler output from C code and editting that,
which I haven't done in years and never really learned to love. Thoughts?

Another possibility is that of introducing hierarchies automatically; it's not hard
to do DVC templates manually on data that's already hierarchical. I'm thinking of a
<xsl:template name="splitStr">
   <xsl:param name="data">
   <xsl:param name="break">
such that given the multiline string of this thread for "data" and "'&#10;'" for
"break", we'd get a tree of depth log(N) where N would be the number of break-chars.
Something like that, and something similar for nodesets, might increase the comfort
level of programmers who don't feel happy with functions as values but might be 
happy to use a template (or extension function, or even a 
usage). XSLT is good at being data-driven; if we can give it the right data, DVC
might be a whole lot easier. But as usual, I dunno.

Tom Myers

 XSL-List info and archive:

Current Thread