Re: [xsl] Wide Finder in XSLT --> deriving new requirements for efficiency in XSLT processors.

Subject: Re: [xsl] Wide Finder in XSLT --> deriving new requirements for efficiency in XSLT processors.
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Sat, 10 Nov 2007 17:57:29 -0800
> I think you're right that parallelizing probably needs some kind of user
> hint in the stylesheet, but my instinct would be to make it an extension
> attribute so that the code will still work on any processor. There are
> certainly lots of opportunities. I had been thinking that probably the first
> thing to try would be
>
> <xsl:for-each select="..." xx:threads="4">
>
> and allocate the processing of the items in the input sequence to the N
> threads in a round-robin fashion. The challenge, of course, is how to
> marshal the output of the N threads, stitching it back together as a
> sequence in the right order, without using a lot of extra memory and
> creating a lot of extra coordination overhead. The ideal would be that if
> the input sequence is streamed, the output sequence should be streamed too.

Fully agree.
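
To make the ordering question concrete, here is a minimal sketch of the round-robin scheme described above. It is in Python rather than XSLT, since this is processor-internal machinery, and it is only an illustration under stated assumptions, not how any existing processor works. The names parallel_for_each, body and buffer_size are hypothetical. Item i goes to worker i mod N, and the results are read back in the same rotation, so the output comes out in input order without sorting, and the bounded queues keep memory use fixed even when the input is streamed.

```python
import queue
import threading

_DONE = object()  # private sentinel marking the end of a worker's stream


def parallel_for_each(items, body, threads=4, buffer_size=16):
    """Yield body(item) for every item, in input order, using `threads` workers.

    Hypothetical sketch: item i is handed to worker i % threads, and results
    are collected in the same rotation, which preserves the input order.
    Error handling is omitted: an exception in `body` would stall the consumer.
    """
    # One bounded inbox and outbox per worker, so memory use stays limited.
    inboxes = [queue.Queue(maxsize=buffer_size) for _ in range(threads)]
    outboxes = [queue.Queue(maxsize=buffer_size) for _ in range(threads)]

    def worker(inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)
                return
            outbox.put(body(item))

    def feeder():
        # Deal the (possibly streamed) input out to the workers in rotation.
        for i, item in enumerate(items):
            inboxes[i % threads].put(item)
        for inbox in inboxes:
            inbox.put(_DONE)

    for inbox, outbox in zip(inboxes, outboxes):
        threading.Thread(target=worker, args=(inbox, outbox), daemon=True).start()
    threading.Thread(target=feeder, daemon=True).start()

    # Read the outboxes in the same rotation used by the feeder.
    i = 0
    while True:
        result = outboxes[i % threads].get()
        if result is _DONE:
            return
        yield result
        i += 1
```

Because the function is a generator, a caller can consume results as they arrive, e.g. `for r in parallel_for_each(stream, transform, threads=4): ...`. The price of the strict rotation is that one slow item holds up the output of the other workers until it completes.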

So, let's propose a

          par:threads="N"

attribute for acceptance at exslt.org.

It seems to me that it may be useful to specify the required
parallelization in a relative way, since the programmer cannot know
the maximum or the currently available number of threads (as in a
thread pool):

          par:threads="nn%"

I guess the details can be worked out.
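
As one possible reading of those details, here is a small sketch (again in Python, with hypothetical names) of how a processor might turn either form of the attribute value into an actual thread count. The function resolve_threads, the pool_size parameter, the default of one thread per CPU, and the decision to clamp the result to the pool size are all my assumptions, not part of any proposal text.

```python
import os


def resolve_threads(spec, pool_size=None):
    """Turn a par:threads value such as "4" or "50%" into a thread count.

    Hypothetical sketch: `pool_size` stands for the number of threads the
    processor has available; by assumption it defaults to the CPU count.
    """
    if pool_size is None:
        pool_size = os.cpu_count() or 1
    spec = spec.strip()
    if spec.endswith("%"):
        percent = float(spec[:-1])
        if not 0 < percent <= 100:
            raise ValueError("percentage must be in (0, 100]: %r" % spec)
        # Relative form: a share of the pool, rounded down.
        requested = int(pool_size * percent / 100)
    else:
        requested = int(spec)
        if requested < 1:
            raise ValueError("thread count must be positive: %r" % spec)
    # Assumption: never exceed the pool, and always allow at least one thread.
    return max(1, min(requested, pool_size))
```

With a pool of 8 threads, "4" and "50%" would both resolve to 4 under these assumptions, while "16" would be clamped to 8.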




-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play


On Nov 10, 2007 2:15 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
> >
> > I have published this on my blog:
> >
> >
> > http://dnovatchev.spaces.live.com/Blog/cns!44B0A32C2CCF7488!385.entry
> >
> > There are two areas on which I would appreciate any feedback:
> >
> >    1. Finding a more efficient solution (there are such
> > RegExp gurus here!)
> >
> >    2. Discussing the ideas for lazy evaluation/streaming and
> > on constructs (a single extension function exslt:parMap() is
> > proposed) hinting possibilities for parallelization
>
> I managed to guess a username/password that worked(!) and made a comment,
> but it appears anonymously.
>
> I couldn't find a clear description of the problem - your link named
> "problem" seems to lead to a book.
>
> Most of the discussion suggests that the performance is going to be
> dominated by time taken to read the data off the disk. So I wouldn't have
> thought there is an enormous win for parallelization here. There's certainly
> more that can be done to reduce memory requirements by pipelining, though.
>
> I think you're right that parallelizing probably needs some kind of user
> hint in the stylesheet, but my instinct would be to make it an extension
> attribute so that the code will still work on any processor. There are
> certainly lots of opportunities. I had been thinking that probably the first
> thing to try would be
>
> <xsl:for-each select="..." xx:threads="4">
>
> and allocate the processing of the items in the input sequence to the N
> threads in a round-robin fashion. The challenge, of course, is how to
> marshal the output of the N threads, stitching it back together as a
> sequence in the right order, without using a lot of extra memory and
> creating a lot of extra coordination overhead. The ideal would be that if
> the input sequence is streamed, the output sequence should be streamed too.
>
> Michael Kay
> http://www.saxonica.com/