Re: [xsl] What is Micro Pipelining: an attempt for a definition (was: Calculating cumulative values - Abel's solution)

Subject: Re: [xsl] What is Micro Pipelining: an attempt for a definition (was: Calculating cumulative values - Abel's solution)
From: "Mukul Gandhi" <gandhi.mukul@xxxxxxxxx>
Date: Sat, 1 Sep 2007 13:57:41 +0530
I find this discussion very informative, and couldn't resisting to say
something.

There are essentially two aspects about the information being
processed by an instruction processor - throughput, and latency. The
dictionary defines these two terms as:
Throughput - the quantity or amount of raw material processed within a
given time, esp. the work done by an electronic computer in a given
period of time.
Latency - the period of inactivity between the time the stimulus is
presented and the moment a response occurs.

Let's say, an entire workload (W) is composed of 3 tasks (A, B & C).
Let's say, the instruction processor is presented to process 2-3 (or
in general multiple) such workloads (where each workload is composed
of some tasks - A, B & C in this case).

Pipelining is essentially providing the workloads to the processor in
an overlapped fashion (and not sequentially). What is the effect of
having a pipelined execution over a sequential execution? Pipelining
doesn't help the latency of a single task. But it helps throughput of
the entire workload.

There are other characteristics of pipelined processors
1) Pipeline rate is limited by slowest pipeline stage
2) Multiple tasks operating simultaneously using different resources
3) Potential Speedup = Number of pipe stages
4) Unbalanced lengths of pipe stages reduces speedup
5) Time to fill pipeline and time to drain it reduces speedup

The most well known example in computer hardware is, piplelined
instruction processors (which speedup instruction processing). Where
it takes less CPU cycles to process multiple instructions.

This is essentially the formula for time computation with using pipelines

Time (pipeline)  = Time (non-pipeline) / Pipe stages

(Assuming, that stages are perfectly balanced & there are ideal conditions)

But to realize the pipelinign benefits, there are certain costs:
1) Structural hazards
2) Control hazards
3) Data hazards

It's interesting to see [if we can ?] applying these concepts to XSLT.
I would like to present this example:

Let's say, we have 3 template rules as below.

<xsl:template match="rule1">
  <!-- processing -->
</xsl:template>

<xsl:template match="rule2">
  <!-- processing -->
</xsl:template>

<xsl:template match="rule3">
  <!-- processing -->
</xsl:template>

A given workload (by drawing analogy from the above descriptions) is
composed of 3 tasks (these 3 template rules). To execute the whole
workload, we call these 3 rules in sequence (like this):

<xsl:apply-templates select="rule1" />
<xsl:apply-templates select="rule2" />
<xsl:apply-templates select="rule3" />

Let's say, we need to execute 2-3 (or in general, multiple) instances
of the same workload (composed of the 3 tasks). Can we design a
processing mechanism in XSLT, which can take advantage of pipelined
processing.

Following is the depiction of non-pipelined vs. piplelined processing
(Wn is a workload, whereas, A, B, C etc. are the tasks in the
workload).

(W1)
A -> B -> C
              (W2)
              A -> B -> C
                            (W3)
                             A -> B-> C

(W1)
A -> B -> C
       (W2)
       A -> B -> C
              (W3)
              A -> B -> C

(It can be clearly seen, executing multiple workloads in pipelined
fashion takes less time)

I have a positive feeling, that this processing style *cannot* be
simulated with XSLT. Or, can it be? I think, we need to have a
multithreading or multiprocessing environment for this (for e.g., with
Java we can do this).

PS: Some of my ideas are borrowed from literature on internet.

On 9/1/07, Abel Braaksma <abel.online@xxxxxxxxx> wrote:
> Hi Wendell,
>
> Very valuable feedback. However, though I think I grasp your points,
> after reading I still find a lot of "micro-pipelining" to be in a gray
> area. Let me try to explain myself.
>
> For me, micro-pipelining is defined as the process of pipelining a set
> of data during a one pass through an XSLT processor. Normal pipelining
> would be to re-apply the results of one pass to the next pass. I
> understand that this doesn't tally with your definition below.
>
> Another attempt for a definition (still my own, original, definition)
> would be the following: in a case where you apply templates (or for-each
> perhaps) to a set of data that is not part of the input data (input data
> being any external source, including document() / collection() etc) is a
> micro pipeline. Shorter: when you have a variable to contain data that
> is already processed and you re-apply this variable, it is a micro pipeline.
>
> Now I do understand that this is a bit of a too wide definition. But now
> on to yours. You explain that for anything to be a micro pipeline it
> must be applied (or expected to be applied) more than once. More
> appropriately: when you can extract the pipeline variable into a global
> variable, it is not a micro pipeline anymore (is it then a macro pipeline?).
>
> But things are still not so trivial. It is tempting to say that the
> following is only executed once, because there's only one root in a
> document, and as a result, it is easy to extract the variable to the
> global level:
>
>   <xsl:template match="/">
>       <xsl:variable name="micro">
>            <xsl:apply-templates select="root/some/data" />
>       </xsl:variable>
>       <xsl:apply-templates select="$micro/*" />
>   </xsl:template>
>
> But often, simple examples, or inquiries on this list, are part of a
> larger stylesheet or solution. Suppose I alter the above example to be
> as follows:
>
>   <xsl:param name="urls" select=" 'one.xml', 'two.xml', 'three.xml' " />
>
>   <xsl:template name="main">
>       <xsl:apply-templates select="document($list-of-documents)" />
>   </xsl:template>
>
>   <xsl:template match="/">
>       <xsl:variable name="micro">
>            <xsl:apply-templates select="root/some/data" />
>       </xsl:variable>
>       <xsl:apply-templates select="$micro/*" />
>   </xsl:template>
>
> is it now still not a micro pipeline? It is applied several times (three
> times) and it is not possible anymore to make the variable global. Is it
> really necessary to restrict a micro pipeline to be one only when it is
> applied to a local level? Though I can follow your point in that it is
> closer to a "large pipeline" than to a "micro pipeline".
>
> Now let's try the other opposite. To define a term, you need to know
> where its boundaries are. Suppose we have the following fictional
> stylesheet, would the application of $micro be a micro pipeline?
>
>  <xsl:variable name="micro">
>      this must be tokenized
>  <xsl:variable>
>
>  <xsl:template name="main">
>     <xsl:apply-templates select="my:tokenize($micro)" />
>  </xsl:template>
>
>  <xsl:template match="my:token">
>      <xsl:copy-of select="." />
>  </xsl:template>
>
>  <xsl:function name="my:tokenize">
>     <xsl:param name="tokens">
>     <xsl:variable name="preproc-tokens">
>         <xsl:for-each select="tokenize($tokens, ' ')">
>             <my:token value="{.}" />
>         </xsl:for-each>
>     </xsl:variable>
>     <xsl:apply-templates select="$preproc-tokens/*" mode="my:tokenize" />
>  </xsl:function>
>
>  <xsl:template match="my:token" mode="my:tokenize">
>     <xsl:copy>
>        <xsl:sequence select="replace('text(), [^A-z]', '')" />
>     </xsl:copy>
>  </xsl:template>
>
>
> In this tokenize example we see two things. We see a global variable
> holding data. This is processed using my:tokenize() and then the results
> are re-applied. Even though we are talking about global data (holding a
> document node with text), I would consider both phases a micro pipeline:
> the apply-templates in the my:tokenize function starts a micro pipeline,
> and the apply-templates in the main entry template starts a micro
> pipeline. Or not?
>
> I don't know the answers. I have seen the term micro pipeline come up
> every now and then without a lot of explanation. I did a quick check on
> the internet a couple of times, but a clear definition seems hard to
> find. Even the XML Pipeline languages (still in Draft) at W3C do not
> mention the distinction. Wikipedia has a small but effective line on the
> subject though: "Some standards also categorize transformation as macro
> (changes impacting an entire file) or micro (impacting only an element
> or attribute)" (http://en.wikipedia.org/wiki/XML_pipeline).
>
> But this simple-enough definition does not help XSLT: we have root
> nodes, document nodes, files, non-xml data, result tree fragments,
> sequences.... When is it micro and when is it macro? I do think your
> definition comes quite close, but it has some rough edges. Would you
> (and others) give it a try? (the definition, I mean)?
>
> Cheers,
>
> -- Abel Braaksma
>
>
>
> Wendell Piez wrote:
> > Abel,
> >
> > Thanks for the very nicely composed explanation for Simon. One of the
> > best things about this list is how explanations are provided that can
> > be followed by people who don't themselves have any need for a
> > particular solution, but who can still learn valuable lessons and
> > techniques from the sidelines. It's one of the best ways of learning.
> >
> > I would, however, quarrel with one aspect of your description. You
> > have the code:
> >
> >     <xsl:template match="/">
> >         <!-- mini pipeline: put points into a variable and process
> > them -->
> >         <xsl:variable name="points">
> >             <xsl:apply-templates select="root/set[1]/point[1]"
> > mode="aggregate">
> >                 <xsl:with-param name="calc" tunnel="yes">
> >                     <for x="1" y2="2" y3="2"/>
> >                 </xsl:with-param>
> >             </xsl:apply-templates>
> >         </xsl:variable>
> >
> >         <!-- apply set with pre-processed points -->
> >         <xsl:apply-templates select="root/set">
> >             <xsl:with-param name="points" select="$points" />
> >         </xsl:apply-templates>
> >     </xsl:template>
> >
> > ... which you refer to as using a "micro-pipeline".
> >
> > As I explained earlier this week, I don't believe this is a
> > micro-pipeline, since it operates globally. The declaration of $points
> > could appear outside the template and it would perform the same way. A
> > micro-pipeline would be if you bound a variable in a template you
> > expected to fire more than once, and then processed the results of
> > that variable. Admittedly there may be a grey zone between a pipeline
> > operating at the document (global) level, and one that operates at a
> > more local level; but I believe this falls fairly clearly into the
> > first category.
> >
> > (Then too, as you also explain, you don't really run a pipeline here
> > at all, but use the results of pre-processing as a lookup table.)
> >
> > The reason I stress this is because I'm afraid that if we start
> > calling anything a micro-pipeline just because it involves some
> > matching and applying of templates whose results won't appear in the
> > output, then we'll have to invent a new word for what we actually
> > invented the term for.
> >
> > The more general technique, I'd say, is called "pipelining", or -- if
> > the results are themselves not processed directly (this includes
> > generating lookup tables) "pre-processing". Another interesting thing
> > to reflect on is that pipelining and pre-processing can be achieved in
> > XSLT 1.0 by passing the results of one stylesheet into another as its
> > source. This really isn't practical with micro-pipelining, which
> > happens only within the scope of a single branch of the tree at a time.
> >
> > I apologize if this comes across as rude. Another valuable thing we do
> > on this list is guard one another's terminology. ("Don't call them
> > tags", etc.) This keeps the language strong because we have
> > unambiguous terms to refer to things, methods and techniques, which
> > can then be discussed and learned about.
> >
> > Cheers,
> > Wendell


-- 
Regards,
Mukul Gandhi

Current Thread