Re: [xsl] faster complicated counting

Subject: Re: [xsl] faster complicated counting
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Thu, 1 Mar 2012 10:02:13 +0100
Can't you run a three-level for-each so that you can compute all three
numbers in one go?
-W
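
One possible reading of this suggestion, sketched below and not benchmarked: loop over each <lg>, then over its counted lines, and record position() for the lines whose closest qualifying ancestor is that <lg>. The wwp:n entries, the kind/id/pos attributes, and the key name "num" are made up for illustration. The sketch assumes an XSLT 2.0 processor, and it numbers only lines with no @part or with @part='I'.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Hypothetical key over the lookup table built in $nums. -->
  <xsl:key name="num" match="wwp:n" use="concat(@kind, ':', @id)"/>

  <!-- Lookup table (illustrative names): one wwp:n entry per counted line
       and per kind of count, each taken from position(). -->
  <xsl:variable name="nums">
    <xsl:for-each select="//l[not(@part) or @part = 'I']">
      <wwp:n kind="global" id="{generate-id()}" pos="{position()}"/>
    </xsl:for-each>
    <xsl:for-each select="//lg">
      <xsl:variable name="lg" select="."/>
      <xsl:for-each select="descendant::l[not(@part) or @part = 'I']">
        <!-- regional: $lg is the closest ancestor lg that looks like a poem -->
        <xsl:if test="ancestor::lg[contains(@type, 'poem')
                                   or descendant::l[@n eq '1']][1] is $lg">
          <wwp:n kind="regional" id="{generate-id()}" pos="{position()}"/>
        </xsl:if>
        <!-- local: $lg is the closest ancestor lg with more than 4 counted lines -->
        <xsl:if test="ancestor::lg[count(descendant::l[not(@part)
                                   or @part = 'I']) gt 4][1] is $lg">
          <wwp:n kind="local" id="{generate-id()}" pos="{position()}"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:variable>

  <!-- Identity copy for everything else. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Counted lines: look up the three precomputed numbers by generated id. -->
  <xsl:template match="l[not(@part) or @part = 'I']">
    <xsl:variable name="id" select="generate-id()"/>
    <xsl:copy>
      <xsl:attribute name="wwp:num-global"
        select="key('num', concat('global:', $id), $nums)/@pos"/>
      <xsl:attribute name="wwp:num-regional"
        select="key('num', concat('regional:', $id), $nums)/@pos"/>
      <xsl:attribute name="wwp:num-local"
        select="key('num', concat('local:', $id), $nums)/@pos"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Two differences from the quoted code are worth noting: a line with no qualifying ancestor <lg> gets an empty attribute here where the original yields 1, and lines with other @part values fall through to the identity template unnumbered.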

2012/3/1 Emmanuel Bigui <medusis@xxxxxxxxx>
>
> One way is to compute the respective positions in variables and then
> look them up with keys, so that each position is computed only once.
>
> For example, for the global position, you can add the following at
> the top level of the stylesheet:
>
> <xsl:key name="l" match="l" use="@id"/>
>
> <xsl:variable name="global">
>   <xsl:for-each select="//l">
>     <l pos="{position()}" id="{generate-id(.)}"/>
>   </xsl:for-each>
> </xsl:variable>
>
> and then, in each l element, look up the value of wwp:num-global like
> this:
>
> <xsl:attribute name="wwp:num-global"
>   select="key('l', generate-id(.), $global)/@pos"/>
>
> Regards,
> EB
>
> 2012/2/29 Syd Bauman <Syd_Bauman@xxxxxxxxx>:
> > I am working with a relatively small dataset (~ 1 MiB) which uses a
> > TEI encoding. In TEI, a line of verse is encoded with an <l> element
> > (of which I have just about 306,000), which are grouped into groups
> > (like poems or stanzas) using <lg> (for "line group").
> >
> > In the output of the particular process I am working on now, I'd like
> > to adorn each <l> element with three new attributes that indicate the
> > count of the current <l> element in various contexts:
> >  wwp:num-global   = with respect to the entire document
> >  wwp:num-local    = with respect to the current stanza or other
> >                     small unit of poetry
> >  wwp:num-regional = with respect to the current poem or other
> >                     large unit of poetry
> >
> > So, as a toy example, see tiny.in.xml and tiny.out.xml, below.
> >
> > I have worked out code that gets me the desired counts. My problem is
> > that all the tree-walking it does slows down my process by well over
> > an order of magnitude. I am betting there is a much better way to do
> > this, probably using keys or <xsl:number>, but have not been able to
> > wrap my mind around it.
> >
> > The English-like pseudo-code for @num-local is "the count in the
> > context of the closest ancestor <lg> that itself has > 4 metrical
> > lines".
> >
> > The English-like pseudo-code for @num-regional is "the count in the
> > context of the closest ancestor <lg> that has a @type that contains
> > "poem" or whose first descendant <l> has n='1'".
> >
> > Here's what I have (note that we are only counting those <l> elements
> > that have an @part of 'I' or do not have a @part attribute at all):
> >
> >  <xsl:attribute name="wwp:num-global">
> >    <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
> >  </xsl:attribute>
> >  <xsl:attribute name="wwp:num-regional">
> >    <xsl:variable name="region"
> >      select="(ancestor::lg[contains(@type, 'poem')]
> >             | ancestor::lg[descendant::l[@n eq '1']])[last()]"/>
> >    <xsl:value-of
> >      select="count((preceding::l[not(@part)] | preceding::l[@part='I'])
> >                    [ancestor::lg/generate-id() = $region/generate-id()]) + 1"/>
> >  </xsl:attribute>
> >  <xsl:attribute name="wwp:num-local">
> >    <xsl:variable name="region"
> >      select="ancestor::lg[count(descendant::l[not(@part) or @part='I']) &gt; 4][1]"/>
> >    <xsl:value-of
> >      select="count((preceding::l[not(@part)] | preceding::l[@part='I'])
> >                    [ancestor::lg/generate-id() = $region/generate-id()]) + 1"/>
> >  </xsl:attribute>
> >
> > Thoughts appreciated.
> >
> > Notes
> > -----
> > * Yes, I realize that the test above is for *any* descendant <l> with
> >  n='1', not the first. We simply don't have any that aren't the
> >  first, so I didn't worry about it.
> >
> > * It's pretty likely we'll change the definition of what is
> >  "regional" in the near future, but it probably won't affect the
> >  basic problem I'm having. I.e., I'm hoping that if someone shows me
> >  how to do this "regional" better, I'll be able to do any future
> >  version on my own. Cross your fingers :-)
> >
> >
> > toy input
> > --- -----
> > <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > <TEI xmlns="http://www.tei-c.org/ns/1.0"
> >     xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
> >  <teiHeader>
> >    <!-- blah, blah, blah -->
> >  </teiHeader>
> >  <text>
> >    <body>
> >      <lg type="superStructure">
> >        <lg type="poem.duck">
> >          <l>one</l>
> >          <l>two</l>
> >          <l>three</l>
> >          <l>four</l>
> >          <l>five</l>
> >          <l>six</l>
> >          <l>seven</l>
> >          <l>eight</l>
> >          <l>nine</l>
> >          <l>ten</l>
> >        </lg>
> >        <lg type="poem.duck">
> >          <l>one</l>
> >          <l>two</l>
> >          <l>three</l>
> >          <l>four</l>
> >          <lg type="tercet">
> >            <l>five</l>
> >            <l>six</l>
> >            <l>seven</l>
> >          </lg>
> >          <l>eight</l>
> >          <l>nine</l>
> >          <l>ten</l>
> >        </lg>
> >        <lg type="poem.duck">
> >          <lg type="stanza">
> >            <l>one</l>
> >            <l>two</l>
> >            <l>three</l>
> >            <l>four</l>
> >            <l>five</l>
> >            <l>six</l>
> >            <l>seven</l>
> >            <l>eight</l>
> >          </lg>
> >          <lg type="stanza">
> >            <l>nine</l>
> >            <l>ten</l>
> >            <l>eleven</l>
> >            <l>twelve</l>
> >            <l>thirteen</l>
> >            <l>fourteen</l>
> >            <l>fifteen</l>
> >            <l>sixteen</l>
> >          </lg>
> >          <lg type="stanza">
> >            <l>seventeen</l>
> >            <l>eighteen</l>
> >            <l>nineteen</l>
> >            <l>twenty</l>
> >            <l>twentyone</l>
> >            <l>twentytwo</l>
> >            <l>twentythree</l>
> >            <l>twentyfour</l>
> >          </lg>
> >        </lg>
> >      </lg>
> >    </body>
> >  </text>
> > </TEI>
> >
> > toy code
> > --- ----
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
> >  xmlns="http://www.tei-c.org/ns/1.0"
> >  xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
> >
> >  <xsl:template match="/">
> >    <xsl:text>&#x0A;</xsl:text>
> >    <xsl:apply-templates/>
> >  </xsl:template>
> >  <xsl:template match="@*|text()|processing-instruction()|comment()">
> >    <xsl:copy/>
> >  </xsl:template>
> >  <xsl:template match="*">
> >    <xsl:copy>
> >      <xsl:apply-templates select="@*|node()"/>
> >    </xsl:copy>
> >  </xsl:template>
> >
> >  <xsl:template match="l">
> >    <xsl:copy>
> >      <xsl:attribute name="wwp:num-global">
> >        <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
> >      </xsl:attribute>
> >      <xsl:attribute name="wwp:num-regional">
> >        <xsl:variable name="region"
> >          select="(ancestor::lg[contains(@type, 'poem')]
> >                 | ancestor::lg[descendant::l[@n eq '1']])[last()]"/>
> >        <xsl:value-of
> >          select="count((preceding::l[not(@part)] | preceding::l[@part='I'])
> >                        [ancestor::lg/generate-id() = $region/generate-id()]) + 1"/>
> >      </xsl:attribute>
> >      <xsl:attribute name="wwp:num-local">
> >        <xsl:variable name="region"
> >          select="ancestor::lg[count(descendant::l[not(@part) or @part='I']) &gt; 4][1]"/>
> >        <xsl:value-of
> >          select="count((preceding::l[not(@part)] | preceding::l[@part='I'])
> >                        [ancestor::lg/generate-id() = $region/generate-id()]) + 1"/>
> >      </xsl:attribute>
> >      <xsl:apply-templates select="@*|node()"/>
> >    </xsl:copy>
> >  </xsl:template>
> >
> > </xsl:stylesheet>
> >
> > toy output
> > --- ------
> > <?xml version="1.0" encoding="UTF-8"?>
> > <TEI xmlns="http://www.tei-c.org/ns/1.0"
> >     xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
> >  <teiHeader>
> >    <!-- blah, blah, blah -->
> >  </teiHeader>
> >  <text>
> >    <body>
> >      <lg type="superStructure">
> >        <lg type="poem.duck">
> >          <l wwp:num-global="1" wwp:num-regional="1"
> > wwp:num-local="1">one</l>
> >          <l wwp:num-global="2" wwp:num-regional="2"
> > wwp:num-local="2">two</l>
> >          <l wwp:num-global="3" wwp:num-regional="3"
> > wwp:num-local="3">three</l>
> >          <l wwp:num-global="4" wwp:num-regional="4"
> > wwp:num-local="4">four</l>
> >          <l wwp:num-global="5" wwp:num-regional="5"
> > wwp:num-local="5">five</l>
> >          <l wwp:num-global="6" wwp:num-regional="6"
> > wwp:num-local="6">six</l>
> >          <l wwp:num-global="7" wwp:num-regional="7"
> > wwp:num-local="7">seven</l>
> >          <l wwp:num-global="8" wwp:num-regional="8"
> > wwp:num-local="8">eight</l>
> >          <l wwp:num-global="9" wwp:num-regional="9"
> > wwp:num-local="9">nine</l>
> >          <l wwp:num-global="10" wwp:num-regional="10"
> > wwp:num-local="10">ten</l>
> >        </lg>
> >        <lg type="poem.duck">
> >          <l wwp:num-global="11" wwp:num-regional="1"
> > wwp:num-local="1">one</l>
> >          <l wwp:num-global="12" wwp:num-regional="2"
> > wwp:num-local="2">two</l>
> >          <l wwp:num-global="13" wwp:num-regional="3"
> > wwp:num-local="3">three</l>
> >          <l wwp:num-global="14" wwp:num-regional="4"
> > wwp:num-local="4">four</l>
> >          <lg type="tercet">
> >            <l wwp:num-global="15" wwp:num-regional="5"
> > wwp:num-local="5">five</l>
> >            <l wwp:num-global="16" wwp:num-regional="6"
> > wwp:num-local="6">six</l>
> >            <l wwp:num-global="17" wwp:num-regional="7"
> > wwp:num-local="7">seven</l>
> >          </lg>
> >          <l wwp:num-global="18" wwp:num-regional="8"
> > wwp:num-local="8">eight</l>
> >          <l wwp:num-global="19" wwp:num-regional="9"
> > wwp:num-local="9">nine</l>
> >          <l wwp:num-global="20" wwp:num-regional="10"
> > wwp:num-local="10">ten</l>
> >        </lg>
> >        <lg type="poem.duck">
> >          <lg type="stanza">
> >            <l wwp:num-global="21" wwp:num-regional="1"
> > wwp:num-local="1">one</l>
> >            <l wwp:num-global="22" wwp:num-regional="2"
> > wwp:num-local="2">two</l>
> >            <l wwp:num-global="23" wwp:num-regional="3"
> > wwp:num-local="3">three</l>
> >            <l wwp:num-global="24" wwp:num-regional="4"
> > wwp:num-local="4">four</l>
> >            <l wwp:num-global="25" wwp:num-regional="5"
> > wwp:num-local="5">five</l>
> >            <l wwp:num-global="26" wwp:num-regional="6"
> > wwp:num-local="6">six</l>
> >            <l wwp:num-global="27" wwp:num-regional="7"
> > wwp:num-local="7">seven</l>
> >            <l wwp:num-global="28" wwp:num-regional="8"
> > wwp:num-local="8">eight</l>
> >          </lg>
> >          <lg type="stanza">
> >            <l wwp:num-global="29" wwp:num-regional="9"
> > wwp:num-local="1">nine</l>
> >            <l wwp:num-global="30" wwp:num-regional="10"
> > wwp:num-local="2">ten</l>
> >            <l wwp:num-global="31" wwp:num-regional="11"
> > wwp:num-local="3">eleven</l>
> >            <l wwp:num-global="32" wwp:num-regional="12"
> > wwp:num-local="4">twelve</l>
> >            <l wwp:num-global="33" wwp:num-regional="13"
> > wwp:num-local="5">thirteen</l>
> >            <l wwp:num-global="34" wwp:num-regional="14"
> > wwp:num-local="6">fourteen</l>
> >            <l wwp:num-global="35" wwp:num-regional="15"
> > wwp:num-local="7">fifteen</l>
> >            <l wwp:num-global="36" wwp:num-regional="16"
> > wwp:num-local="8">sixteen</l>
> >          </lg>
> >          <lg type="stanza">
> >            <l wwp:num-global="37" wwp:num-regional="17"
> > wwp:num-local="1">seventeen</l>
> >            <l wwp:num-global="38" wwp:num-regional="18"
> > wwp:num-local="2">eighteen</l>
> >            <l wwp:num-global="39" wwp:num-regional="19"
> > wwp:num-local="3">nineteen</l>
> >            <l wwp:num-global="40" wwp:num-regional="20"
> > wwp:num-local="4">twenty</l>
> >            <l wwp:num-global="41" wwp:num-regional="21"
> > wwp:num-local="5">twentyone</l>
> >            <l wwp:num-global="42" wwp:num-regional="22"
> > wwp:num-local="6">twentytwo</l>
> >            <l wwp:num-global="43" wwp:num-regional="23"
> > wwp:num-local="7">twentythree</l>
> >            <l wwp:num-global="44" wwp:num-regional="24"
> > wwp:num-local="8">twentyfour</l>
> >          </lg>
> >        </lg>
> >      </lg>
> >    </body>
> >  </text>
> > </TEI>
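
A smaller change that may already cut down the tree-walking in the quoted stylesheet, offered as an untested sketch: instead of scanning preceding:: across the whole document and filtering by ancestor, count only the region's own descendant lines that come before the current line, using the XPath 2.0 node-order operator <<. The variable names $here and $stanza are illustrative, and the sketch assumes that <l> elements are never nested inside other <l> elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Identity copy for everything that is not a line. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <xsl:variable name="here" select="."/>
    <!-- Same region definitions as in the quoted stylesheet. -->
    <xsl:variable name="region"
      select="(ancestor::lg[contains(@type, 'poem')]
             | ancestor::lg[descendant::l[@n eq '1']])[last()]"/>
    <xsl:variable name="stanza"
      select="ancestor::lg[count(descendant::l[not(@part)
                           or @part = 'I']) gt 4][1]"/>
    <xsl:copy>
      <!-- Global count left as in the quoted stylesheet. -->
      <xsl:attribute name="wwp:num-global">
        <xsl:number count="l[not(@part) or @part = 'I']" level="any"/>
      </xsl:attribute>
      <!-- Count only counted lines inside the region that precede this one. -->
      <xsl:attribute name="wwp:num-regional"
        select="count($region/descendant::l[not(@part) or @part = 'I']
                      [. &lt;&lt; $here]) + 1"/>
      <xsl:attribute name="wwp:num-local"
        select="count($stanza/descendant::l[not(@part) or @part = 'I']
                      [. &lt;&lt; $here]) + 1"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The work per line then grows with the size of its poem or stanza and no longer with the number of lines that precede it in the whole document. It is still quadratic within a single very long region.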
