Subject: Re: [xsl] faster complicated counting
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Thu, 1 Mar 2012 10:02:13 +0100

Can't you run a three-level for-each so that you can compute all three
numbers in one go?

-W

2012/3/1 Emmanuel Bégué <medusis@xxxxxxxxx>
>
> One way is to compute the respective position in variables, and then
> look them up with keys, so that each position is only computed once.
>
> For example, for the global position, you can add to the root of the
> stylesheet:
>
> <xsl:key name="l" match="l" use="@id"/>
>
> <xsl:variable name="global">
>   <xsl:for-each select="//l">
>     <l pos="{position()}" id="{generate-id(.)}"/>
>   </xsl:for-each>
> </xsl:variable>
>
> and then, in each l element, look up the value of wwp:num-global like
> this:
>
> <xsl:attribute name="wwp:num-global" select="key('l', generate-id(.),
> $global)/@pos"/>
>
> Regards,
> EB
>
> 2012/2/29 Syd Bauman <Syd_Bauman@xxxxxxxxx>:
> > I am working with a relatively small dataset (~ 1 MiB) which uses a
> > TEI encoding. In TEI, a line of verse is encoded with an <l> element
> > (of which I have just about 306,000), which are grouped into groups
> > (like poems or stanzas) using <lg> (for "line group").
> >
> > In the output of the particular process I am working on now, I'd like
> > to adorn each <l> element with three new attributes that indicate the
> > count of the current <l> element in various contexts:
> >   wwp:num-global   = with respect to the entire document
> >   wwp:num-local    = with respect to the current stanza or other
> >                      small unit of poetry
> >   wwp:num-regional = with respect to the current poem or other
> >                      large unit of poetry
> >
> > So, as a toy example, see tiny.in.xml and tiny.out.xml, below.
> >
> > I have worked out code that gets me the desired counts. My problem is
> > that all the tree-walking it does slows down my process by well over
> > an order of magnitude. I am betting there is a much better way to do
> > this, probably using keys or <xsl:number>, but have not been able to
> > wrap my mind around it.
> >
> > The English-like pseudo-code for @num-local is "the count in the
> > context of the closest ancestor <lg> that itself has > 4 metrical
> > lines".
> >
> > The English-like pseudo-code for @num-regional is "the count in the
> > context of the closest ancestor <lg> that has a @type that contains
> > "poem" or whose first descendant <l> has n='1'".
> >
> > Here's what I have (note that we are only counting those <l> elements
> > that have an @part of 'I' or do not have a @part attribute at all):
> >
> > <xsl:attribute name="wwp:num-global">
> >   <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
> > </xsl:attribute>
> > <xsl:attribute name="wwp:num-regional">
> >   <xsl:variable name="region"
> >     select="(ancestor::lg[contains( @type,'poem') ]|ancestor::lg[
> >     descendant::l[ @n eq '1'] ])[last()]"/>
> >   <xsl:value-of
> >
> >     select="count((preceding::l[not(@part)]|preceding::l[@part='I'])[ancestor::lg /generate-id()
> >     = $region/generate-id() ] ) +1"/>
> > </xsl:attribute>
> > <xsl:attribute name="wwp:num-local">
> >   <xsl:variable name="region"
> >     select="ancestor::lg[count( descendant::l[not(@part) or @part='I'] )
> >     > 4 ][1]"/>
> >   <xsl:value-of
> >
> >     select="count((preceding::l[not(@part)]|preceding::l[@part='I'])[ancestor::lg /generate-id()
> >     = $region/generate-id() ] ) +1"/>
> > </xsl:attribute>
> >
> > Thoughts appreciated.
> >
> > Notes
> > -----
> > * Yes, I realize that the test above is for *any* descendant <l> with
> >   n='1', not the first. We simply don't have any that aren't the
> >   first, so I didn't worry about it.
> >
> > * It's pretty likely we'll change the definition of what is
> >   "regional" in the near future, but it probably won't affect the
> >   basic problem I'm having. I.e., I'm hoping that if someone shows me
> >   how to do this "regional" better, I'll be able to do any future
> >   version on my own. Cross your fingers :-)
> >
> >
> > toy input
> > --- -----
> > <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > <TEI xmlns="http://www.tei-c.org/ns/1.0"
> >   xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
> >   <teiHeader>
> >     <!-- blah, blah, blah -->
> >   </teiHeader>
> >   <text>
> >     <body>
> >       <lg type="superStructure">
> >         <lg type="poem.duck">
> >           <l>one</l>
> >           <l>two</l>
> >           <l>three</l>
> >           <l>four</l>
> >           <l>five</l>
> >           <l>six</l>
> >           <l>seven</l>
> >           <l>eight</l>
> >           <l>nine</l>
> >           <l>ten</l>
> >         </lg>
> >         <lg type="poem.duck">
> >           <l>one</l>
> >           <l>two</l>
> >           <l>three</l>
> >           <l>four</l>
> >           <lg type="tercet">
> >             <l>five</l>
> >             <l>six</l>
> >             <l>seven</l>
> >           </lg>
> >           <l>eight</l>
> >           <l>nine</l>
> >           <l>ten</l>
> >         </lg>
> >         <lg type="poem.duck">
> >           <lg type="stanza">
> >             <l>one</l>
> >             <l>two</l>
> >             <l>three</l>
> >             <l>four</l>
> >             <l>five</l>
> >             <l>six</l>
> >             <l>seven</l>
> >             <l>eight</l>
> >           </lg>
> >           <lg type="stanza">
> >             <l>nine</l>
> >             <l>ten</l>
> >             <l>eleven</l>
> >             <l>twelve</l>
> >             <l>thirteen</l>
> >             <l>fourteen</l>
> >             <l>fifteen</l>
> >             <l>sixteen</l>
> >           </lg>
> >           <lg type="stanza">
> >             <l>seventeen</l>
> >             <l>eighteen</l>
> >             <l>nineteen</l>
> >             <l>twenty</l>
> >             <l>twentyone</l>
> >             <l>twentytwo</l>
> >             <l>twentythree</l>
> >             <l>twentyfour</l>
> >           </lg>
> >         </lg>
> >       </lg>
> >     </body>
> >   </text>
> > </TEI>
> >
> > toy code
> > --- ----
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >   xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
> >   xmlns="http://www.tei-c.org/ns/1.0"
> >   xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
> >
> >   <xsl:template match="/">
> >     <xsl:text>&#x0A;</xsl:text>
> >     <xsl:apply-templates/>
> >   </xsl:template>
> >   <xsl:template match="@*|text()|processing-instruction()|comment()">
> >     <xsl:copy/>
> >   </xsl:template>
> >   <xsl:template match="*">
> >     <xsl:copy>
> >       <xsl:apply-templates select="@*|node()"/>
> >     </xsl:copy>
> >   </xsl:template>
> >
> >   <xsl:template match="l">
> >     <xsl:copy>
> >       <xsl:attribute name="wwp:num-global">
> >         <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
> >       </xsl:attribute>
> >       <xsl:attribute name="wwp:num-regional">
> >         <xsl:variable name="region"
> >           select="(ancestor::lg[ contains( @type,'poem') ]|ancestor::lg[
> >           descendant::l[ @n eq '1'] ])[last()]"/>
> >         <xsl:value-of
> >           select="count(
> >           (preceding::l[not(@part)]|preceding::l[@part='I'])[ancestor::lg/generate-id()
> >           = $region/generate-id() ] ) +1"
> >         />
> >       </xsl:attribute>
> >       <xsl:attribute name="wwp:num-local">
> >         <xsl:variable name="region"
> >           select="ancestor::lg[count( descendant::l[not(@part) or
> >           @part='I'] ) > 4 ][1]"/>
> >         <xsl:value-of
> >           select="count(
> >           (preceding::l[not(@part)]|preceding::l[@part='I'])[ancestor::lg/generate-id()
> >           = $region/generate-id() ] ) +1"
> >         />
> >       </xsl:attribute>
> >       <xsl:apply-templates select="@*|node()"/>
> >     </xsl:copy>
> >   </xsl:template>
> >
> > </xsl:stylesheet>
> >
> > toy output
> > --- ------
> > <?xml version="1.0" encoding="UTF-8"?>
> > <TEI xmlns="http://www.tei-c.org/ns/1.0"
> >   xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
> >   <teiHeader>
> >     <!-- blah, blah, blah -->
> >   </teiHeader>
> >   <text>
> >     <body>
> >       <lg type="superStructure">
> >         <lg type="poem.duck">
> >           <l wwp:num-global="1" wwp:num-regional="1"
> >             wwp:num-local="1">one</l>
> >           <l wwp:num-global="2" wwp:num-regional="2"
> >             wwp:num-local="2">two</l>
> >           <l wwp:num-global="3" wwp:num-regional="3"
> >             wwp:num-local="3">three</l>
> >           <l wwp:num-global="4" wwp:num-regional="4"
> >             wwp:num-local="4">four</l>
> >           <l wwp:num-global="5" wwp:num-regional="5"
> >             wwp:num-local="5">five</l>
> >           <l wwp:num-global="6" wwp:num-regional="6"
> >             wwp:num-local="6">six</l>
> >           <l wwp:num-global="7" wwp:num-regional="7"
> >             wwp:num-local="7">seven</l>
> >           <l wwp:num-global="8" wwp:num-regional="8"
> >             wwp:num-local="8">eight</l>
> >           <l wwp:num-global="9" wwp:num-regional="9"
> >             wwp:num-local="9">nine</l>
> >           <l wwp:num-global="10" wwp:num-regional="10"
> >             wwp:num-local="10">ten</l>
> >         </lg>
> >         <lg type="poem.duck">
> >           <l wwp:num-global="11" wwp:num-regional="1"
> >             wwp:num-local="1">one</l>
> >           <l wwp:num-global="12" wwp:num-regional="2"
> >             wwp:num-local="2">two</l>
> >           <l wwp:num-global="13" wwp:num-regional="3"
> >             wwp:num-local="3">three</l>
> >           <l wwp:num-global="14" wwp:num-regional="4"
> >             wwp:num-local="4">four</l>
> >           <lg type="tercet">
> >             <l wwp:num-global="15" wwp:num-regional="5"
> >               wwp:num-local="5">five</l>
> >             <l wwp:num-global="16" wwp:num-regional="6"
> >               wwp:num-local="6">six</l>
> >             <l wwp:num-global="17" wwp:num-regional="7"
> >               wwp:num-local="7">seven</l>
> >           </lg>
> >           <l wwp:num-global="18" wwp:num-regional="8"
> >             wwp:num-local="8">eight</l>
> >           <l wwp:num-global="19" wwp:num-regional="9"
> >             wwp:num-local="9">nine</l>
> >           <l wwp:num-global="20" wwp:num-regional="10"
> >             wwp:num-local="10">ten</l>
> >         </lg>
> >         <lg type="poem.duck">
> >           <lg type="stanza">
> >             <l wwp:num-global="21" wwp:num-regional="1"
> >               wwp:num-local="1">one</l>
> >             <l wwp:num-global="22" wwp:num-regional="2"
> >               wwp:num-local="2">two</l>
> >             <l wwp:num-global="23" wwp:num-regional="3"
> >               wwp:num-local="3">three</l>
> >             <l wwp:num-global="24" wwp:num-regional="4"
> >               wwp:num-local="4">four</l>
> >             <l wwp:num-global="25" wwp:num-regional="5"
> >               wwp:num-local="5">five</l>
> >             <l wwp:num-global="26" wwp:num-regional="6"
> >               wwp:num-local="6">six</l>
> >             <l wwp:num-global="27" wwp:num-regional="7"
> >               wwp:num-local="7">seven</l>
> >             <l wwp:num-global="28" wwp:num-regional="8"
> >               wwp:num-local="8">eight</l>
> >           </lg>
> >           <lg type="stanza">
> >             <l wwp:num-global="29" wwp:num-regional="9"
> >               wwp:num-local="1">nine</l>
> >             <l wwp:num-global="30" wwp:num-regional="10"
> >               wwp:num-local="2">ten</l>
> >             <l wwp:num-global="31" wwp:num-regional="11"
> >               wwp:num-local="3">eleven</l>
> >             <l wwp:num-global="32" wwp:num-regional="12"
> >               wwp:num-local="4">twelve</l>
> >             <l wwp:num-global="33" wwp:num-regional="13"
> >               wwp:num-local="5">thirteen</l>
> >             <l wwp:num-global="34" wwp:num-regional="14"
> >               wwp:num-local="6">fourteen</l>
> >             <l wwp:num-global="35" wwp:num-regional="15"
> >               wwp:num-local="7">fifteen</l>
> >             <l wwp:num-global="36" wwp:num-regional="16"
> >               wwp:num-local="8">sixteen</l>
> >           </lg>
> >           <lg type="stanza">
> >             <l wwp:num-global="37" wwp:num-regional="17"
> >               wwp:num-local="1">seventeen</l>
> >             <l wwp:num-global="38" wwp:num-regional="18"
> >               wwp:num-local="2">eighteen</l>
> >             <l wwp:num-global="39" wwp:num-regional="19"
> >               wwp:num-local="3">nineteen</l>
> >             <l wwp:num-global="40" wwp:num-regional="20"
> >               wwp:num-local="4">twenty</l>
> >             <l wwp:num-global="41" wwp:num-regional="21"
> >               wwp:num-local="5">twentyone</l>
> >             <l wwp:num-global="42" wwp:num-regional="22"
> >               wwp:num-local="6">twentytwo</l>
> >             <l wwp:num-global="43" wwp:num-regional="23"
> >               wwp:num-local="7">twentythree</l>
> >             <l wwp:num-global="44" wwp:num-regional="24"
> >               wwp:num-local="8">twentyfour</l>
> >           </lg>
> >         </lg>
> >       </lg>
> >     </body>
> >   </text>
> > </TEI>
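
For what it's worth, here is a rough, untested sketch of the "compute all three numbers in one go" idea from the top of this message, combined with the key lookup from the quoted reply. The tmp namespace, the key name "num" and the $nums variable are made up for illustration. It assumes XSLT 2.0, that only lines without @part or with @part='I' get numbered (other lines are copied unchanged), and that a line outside any qualifying <lg> ends up with an empty attribute value rather than the 1 the quoted code would produce:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xmlns:tmp="urn:example:tmp"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  exclude-result-prefixes="tmp">

  <!-- Index over the temporary tree in $nums; @id is "kind:generated-id". -->
  <xsl:key name="num" match="tmp:n" use="@id"/>

  <!-- All three counts are precomputed once, using position() inside
       nested for-each loops instead of a preceding:: walk per line. -->
  <xsl:variable name="nums">
    <!-- global: position among all counted lines in the document -->
    <xsl:for-each select="//l[not(@part) or @part = 'I']">
      <tmp:n id="global:{generate-id()}" pos="{position()}"/>
    </xsl:for-each>
    <!-- regional: position within each "poem"-like lg -->
    <xsl:for-each select="//lg[contains(@type, 'poem') or descendant::l[@n eq '1']]">
      <xsl:for-each select="descendant::l[not(@part) or @part = 'I']">
        <tmp:n id="regional:{generate-id()}" pos="{position()}"/>
      </xsl:for-each>
    </xsl:for-each>
    <!-- local: position within each lg holding more than 4 counted lines -->
    <xsl:for-each select="//lg[count(descendant::l[not(@part) or @part = 'I']) gt 4]">
      <xsl:for-each select="descendant::l[not(@part) or @part = 'I']">
        <tmp:n id="local:{generate-id()}" pos="{position()}"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:variable>

  <!-- Identity copy for everything that is not a counted line. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Counted lines: look the three numbers up instead of recounting. -->
  <xsl:template match="l[not(@part) or @part = 'I']">
    <xsl:variable name="id" select="generate-id()"/>
    <xsl:copy>
      <xsl:attribute name="wwp:num-global"
        select="key('num', concat('global:', $id), $nums)/@pos"/>
      <!-- [last()] picks the entry written for the closest ancestor lg -->
      <xsl:attribute name="wwp:num-regional"
        select="key('num', concat('regional:', $id), $nums)[last()]/@pos"/>
      <xsl:attribute name="wwp:num-local"
        select="key('num', concat('local:', $id), $nums)[last()]/@pos"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

A line nested inside several qualifying <lg> elements gets one entry per such ancestor. The outer for-each visits the <lg> elements in document order, so an inner <lg> is processed after its ancestors and its entries land later in $nums; that is why [last()] should return the count relative to the closest ancestor. Whether this is actually faster on the real data would need measuring.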