Re: [xsl] Building a Trie

Subject: Re: [xsl] Building a Trie
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 22 Jun 2023 07:44:41 -0000
Saxon (-PE and higher) has extensions that might help.

First, on xsl:key you can declare saxon:range-key="yes" (which for some reason
is not mentioned at
https://www.saxonica.com/documentation12/index.html#!extensions/attributes but
only at
https://www.saxonica.com/documentation12/index.html#!functions/saxon/key-map ;
the documentation also fails to mention that it doesn't work on SaxonCS.)

You can then view any part of the key as a sorted map using saxon:key-map(),
and within that sorted map, keys() delivers the keys in sorted order.

It's a bit convoluted, but to find words starting with "PQR" you could define
a key-map with a range from "PQR" to "PQR&#x10FFFF;"; and then use keys() on
that map. (Note that key-map doesn't build a new index, it just creates a view
of a subset of the existing index).

Having got the data structure, I'm sure I could find a better way of
exploiting it....

Meanwhile I'm going to add an issue on the qt4 list to add sorted maps.

Michael Kay
Saxonica

> On 22 Jun 2023, at 07:29, Michael Mueller-Hillebrand
michael.mueller-hillebrand@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
>
> Dear community,
>
> Now I learned about Tries https://en.wikipedia.org/wiki/Trie
<https://en.wikipedia.org/wiki/Trie> which is possibly a helpful method to
tackle the left-over problem with my DemoJam contribution at Markup UK (thanks
a lot for the applause!).
>
> Without spending too much time at the details, but at the heart of the
performance problem was this function:
>
> <xsl:function name="my:wordMatches" as="xs:string*">
>   <xsl:param name="start" as="xs:string"/>
>   <xsl:sequence select="$words[starts-with(., $start)]"/>
> </xsl:function>
>
> With $words containing 25,000 entries, the performance was sufficient, but
with 160,000 entries it is just bad.
>
> I already improved the performance by building a map, where the keys are the
first two letters of each word, and the value are all matching words.
>
> <xsl:function name="my:wordMatches" as="xs:string*">
>   <xsl:param name="start" as="xs:string"/>
>   <xsl:variable name="key" select="substring($start, 1, 2)"
as="xs:string"/>
>   <xsl:variable name="words" select="map:get($wordsMap, $key)"
as="xs:string*"/>
>   <xsl:sequence select="$words[starts-with(., $start)]"/>
> </xsl:function>
>
> Building that map takes some noticeable time, but then performance is great.
But only because I reduced the length of $words by factor 200b&300.
>
> Before In look deeper myself, has anyone in the community already created a
Trie-like structure (map of maps?) or can share suggestions how to tackle this
in XSLT/XPath? (Because those are the hammers I plan to use for this
challenge/nail.)
>
> Thanks,
> - Michael
>
> Michael MC<ller-Hillebrand
> Senior Consultant
> Phone +49 951-20859-752
> Mobil +49 172-819 34 13
> michael.mueller-hillebrand@xxxxxxxxx
<mailto:michael.mueller-hillebrand@xxxxxxxxx>
> www.docufy.de <https://www.docufy.de/> | DOCUFY@LinkedIN
<https://www.linkedin.com/company/3845358/>
> Datenschutz <https://www.docufy.de/datenschutz/>
> DOCUFY GmbH | KirschC$ckerstr. 27 | 96052 Bamberg | Deutschland
> CEO: Nadine Prill | Amtsgericht Bamberg HRB10571
>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509> (by
email <>)

Current Thread