Re: [xsl] Building a Trie

Subject: Re: [xsl] Building a Trie
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 22 Jun 2023 08:12:33 -0000
Incidentally, in SaxonJ the default data structure for a map is in fact a
HashTrie, implemented using Michael Froh's ImmutableMap library. But a
HashTrie gives you sorted access by hash code, not by the actual key value. An
xsl:key index defined using saxon:range-key="yes" is implemented as a Java
TreeMap, which gives you sorted access using the key value. A TreeMap is
apparently implemented as a red-black tree, not as a Trie, but we don't really
need to know that.

Michael Kay
Saxonica

> On 22 Jun 2023, at 08:44, Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Saxon (-PE and higher) has extensions that might help.
>
> First, on xsl:key you can declare saxon:range-key="yes" (which for some
reason is not mentioned at
https://www.saxonica.com/documentation12/index.html#!extensions/attributes
<https://www.saxonica.com/documentation12/index.html#!extensions/attributes>
but only at
https://www.saxonica.com/documentation12/index.html#!functions/saxon/key-map
<https://www.saxonica.com/documentation12/index.html#!functions/saxon/key-map>
; the documentation also fails to mention that it doesn't work on SaxonCS.)
>
> You can then view any part of the key as a sorted map using saxon:key-map(),
and within that sorted map, keys() delivers the keys in sorted order.
>
> It's a bit convoluted, but to find words starting with "PQR" you could
define a key-map with a range from "PQR" to "PQR&#x10FFFF;"; and then use
keys() on that map. (Note that key-map doesn't build a new index, it just
creates a view of a subset of the existing index).
>
> Having got the data structure, I'm sure I could find a better way of
exploiting it....
>
> Meanwhile I'm going to add an issue on the qt4 list to add sorted maps.
>
> Michael Kay
> Saxonica
>
>> On 22 Jun 2023, at 07:29, Michael Mueller-Hillebrand
michael.mueller-hillebrand@xxxxxxxxx
<mailto:michael.mueller-hillebrand@xxxxxxxxx>
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx
<mailto:xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>> wrote:
>>
>> Dear community,
>>
>> Now I learned about Tries https://en.wikipedia.org/wiki/Trie
<https://en.wikipedia.org/wiki/Trie> which is possibly a helpful method to
tackle the left-over problem with my DemoJam contribution at Markup UK (thanks
a lot for the applause!).
>>
>> Without spending too much time at the details, but at the heart of the
performance problem was this function:
>>
>> <xsl:function name="my:wordMatches" as="xs:string*">
>>   <xsl:param name="start" as="xs:string"/>
>>   <xsl:sequence select="$words[starts-with(., $start)]"/>
>> </xsl:function>
>>
>> With $words containing 25,000 entries, the performance was sufficient, but
with 160,000 entries it is just bad.
>>
>> I already improved the performance by building a map, where the keys are
the first two letters of each word, and the value are all matching words.
>>
>> <xsl:function name="my:wordMatches" as="xs:string*">
>>   <xsl:param name="start" as="xs:string"/>
>>   <xsl:variable name="key" select="substring($start, 1, 2)"
as="xs:string"/>
>>   <xsl:variable name="words" select="map:get($wordsMap, $key)"
as="xs:string*"/>
>>   <xsl:sequence select="$words[starts-with(., $start)]"/>
>> </xsl:function>
>>
>> Building that map takes some noticeable time, but then performance is
great. But only because I reduced the length of $words by factor 200b&300.
>>
>> Before In look deeper myself, has anyone in the community already created a
Trie-like structure (map of maps?) or can share suggestions how to tackle this
in XSLT/XPath? (Because those are the hammers I plan to use for this
challenge/nail.)
>>
>> Thanks,
>> - Michael
>>
>> Michael MC<ller-Hillebrand
>> Senior Consultant
>> Phone +49 951-20859-752
>> Mobil +49 172-819 34 13
>> michael.mueller-hillebrand@xxxxxxxxx
<mailto:michael.mueller-hillebrand@xxxxxxxxx>
>> www.docufy.de <https://www.docufy.de/> | DOCUFY@LinkedIN
<https://www.linkedin.com/company/3845358/>
>> Datenschutz <https://www.docufy.de/datenschutz/>
>> DOCUFY GmbH | KirschC$ckerstr. 27 | 96052 Bamberg | Deutschland
>> CEO: Nadine Prill | Amtsgericht Bamberg HRB10571
>>
>> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
>> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509> (by
email <applewebdata://7ECE1E74-C159-42BA-9130-783318C4DB87>)
>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/3500899> (by
email <>)

Current Thread