Re: [xsl] Word Ladders as an example of a "Find shortest path between two nodes in a graph" problem

Subject: Re: [xsl] Word Ladders as an example of a "Find shortest path between two nodes in a graph" problem
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Fri, 7 Dec 2012 04:56:55 -0800
Hi Herman,

Yes, I know about this facts, but they are specific to the Word Ladders case.

I am looking at the general case for a graph, where we cannot rely on
such limitations.

Also, as Dr. Kay has made clear, Saxons sequences in many cases behave
as arrays, so even if there is some difference between the two
languages implementations, it isn't that much.

And Wolfgang Laun showed his timings, which are pretty close to those for C++.

All this taken into account and if one finds XSLT as the more
convenient languge compared to C++, and if a split millisecond doesn't
matter, then I don't see anything wrong in writing such
implementations in a higher language such as XSLT and not in C++.

Cheers,

Dimitre

On Fri, Dec 7, 2012 at 1:09 AM, Hermann Stamm-Wilbrandt
<STAMMW@xxxxxxxxxx> wrote:
> Hello Dimitre,
>
> yes, it depends on the constant factors.
>
> In computer science the alphabet is A={a,b,...,z}.
> The language of all words of length k (not real words) is of size
> V_max = 26^k
>
> But the maximum node degree(v) is k*25.
>
> V_max * k*25 is much less than (V_max ^ 2)
>
> For your samples of length 5 and 6 the avarage degree is below 6!
> (see other email reply to Wolfgang).
>
> $ xpath++ "count(/*/w[string-length()=5])" dictEnglishBig.xml
> 10228
> $ xpath++ "count(/*/w[string-length()=6])" dictEnglishBig.xml
> 17705
> $
>
>
> Mit besten Gruessen / Best wishes,
>
> Hermann Stamm-Wilbrandt
> Level 3 support for XML Compiler team and Fixpack team lead
> WebSphere DataPower SOA Appliances
> https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
> https://twitter.com/#!/HermannSW/
> ----------------------------------------------------------------------
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz
> Geschaeftsfuehrung: Dirk Wittkopp
> Sitz der Gesellschaft: Boeblingen
> Registergericht: Amtsgericht Stuttgart, HRB 243294
>
>
> |------------>
> | From:      |
> |------------>
>   >------------------------------------------------------------------------------------------------------------------------------------------|
>   |Dimitre Novatchev <dnovatchev@xxxxxxxxx>                                                                                                  |
>   >------------------------------------------------------------------------------------------------------------------------------------------|
> |------------>
> | To:        |
> |------------>
>   >------------------------------------------------------------------------------------------------------------------------------------------|
>   |xsl-list@xxxxxxxxxxxxxxxxxxxxxx,                                                                                                          |
>   >------------------------------------------------------------------------------------------------------------------------------------------|
> |------------>
> | Date:      |
> |------------>
>   >------------------------------------------------------------------------------------------------------------------------------------------|
>   |12/06/2012 09:57 PM                                                                                                                       |
>   >------------------------------------------------------------------------------------------------------------------------------------------|
> |------------>
> | Subject:   |
> |------------>
>   >------------------------------------------------------------------------------------------------------------------------------------------|
>   |Re: [xsl] Word Ladders as an example of a "Find shortest path between two nodes in a graph" problem                                       |
>   >------------------------------------------------------------------------------------------------------------------------------------------|
>
>
>
>
>
> On Thu, Dec 6, 2012 at 10:29 AM, Hermann Stamm-Wilbrandt
> <STAMMW@xxxxxxxxxx> wrote:
>> Hi Dimitre,
>>
>> you are right with O(|E|), but not with O(N^2).
>> Implementation of "5.c" was a bit sloppy, initEdges() is indeed O(N^2).
>>
>> But if you have N real language words of length k, each single word can
>> have at most  k*26 = O(1) neighbors.
>>
>> Testing whether a word "c_1 c_2 ... c_k" is among the "words" can be done
>> in
>> constant time, too, if your language allows for constant time array
> access
>> (just use an array of size 26^k = O(1)).
>
> Sure, if you would allocate 8GB for this array (for k = 7).
>
> Also, calculating the index in the array requires k steps (and
> probably multiplication and addition at each such step).
>
>>
>> Now with each node having constant number of neighbors O(|E|) = O(|V|) =
> O
>> (N).
>>
>
> I was speaking about graph traversal in general. One may argue that
> the big O notation isn't too useful when the value of N is limited and
> not too-big.
>
> For example 26*N in this particular case is  bigger than log2(N) * N
> (unless we have more than 3 million words). So, this "O(N)"
> implementation  may take more actual time than an "O(N*log(N)))
>
> To put it in other words, the constants do matter when N is limited.
>
>
> Cheers,
>
> Dimitre
>
>
>>
>> Mit besten Gruessen / Best wishes,
>>
>> Hermann Stamm-Wilbrandt
>> Level 3 support for XML Compiler team and Fixpack team lead
>> WebSphere DataPower SOA Appliances
>> https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
>> https://twitter.com/#!/HermannSW/
>> ----------------------------------------------------------------------
>> IBM Deutschland Research & Development GmbH
>> Vorsitzende des Aufsichtsrats: Martina Koederitz
>> Geschaeftsfuehrung: Dirk Wittkopp
>> Sitz der Gesellschaft: Boeblingen
>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>>
>>
>> |------------>
>> | From:      |
>> |------------>
>>
>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>   |Dimitre Novatchev <dnovatchev@xxxxxxxxx>
> |
>>
>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>> |------------>
>> | To:        |
>> |------------>
>>
>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>   |xsl-list@xxxxxxxxxxxxxxxxxxxxxx,
> |
>>
>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>> |------------>
>> | Date:      |
>> |------------>
>>
>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>   |12/06/2012 07:11 PM
> |
>>
>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>> |------------>
>> | Subject:   |
>> |------------>
>>
>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>   |Re: [xsl] Word Ladders as an example of a "Find shortest path between
> two nodes in a graph" problem                                       |
>>
>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>
>>
>>
>>
>> Hi Herman,
>>
>> I think that in a near-worst case the time-complexity is O(N^2),
>> because every arc needs to be inspected and in a well-connected graph
>> with N vertices there can be N*(N-1) vertices.
>>
>> This is also confirmed by Wikipedia and the 2-3 books on algoritms I
> have:
>>
>> "The time complexity can be expressed as O(|E|) since every vertex and
>> every edge will be explored in the worst case. Note: O(|E|) may vary
>> between O(|V|) and O(|V|^2), depending on how sparse the input graph
>> is (assuming that the graph is connected)".
>>
>>
>> To summarize, this is a property of the graph and not of the
>> programming language that is used.
>>
>>
>> Cheers,
>> Dimitre
>>
>>
>>
>> On Thu, Dec 6, 2012 at 10:01 AM, Hermann Stamm-Wilbrandt
>> <STAMMW@xxxxxxxxxx> wrote:
>>> Hi Dimitre,
>>>
>>>> That I have initially used a not probably the most efficient algorithm
>>>> / implementation, shouldn't be used to make general conclusions about
>>>> the appropriateness of using XSLT in solving a particular class of
>>>> problems.
>>>>
>>> I agree with you -- and your solution is nice.
>>>
>>> But breadth-first-search algorithm can be implemented as linear time
>>> algorithm in C or C++ -- I doubt that you can do linear time
>>> implementation in XSLT since constant time array access is missing ...
>>>
>>>
>>> Mit besten Gruessen / Best wishes,
>>>
>>> Hermann Stamm-Wilbrandt
>>> Level 3 support for XML Compiler team and Fixpack team lead
>>> WebSphere DataPower SOA Appliances
>>> https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
>>> https://twitter.com/#!/HermannSW/
>>> ----------------------------------------------------------------------
>>> IBM Deutschland Research & Development GmbH
>>> Vorsitzende des Aufsichtsrats: Martina Koederitz
>>> Geschaeftsfuehrung: Dirk Wittkopp
>>> Sitz der Gesellschaft: Boeblingen
>>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>>>
>>>
>>> |------------>
>>> | From:      |
>>> |------------>
>>>
>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>   |Dimitre Novatchev <dnovatchev@xxxxxxxxx>
>> |
>>>
>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>> |------------>
>>> | To:        |
>>> |------------>
>>>
>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>   |xsl-list@xxxxxxxxxxxxxxxxxxxxxx,
>> |
>>>
>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>> |------------>
>>> | Date:      |
>>> |------------>
>>>
>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>   |12/06/2012 05:50 PM
>> |
>>>
>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>> |------------>
>>> | Subject:   |
>>> |------------>
>>>
>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>   |Re: [xsl] Word Ladders as an example of a "Find shortest path between
>> two nodes in a graph" problem                                       |
>>>
>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>
>>>
>>>
>>>
>>> Herman,
>>>
>>> That I have initially used a not probably the most efficient algorithm
>>> / implementation, shouldn't be used to make general conclusions about
>>> the appropriateness of using XSLT in solving a particular class of
>>> problems.
>>>
>>> Cheers,
>>>
>>> Dimitre
>>>
>>> On Thu, Dec 6, 2012 at 8:15 AM, Hermann Stamm-Wilbrandt
>>> <STAMMW@xxxxxxxxxx> wrote:
>>>>> ... I think that this
>>>>> isn't something that should be solved in XSLT at all, except as an
>>>>> academic exercise. ...
>>>>>
>>>> Agreed, nice XSLT solution, but not fast.
>>>>
>>>> This simple C program does find the longest path (35) to angry in a
>>>> second (on a W520 Thinkpad) based on Dimitie's word list of length 5:
>>>> http://www.stamm-wilbrandt.de/en/xsl-list/5.c
>>>>
>>>> $ time ./5 angry
>>>> yasht
>>>> yacht
>>>> pacht
>>>> pecht
>>>> wecht
>>>> wicht
>>>> wight
>>>> dight
>>>> digit
>>>> dimit
>>>> demit
>>>> remit
>>>> refit
>>>> befit
>>>> besit
>>>> beset
>>>> besee
>>>> belee
>>>> belve
>>>> beeve
>>>> breve
>>>> brave
>>>> brace
>>>> braca
>>>> araca
>>>> arara
>>>> amara
>>>> amala
>>>> alala
>>>> alula
>>>> aluta
>>>> abuta
>>>> abura
>>>> anura
>>>> anury
>>>> angry
>>>> 35
>>>>
>>>> real    0m1.046s
>>>> user    0m1.039s
>>>> sys     0m0.004s
>>>> $
>>>>
>>>>
>>>> Mit besten Gruessen / Best wishes,
>>>>
>>>> Hermann Stamm-Wilbrandt
>>>> Level 3 support for XML Compiler team and Fixpack team lead
>>>> WebSphere DataPower SOA Appliances
>>>> https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
>>>> https://twitter.com/#!/HermannSW/
>>>> ----------------------------------------------------------------------
>>>> IBM Deutschland Research & Development GmbH
>>>> Vorsitzende des Aufsichtsrats: Martina Koederitz
>>>> Geschaeftsfuehrung: Dirk Wittkopp
>>>> Sitz der Gesellschaft: Boeblingen
>>>> Registergericht: Amtsgericht Stuttgart, HRB 243294
>>>>
>>>>
>>>> |------------>
>>>> | From:      |
>>>> |------------>
>>>>
>>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>>   |Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
>>> |
>>>>
>>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>> |------------>
>>>> | To:        |
>>>> |------------>
>>>>
>>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>>   |xsl-list@xxxxxxxxxxxxxxxxxxxxxx,
>>> |
>>>>
>>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>> |------------>
>>>> | Date:      |
>>>> |------------>
>>>>
>>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>>   |11/28/2012 07:15 PM
>>> |
>>>>
>>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>> |------------>
>>>> | Subject:   |
>>>> |------------>
>>>>
>>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>>   |Re: [xsl] Word Ladders as an example of a "Find shortest path
> between
>>> two nodes in a graph" problem                                       |
>>>>
>>>>------------------------------------------------------------------------------------------------------------------------------------------|
>
>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> The fact that one XSLT program runs three times faster on one XSLT
>>>> implementation
>>>> than on another one is strange, *very* strange. But is Saxon 6.4 the
>>>> "dernier cri"?
>>>> I'd very much like to hear Michael Kay's opinion on this.
>>>>
>>>> With Saxon HE 9.2.0 running with the -t option, I compare execution
>>> times:
>>>>    1209 ms with 40065592 bytes for WL's solution
>>>> to
>>>>    2768 ms with 81184768 bytes for DN's solution.
>>>>
>>>> Note: DN's solution being the one *without* the optimizations!
>>>>
>>>> Not that this is conclusive. Algorithms like this one must be judged
>>>> by more than a single run:
>>>> they may behave well for small word lengths and small ladder sizes,
>>>> and scale badly, or
>>>> the other way round. (Dimitre and I aren't even using the same word
>>>> data, AFAIK.)
>>>>
>>>> As an aside, I'd like to say that neither DN's nor WL's solution is
>>>> something that should
>>>> be used if this problem (i.e., shortest path) should ever need a
>>>> solution. I think that this
>>>> isn't something that should be solved in XSLT at all, except as an
>>>> academic exercise.
>>>> (Feel free to disagree - I'll not reply to anything contradicting me.)
>>>>
>>>> Cheers
>>>>
>>>>>
>>>>> Ok, I was running it with Saxon 6.4
>>>>>
>>>>> Now, the times are:
>>>>>
>>>>> With Saxon:
>>>>>
>>>>> Wolfgang's transformation: 25sec.
>>>>>
>>>>> Dimitre's :                            39sec.
>>>>>
>>>>>
>>>>> However, with XQSharp:
>>>>>
>>>>> Wolfgang's transformation: 23sec.
>>>>>
>>>>> Dimitre's :                            14sec.
>>>>>
>>>>>
>>>>> Therefore, one can't say wich transformation is faster -- it depends
>>>>> on the XSLT processor being used.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Dimitre
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 28, 2012 at 7:27 AM, Dimitre Novatchev
>>> <dnovatchev@xxxxxxxxx>
>>>>> wrote:
>>>>> > I get this error, trying to run your code:
>>>>> >
>>>>> > SAXON 6.5.4 from Michael Kay
>>>>> > Java version 1.6.0_31
>>>>> > Loading my:my
>>>>> > Preparation time: 250 milliseconds
>>>>> > Processing file:/C:\XSLT Projects\WordLadders\Ver 0.2\dictGraph4.xml
>>>>> > Building tree for file:/C:\XSLT Projects\WordLadders\Ver
>>>>> > 0.2\dictGraph4.xml using class com.icl.saxon.tinytree.TinyBuilder
>>>>> > Tree built in 351 milliseconds
>>>>> > Error at xsl:variable on line 23 of file:/(Untitled):
>>>>> >   Error in expression key('kFindWord', $pStartWord, $vDictGraph)
>>>>> >                     [count(../*)  lt  count(key('kFindWord',
>>>>> > $pTargetWord, $vDictGraph)/../* )]                        |
>>>>> >             key('kFindWord', $pTargetWord, $vDictGraph)
>>>>> >            [count(../*) le count(key('kFindWord',  $pStartWord,
>>>>> > $vDictGraph)/../*)]: expected "]", found "<name>"
>>>>> >
>>>>> >
>>>>> > Cheers,
>>>>> > Dimitre
>>>>> >
>>>>> > On Wed, Nov 28, 2012 at 5:40 AM, Wolfgang Laun
>>>> <wolfgang.laun@xxxxxxxxx>
>>>>> > wrote:
>>>>> >> <xsl:stylesheet version="2.0"
>>>>> >>    xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
>>>>> >>    xmlns:my="my:my"
>>>>> >>    xmlns:xs="http://www.w3.org/2001/XMLSchema";
>>>>> >>    exclude-result-prefixes="my xs">
>>>>> >>
>>>>> >>   <xsl:output method="text"/>
>>>>> >>
>>>>> >>   <xsl:variable name="vDictGraph" select="/"/>
>>>>> >>   <xsl:key name="kFindWord" match="w" use="."/>
>>>>> >>
>>>>> >>   <xsl:param name="pStartWord"  select="'nice'" as="xs:string"/>
>>>>> >>   <xsl:param name="pTargetWord" select="'evil'" as="xs:string"/>
>>>>> >>
>>>>> >>   <xsl:variable name="vStartWord" as="xs:string"
>>>>> >>                 select="key('kFindWord', $pStartWord, $vDictGraph)
>>>>> >>                          [count(../*)  lt  count(key('kFindWord',
>>>>> >> $pTargetWord, $vDictGraph)/../* )]
>>>>> >>                       |
>>>>> >>                       key('kFindWord', $pTargetWord, $vDictGraph)
>>>>> >>                          [count(../*) le count(key('kFindWord',
>>>>> >> $pStartWord, $vDictGraph)/../*)]"/>
>>>>> >>
>>>>> >>   <xsl:variable name="vTargetWord" as="xs:string"
>>>>> >>                 select="($pStartWord, $pTargetWord)[not(. eq
>>>>> >> $vStartWord)]"/>
>>>>> >>
>>>>> >>   <!-- This function iterates over the temporary tree
>>>>> >>
>>>>
>>>
>>
> <result><arc level=".." from=".." to=".."/>...</result>
>>>>> >>     to find
>>>
>> the
>>>> ladder. It starts at a node matching @to with
>>>>> >>
> $vTargetWord
>>>>>
>>>>
>>> and
>>>> proceeds with decreasing @level. -->
>>>>> >>   <xsl:function name="my:find-path" as="xs:string*">
>>>>> >>     <xsl:param name="root"   as="node()"/>
>>>>> >>     <xsl:param name="level"  as="xs:integer"/>
>>>>> >>     <xsl:param name="start"  as="xs:string"/>
>>>>> >>     <xsl:param name="target" as="xs:string"/>
>>>>> >>     <xsl:param name="path"   as="xs:string"/>
>>>>> >>
>>>>> >>     <xsl:for-each select="$root/result/arc[@level = $level and @to
> =
>>>>> >> $target]">
>>>>> >>       <xsl:variable name="from" select="./@from"/>
>>>>> >>       <xsl:choose>
>>>>> >>         <xsl:when test="$start eq $from">
>>>>> >>           <xsl:value-of select="concat($from,'+',$path)"/>
>>>>> >>         </xsl:when>
>>>>> >>         <xsl:otherwise>
>>>>> >>           <xsl:value-of select="my:find-path($root,$level
>>>>> >> -1,$start,$from,concat($from,'+',$path))"/>
>>>>> >>         </xsl:otherwise>
>>>>> >>       </xsl:choose>
>>>>> >>     </xsl:for-each>
>>>>> >>   </xsl:function>
>>>>> >>
>>>>> >>   <xsl:template match="/">
>>>>> >>     <xsl:variable name='arcs'>
>>>>> >>       <result>
>>>>> >>       <xsl:call-template name="look-at-starts">
>>>>> >>         <xsl:with-param name="level"  select="1"/>
>>>>> >>         <xsl:with-param name="starts" select="$vStartWord"/>
>>>>> >>         <xsl:with-param name="target" select="$vTargetWord"/>
>>>>> >>         <xsl:with-param name="toskip" select="()"/>
>>>>> >>       </xsl:call-template>
>>>>> >>       </result>
>>>>> >>     </xsl:variable>
>>>>> >>
>>>>> >>     <xsl:variable name="finalArcs" select="$arcs/result/arc[@to =
>>>>> >> $vTargetWord]"/>
>>>>> >>     <xsl:value-of select="my:find-path($arcs, $finalArcs[1]/@level,
>>>>> >> $vStartWord, $vTargetWord, $vTargetWord)"/>
>>>>> >>   </xsl:template>
>>>>> >>
>>>>> >>   <!-- Look at $starters nodes obtained from the current set of
>>
> words
>>>>>
>>>>>>      ending all incomplete ladders. Generate result/arc for
>>
> each hop
>>> to
>>>>>
>>>>>>      the next step. Recurse if none of the arc
>>
> destinations is the
>>>>> >>
>>>> overall
>>>>> >>      target word, otherwise
>>
> return the last hop. -->
>>>>> >>   <xsl:template name="look-at-starts">
>>>>> >>     <xsl:param name="level"  as="xs:integer"/>
>>>>> >>     <xsl:param name="starts" as="xs:string*"/>
>>>>> >>     <xsl:param name="target" as="xs:string"/>
>>>>> >>     <xsl:param name="toskip" as="node()*"/>
>>>>> >>
>>>>> >>     <xsl:variable name="starters" as="node()*"
>>>>> >>                   select="key('kFindWord', $starts, $vDictGraph)/..
>>>>> >> except $toskip"/>
>>>>> >>
>>>>> >>     <xsl:for-each select="$starters">
>>>>> >>       <xsl:variable name="w" select="./w"/>
>>>>> >>       <xsl:for-each select="./nb">
>>>>> >>         <arc level="{$level}" from="{$w}" to="{.}"/>
>>>>> >>       </xsl:for-each>
>>>>> >>     </xsl:for-each>
>>>>> >>
>>>>> >>     <xsl:variable name="nbs" select="$starters/nb"/>
>>>>> >>
>>>>> >>     <xsl:choose>
>>>>> >>       <xsl:when test="$target = $nbs">
>>>>> >>         <!--xsl:message select="'found a ladder'"/-->
>>>>> >>       </xsl:when>
>>>>> >>       <xsl:otherwise>
>>>>> >>         <xsl:call-template name="look-at-starts">
>>>>> >>           <xsl:with-param name="level"  select="$level + 1"/>
>>>>> >>           <xsl:with-param name="starts"
>>>>> >> select="distinct-values($nbs)"/>
>>>>> >>           <xsl:with-param name="target" select="$target"/>
>>>>> >>           <xsl:with-param name="toskip" select="$toskip union
>>>>> >> $starters"/>
>>>>> >>         </xsl:call-template>
>>>>> >>       </xsl:otherwise>
>>>>> >>     </xsl:choose>
>>>>> >>   </xsl:template>
>>>>> >> </xsl:stylesheet>

Current Thread