Re: [xsl] RDF graph to SVG force-directed layout

Hi Michael,

I wanted to pick up this thread as I started working on this
stylesheet again. I've implemented some of the optimizations suggested
in this thread.
https://github.com/AtomGraph/Web-Client/blob/master/src/main/webapp/static/co
m/atomgraph/client/xsl/converters/RDFXML2SVG.xsl

I want to apply the (node, x, y, list-of-adjacent-nodes) tuple
strategy you suggested. But what would be an efficient data structure
for it in practice? Something like the following?

<xsl:for-each select="svg:g">
    <xsl:variable name="tuple-set" as="map(xs:string, item())">
        <xsl:map>
            <xsl:map-entry key="'node'" select="."/>
            <xsl:map-entry key="'x'" select="svg:circle/@cx"/>
            <xsl:map-entry key="'y'" select="svg:circle/@cy"/>
            <xsl:map-entry key="'adjacent-nodes'"
select="/svg:svg/svg:g[svg:g[@class = 'property']/@resource =
current()/@about]/svg:g[@class = ('subject', 'object')]"/>
            <xsl:map-entry key="'non-adjacent-nodes'" select="..."/>
        </xsl:map>
    </xsl:variable>
</xsl:for-each>


Another thing I wasn't sure how to fix was your remark:

<xsl:param name="force-nodes" select="key('nodes', ('circle',
'rect'))" as="element()*"/>

where the key is

<xsl:key name="nodes" match="svg:g[svg:circle] | svg:g[svg:rect]"
use="concat(svg:circle/local-name(), svg:rect/local-name())"/>

This is going to create a key with a very small number of entries,
each indexing a large number of nodes, and that doesn't feel like an
efficient thing to do.

On Thu, Oct 15, 2020 at 11:51 AM Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> OK, I'm starting to see where the adjacency logic is, around lines 335-350.
>
> It's computing some kind of "attraction force" for every pair of nodes in
the graph, where the force is negative if the two nodes aren't cconnected.
>
> The connectivity of nodes is unchanged for each iteration, but is being
recomputed on each iteration. If the data that you process on each iteration
is a set of tuples comprising
>
> (node, x, y, list-of-adjacent-nodes)
>
> then you can precompute the list-of-adjacent-nodes (and possibly even
list-of-non-adjacent-nodes) at the start, and only modify x and y each time
round the loop.
>
> It would be nice to find a way of reducing the number of distant
non-adjacent nodes you have to consider as the algorithm converges. Perhaps
there is some threshold where if a non-adjacent node is more than a certain
distance away, you can drop it from the list-of-non-adjacent-nodes that you
have to consider next time around, so that over time, you're only looking at
your position relative to nodes that are nearby?
>
> Michael Kay
> Saxonica
>
> On 15 Oct 2020, at 10:17, Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Having said all this, however, making the inner loop of your algorithm 5
times faster is only going to get you from 5 minutes to 1 minute, and to do
better than that (as Liam points out) you need to improve the algorithm.
>
> Doing a fixed number of iterations (50) seems rather crude; is there some
way you could detect convergence and stop when the results are good enough?
>
> Might there be some way of doing initial placement that's better than just
random positioning? For example, perhaps it's the case that in many datasets,
the order of the input nodes reflects some kind of proximity metric, and
therefore computing an initial position based on position in the input
sequence would give faster convergence.
>
> I haven't really worked out what the criteria are for computing node
proximity. I can't see any logic that decides which nodes you want to be
near.
>
> Michael Kay
> Saxonica
>
> On 15 Oct 2020, at 09:42, Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> This deserves closer study than I have time to give it.
>
> From a quick look, my attention is drawn to the template rule on line 312.
>
> Firstly, the variable $net is a sequence of elements that are essentially
(x, y) pairs, and my instinct is that using sequence of maps would cut a lot
of node-creation cost (and a bit of number-to-string and string-to-number
conversion). In fact, since you are computing the (x, y) pairs only in order
to then compute the sum of the x's and the sum of the y's, it seems
unneccessary to capture these values in a list in the first place; why not
compute the two totals as you go by using xsl:iterate in place of xsl:for-each
at line 325?
>
> Line 313 is
>
> <xsl:param name="force-nodes" select="key('nodes', ('circle', 'rect'))"
as="element()*"/>
>
> where the key is
>
> <xsl:key name="nodes" match="svg:g[svg:circle] | svg:g[svg:rect]"
use="concat(svg:circle/local-name(), svg:rect/local-name())"/>
>
> This is going to create a key with a very small number of entries, each
indexing a large number of nodes, and that doesn't feel like an efficient
thing to do.
>
> At lines 339 and 343, should the following-sibling calls be limited (using
[1]) to the immediatelty following sibling?
>
> At line 343, should the generate-id() comparison be replaced by "is"? (The
XJ compiler does this optimisation for you, XX doesn't).
>
> Finally, line 360-366 is a classic case of a subtree copy that's making a
very small change to an existing tree; I've written several papers that
attempt to address the problem that this is very inefficient because it
involves copying the whole tree.
>
> You might consider, instead of making incremental modifications to the
attributes of the svg:g elements, maintaining for each of these elements a map
containing the original svg:elements and the latest computed values of the
attributes that are being modified. Changing a single property in a map is
much more efficient that changing a single attribute of an element.
>
> This would also address another issue: on each iteration where a node is
repositioned, you're generating an SVG @transform attribute to reflect the new
position, and then on the next iteration, you are parsing this attribute to
compute the current position. Much better, surely, to maintain the actual
coordinates.
>
> Michael Kay
> Saxonica
>
>
> On 14 Oct 2020, at 22:00, Martynas JuseviD
ius martynas@xxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> could anyone suggest any optimizations to this stylesheet that
> transforms a graph encoded as RDF/XML to an SVG directed graph layout:
>
https://github.com/AtomGraph/Web-Client/blob/develop/src/main/webapp/static/c
om/atomgraph/client/xsl/converters/RDFXML2SVG.xsl
>
> Output example: https://twitter.com/namedgraph/status/1316476355874304001
>
> The problem is that it's quite slow: <100 nodes and 5 steps take a few
> minutes running on Saxon-JS 2 in Firefox or Chrome.
>
> It's based on a paper on force directed layout in XSLT: "GraphML
> Transformation":
>
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.3680&rep=rep1&ty
pe=pdf#page=58
>
> The algorithm:
> 1. position resource nodes (optionally also literals) randomly.
> (TO-DO: position on an ellipse?)
> 2. move nodes in a loop using the force-directed algorithm
> 3. draw lines between the nodes, calculating the correct intersection
> with the node border
>
> Note: only "flat" RDF/XML (properties grouped into descriptions; no
> nesting) is supported. It's called RDFXML_PLAIN in Jena.
>
> If anyone would like a sample file, I can easily provide :)
>
>
> Martynas
> atomgraph.com
>
>
>
> XSL-List info and archive
> EasyUnsubscribe (by email)
>
>
> XSL-List info and archive
> EasyUnsubscribe (by email)
<- Previous	Index	Next ->
Re: [xsl] Find/replace algorithm, Dimitre Novatchev dn	Thread	Re: [xsl] RDF graph to SVG force-di, Martynas Jusevičius
Re: [xsl] Find/replace algorithm, Albright, Jim jim_al	Date	Re: [xsl] Find/replace algorithm, Joel Kalvesmaki dire
	Month
<-prev [Thread] next->	<-prev [Date] next->
Month Index \| List Home