Subject: [xsl] Performance of link target search, and Normalising or collapsing a pathname value, best method?
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 22 Sep 2023 12:09:29 -0000
Hi

I am working with sets of XML document files that contain "include"
elements. Each include element is replaced by the content of the file
named in include/@srcfile, and inclusions may be nested many levels deep.

These documents contain link elements that point to elements in other
documents. For reasons to do with FrameMaker, all the links have to be
verified: the FrameMaker cross-references often point to the wrong target
file.

For example, a.xml might contain a link (cross-reference) that points to
d.xml#an_id, but FrameMaker will have saved it as <link srcfile="b.xml#an_id">
because b.xml includes c.xml, which includes d.xml. I need to correct the
srcfile so that it refers to the file that actually contains the target.
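
Reduced to the elements that matter, the files in that example might look
roughly like this. Root elements are omitted, and <para> is a hypothetical
stand-in for whatever element actually carries the id:

```xml
<!-- a.xml: the cross-reference as FrameMaker saved it -->
<link srcfile="b.xml#an_id"/>

<!-- b.xml -->
<include srcfile="c.xml"/>

<!-- c.xml -->
<include srcfile="d.xml"/>

<!-- d.xml: where the target element really is (<para> is hypothetical) -->
<para id="an_id"/>

<!-- a.xml after correction -->
<link srcfile="d.xml#an_id"/>
```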

I have a template which does this, and at the moment it looks like this:

<!-- local key for cross references -->
<xsl:key name="linkidkey" match="*[@id]" use="@id" />

<xsl:template name="verify-link">
  <xsl:param name="linkurl" />

  <xsl:variable name="match-file">
    <xsl:choose>
      <xsl:when test="contains($linkurl,'#')">
        <xsl:value-of select="substring-before($linkurl,'#')" />
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$linkurl" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <!-- NB1 -->
  <xsl:variable name="match-prefix">
    <xsl:value-of select="string-join(tokenize($match-file, '/')[position() != last()], '/')" />
  </xsl:variable>

  <xsl:variable name="match-id" select="substring-after($linkurl,'#')" />

  <xsl:choose>
    <!-- FILE -->
    <xsl:when test="$match-id = ''">
      <xsl:value-of select="$linkurl" />
    </xsl:when>

    <!-- ID -->
    <xsl:when test="$match-file = ''">
      <xsl:value-of select="$linkurl" />
    </xsl:when>

    <!-- FILE#ID -->
    <!-- need to verify that ID is in FILE, and correct if it isn't -->
    <xsl:otherwise>
      <!-- true if FILE contains an element with this id; testing for the
           element itself (not its string value) means an empty target
           element such as <anchor id="x"/> is still found -->
      <xsl:variable name="this"
          select="exists(document($match-file,/)/key('linkidkey',$match-id))" />

      <xsl:choose>
        <xsl:when test="$this">
          <xsl:value-of select="concat($match-file,'#',$match-id)" />
        </xsl:when>

        <xsl:otherwise>
          <xsl:for-each select="document($match-file,/)">
            <!-- NB2 -->
            <xsl:for-each select="//include">
              <xsl:call-template name="verify-link">
                <!-- NB3 -->
                <xsl:with-param name="linkurl"
                    select="concat($match-prefix,'/',@srcfile,'#',$match-id)" />
              </xsl:call-template>
            </xsl:for-each>
          </xsl:for-each>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

It works, which I suppose is the most important thing. But I have two
questions.

Firstly, at the points marked NB1 and NB3, the path of each "included"
source file to be visited has to keep whatever prefix was carried through
from the "including" source file; that is what the "match-prefix" variable
is for.

When all the included files are in the same subdirectory as the parent that
includes them, this is not a problem (and that is the most common scenario).

But if any file is not, this code produces verified/adjusted links whose
srcfile may be something like "../../A/../B/../C/ccc.xml#xyz".
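
A stripped-down trace of the NB1/NB3 string handling shows how a path like
that builds up. This is a sketch only: f:next-link, the f: namespace URI and
the A/B/C file layout are hypothetical, chosen to reproduce the example path,
and the function assumes the link contains a '#' (as it always does when NB3
is reached). It assumes XSLT 2.0, as the template above already does.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:trace"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper repeating the NB1 + NB3 string handling: keep the
       directory part of the current link, then append the include's @srcfile.
       Assumes $linkurl contains '#'. -->
  <xsl:function name="f:next-link" as="xs:string">
    <xsl:param name="linkurl" as="xs:string"/>
    <xsl:param name="srcfile" as="xs:string"/>
    <xsl:variable name="match-file" select="substring-before($linkurl, '#')"/>
    <xsl:variable name="match-prefix"
        select="string-join(tokenize($match-file, '/')[position() != last()], '/')"/>
    <xsl:sequence
        select="concat($match-prefix, '/', $srcfile, '#', substring-after($linkurl, '#'))"/>
  </xsl:function>

  <!-- Run with "main" as the initial template (e.g. Saxon: -it:main) -->
  <xsl:template name="main">
    <!-- Hypothetical chain: the link names ../../A/b.xml, b.xml includes
         ../B/c.xml, and c.xml includes ../C/ccc.xml -->
    <xsl:value-of
        select="f:next-link(f:next-link('../../A/b.xml#xyz', '../B/c.xml'), '../C/ccc.xml')"/>
  </xsl:template>
</xsl:stylesheet>
```

Run as shown, it prints ../../A/../B/../C/ccc.xml#xyz: each level keeps the
whole directory part of the previous link, including its "..", and then
appends the next srcfile, which starts with its own "..".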

Is there a simple way of normalising that path? If necessary I can probably
write my own, but there may be a function that already does it that I don't
know about.
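
As far as I know, XPath 2.0 has no function that tidies a relative path as a
plain string. resolve-uri() should remove "." and ".." segments as part of
RFC 3986 resolution, but it needs an absolute base URI, so its result is
absolute too. For a relative path, a small recursive function can do it. This
is a sketch under the assumption that paths are plain '/'-separated strings
with no percent-encoding to worry about; f:normalize-path, f:collapse and the
f: namespace URI are hypothetical names, and XSLT 2.0 is assumed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:path-functions"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: collapses "." segments and "name/.." pairs in a
       relative path. A fragment ("#id") is split off first and re-attached
       unchanged. Leading ".." segments that cannot be cancelled are kept. -->
  <xsl:function name="f:normalize-path" as="xs:string">
    <xsl:param name="url" as="xs:string"/>
    <xsl:variable name="path"
        select="if (contains($url, '#')) then substring-before($url, '#') else $url"/>
    <xsl:variable name="frag"
        select="if (contains($url, '#')) then concat('#', substring-after($url, '#')) else ''"/>
    <xsl:sequence
        select="concat(string-join(f:collapse((), tokenize($path, '/')[. != '.']), '/'), $frag)"/>
  </xsl:function>

  <!-- $done: segments kept so far, used as a stack; $todo: segments still to process -->
  <xsl:function name="f:collapse" as="xs:string*">
    <xsl:param name="done" as="xs:string*"/>
    <xsl:param name="todo" as="xs:string*"/>
    <xsl:choose>
      <xsl:when test="empty($todo)">
        <xsl:sequence select="$done"/>
      </xsl:when>
      <!-- ".." cancels the previous segment, unless that segment is itself
           ".." or empty (the empty segment before a leading "/") -->
      <xsl:when test="$todo[1] = '..' and exists($done) and not($done[last()] = ('..', ''))">
        <xsl:sequence select="f:collapse($done[position() lt last()], subsequence($todo, 2))"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="f:collapse(($done, $todo[1]), subsequence($todo, 2))"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- Run with "main" as the initial template (e.g. Saxon: -it:main) -->
  <xsl:template name="main">
    <xsl:value-of select="f:normalize-path('../../A/../B/../C/ccc.xml#xyz')"/>
  </xsl:template>
</xsl:stylesheet>
```

The main template prints ../../C/ccc.xml#xyz. Wrapped around the concat(...)
that builds the FILE#ID result, the function would tidy every verified link
before it is written out. Leading ".." segments are left alone because,
without knowing the starting directory, they cannot be cancelled.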

Secondly, at the point marked NB2, there must be a significant performance
penalty, because the includes are always expanded to deeper and deeper
levels even if the sought id was found in the first included subfile. Is
there a more efficient way to do this search? (Maybe the performance is
tolerable in this case, since the users haven't complained yet, but if there
is a better technique I could learn, it would likely help me avoid
inefficient code in other projects, and I'd be grateful.)

Thanks very much

Cheers
Trevor
