Hi all,
TL;DR: An XSLT 2.0 stylesheet that restructures a flat SVRL document
into a hierarchical tree takes ~20 minutes for a 40 MB input (~346,000
elements) on Saxon-PE 12.9. I suspect the following-sibling:: /
preceding-sibling:: lookups are the culprit. I would be grateful for
hints on how to rewrite the two hot templates (svrl:active-pattern,
svrl:fired-rule) while preserving the exact same output.
Background:
Schematron validation produces an SVRL (Schematron Validation Report
Language) document. To make the report accessible to domain experts, the
SVRL is post-processed in two steps:
(1) Transformation from the rather flat SVRL into a hierarchical XML
tree (the step in question).
(2) A domain-specific enrichment of that restructured tree.
From the resulting in-memory tree, two output chains are derived:
a) HTML, and
b) XSL-FO, rendered to PDF via FOP.
The intermediate result of step (1) is NOT serialised to disk; it is
held in an xsl:variable and consumed directly by step (2):
<xsl:import href="HierarchicalSVRL.xsl"/>
<xsl:variable name="strukt-svrl">
<xsl:apply-templates mode="restructure"/>
</xsl:variable>
Environment:
XSLT processor: Saxon-HE 12.9, no extensions
XSLT version: 2.0
Input size: ~40 MB SVRL, ~346,000 elements
Runtime of step (1): ~20 minutes
Profiling observation:
Running Saxon with -TP (profile) shows that these two templates dominate
the "total time (net/ms)" column, so they are the ones I would like to
optimise:
* xsl:template element(Q{http://purl.oclc.org/dsdl/svrl}active-pattern)
* xsl:template element(Q{http://purl.oclc.org/dsdl/svrl}fired-rule)
The stylesheet (HierarchicalSVRL.xsl):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:hsvrl="http://tu-dresden.de/vlp/schematron/hierarchical-svrl"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs svrl"
version="2.0">
<xsl:output method="xml" indent="yes"/>
<!-- The following templates in 'mode="restructure"' perform the
restructuring process -->
<xsl:template match="@*|comment()" mode="restructure">
<xsl:copy copy-namespaces="no">
<xsl:apply-templates select="@*|comment()" mode="#current"/>
</xsl:copy>
</xsl:template>
<xsl:template match="svrl:*" mode="restructure">
<xsl:element name="{local-name()}"
namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
<xsl:apply-templates select="@*|comment()|node()"
mode="#current"/>
</xsl:element>
</xsl:template>
<xsl:template match="*" mode="restructure">
<xsl:element name="{name()}" namespace="{namespace-uri()}">
<xsl:apply-templates select="@*|comment()|node()"
mode="#current"/>
</xsl:element>
</xsl:template>
<xsl:template match="svrl:schematron-output" mode="restructure">
<xsl:element name="{local-name()}"
namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
<xsl:apply-templates select="@*" mode="#current"/>
<xsl:comment>
This is a restructured SVRL document, which does not
comply with ISO 19757-3 Annex D grammar!
</xsl:comment>
<xsl:apply-templates select="comment()" mode="#current"/>
<xsl:apply-templates select="svrl:text" mode="#current"/>
<xsl:apply-templates
select="svrl:ns-prefix-in-attribute-values" mode="#current"/>
<xsl:apply-templates select="svrl:active-pattern"
mode="#current"/>
</xsl:element>
</xsl:template>
<xsl:template match="svrl:active-pattern" mode="restructure">
<xsl:element name="{local-name()}"
namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
<xsl:apply-templates select="@*|comment()" mode="#current"/>
<xsl:apply-templates select="*" mode="#current"/>
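<!-- fired-rule siblings whose nearest preceding active-pattern is this
one; count(X | current()) = 1 is a node-identity test -->
<xsl:apply-templates
select="following-sibling::svrl:fired-rule[count(preceding-sibling::svrl:active-pattern[1]
| current()) = 1]"
mode="#current"/>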
</xsl:element>
</xsl:template>
<xsl:template match="svrl:fired-rule[@flag = 'ignore']"
mode="restructure">
<xsl:apply-templates mode="restructure"/>
</xsl:template>
<xsl:template
match="svrl:failed-assert[preceding-sibling::*[1]/@flag = 'ignore']"
mode="restructure" priority="2">
<xsl:apply-templates mode="restructure"/>
</xsl:template>
<xsl:template
match="svrl:successful-report[preceding-sibling::*[1]/@flag = 'ignore']"
mode="restructure" priority="2">
<xsl:apply-templates mode="restructure"/>
</xsl:template>
<xsl:template match="svrl:fired-rule[not(@role)]"
mode="restructure" priority="-1">
<xsl:apply-templates mode="restructure"/>
</xsl:template>
<xsl:template match="svrl:fired-rule" mode="restructure">
<xsl:element name="{local-name()}"
namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
<xsl:apply-templates select="@*|comment()" mode="#current"/>
<xsl:variable name="next-element"
select="parent::*/following-sibling::*[1]"/>
<xsl:if test="$next-element/not(svrl:*)">
<xsl:apply-templates select="$next-element"
mode="#current"/>
</xsl:if>
<!-- failed-assert / successful-report siblings whose nearest preceding
fired-rule is this one -->
<xsl:apply-templates
select="following-sibling::svrl:failed-assert[count(preceding-sibling::svrl:fired-rule[1]
| current()) = 1] |
following-sibling::svrl:successful-report[count(preceding-sibling::svrl:fired-rule[1]
| current()) = 1]"
mode="#current"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
"Minimal" Input Example (SVRL):
(I can post the full sample if anyone wants it; I trimmed it here for
brevity.)
<svrl:schematron-output
xmlns:fx="http://tu-dresden.de/vlp/schematron/functions"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:planpro="http://www.plan-pro.org/regeln/struktur"
xmlns:saxon="http://saxon.sf.net/"
xmlns:schold="http://www.ascc.net/xml/schematron"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
title="Regelbasis für PlanPro-PlaZ"
schemaVersion="ISO19757-3">
<svrl:ns-prefix-in-attribute-values
uri="http://www.plan-pro.org/regeln/struktur" prefix="planpro"/>
<svrl:ns-prefix-in-attribute-values
uri="http://tu-dresden.de/vlp/schematron/functions" prefix="fx"/>
<svrl:ns-prefix-in-attribute-values
uri="http://www.w3.org/2001/XMLSchema-instance" prefix="xsi"/>
<svrl:active-pattern
document="file:/C:/Users/xyz/PlaZ/PlanPro-samples/Testdateien/Bezeichnertest2.xml"
id="ID123"
name="test rule"
fpi="12345678-9ABC-DEF1-2345-6789ABCDEF12"
see="test"
planpro:workpackage="BASISOBJEKTE"
planpro:version="1.10.0.1">
<svrl:text>
<planpro:description xmlns="http://purl.oclc.org/dsdl/schematron">
Human readable (sometimes lengthy) description of the
specific rule, to be applied to the whole input XML file
</planpro:description>
<planpro:comment xmlns="http://purl.oclc.org/dsdl/schematron"/>
<planpro:test xmlns="http://purl.oclc.org/dsdl/schematron">
<planpro:success>human readable success
message</planpro:success>
<planpro:error>human readable error message</planpro:error>
</planpro:test>
<planpro:output
xmlns="http://purl.oclc.org/dsdl/schematron">PlanPro object
type</planpro:output>
</svrl:text>
</svrl:active-pattern>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:failed-assert test="false()"
location="/*:PlanPro_Schnittstelle[namespace-uri()='http://www.plan-pro.org/modell/PlanPro/1.10.0.1'][1]/LST_Planung[1]/Fachdaten[1]/Ausgabe_Fachdaten[1]/LST_Zustand_Ziel[1]/Container[1]/Anhang[1]">
<svrl:text>Es ist ein Fehler aufgetreten.</svrl:text>
<svrl:diagnostic-reference
diagnostic="guid">317691e7-6b55-428d-925b-9107f72b9bc0</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="typ">Anhang</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="bereich">Betrachtung</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="aufbau">00</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="s1">Anhang</svrl:diagnostic-reference>
<svrl:diagnostic-reference diagnostic="s2">file
name</svrl:diagnostic-reference>
<svrl:diagnostic-reference diagnostic="s3"/>
<svrl:diagnostic-reference diagnostic="s4"/>
<svrl:diagnostic-reference diagnostic="s5"/>
<svrl:diagnostic-reference diagnostic="s6"/>
<svrl:diagnostic-reference diagnostic="s7"/>
</svrl:failed-assert>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:fired-rule
context="(LST_Zustand|LST_Zustand_Ziel)/Container/*" role="error"/>
<svrl:failed-assert test="false()"
location="/*:PlanPro_Schnittstelle[namespace-uri()='http://www.plan-pro.org/modell/PlanPro/1.10.0.1'][1]/LST_Planung[1]/Fachdaten[1]/Ausgabe_Fachdaten[1]/LST_Zustand_Ziel[1]/Container[1]/Aussenelementansteuerung[1]">
<svrl:text>Es ist ein Fehler aufgetreten.</svrl:text>
<svrl:diagnostic-reference
diagnostic="guid">bc2efe9a-a70b-4249-9c84-80636c08b093</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="typ">Aussenelementansteuerung</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="bereich">Betrachtung</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="aufbau">01</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="s1">Außenelementansteuerung</svrl:diagnostic-reference>
<svrl:diagnostic-reference
diagnostic="s2">Gleisfreimelde-Innenanlage</svrl:diagnostic-reference>
<svrl:diagnostic-reference diagnostic="s3">AEA
blah</svrl:diagnostic-reference>
<svrl:diagnostic-reference diagnostic="s4"/>
<svrl:diagnostic-reference diagnostic="s5"/>
<svrl:diagnostic-reference diagnostic="s6"/>
<svrl:diagnostic-reference diagnostic="s7"/>
</svrl:failed-assert>
</svrl:schematron-output>
Desired Restructured Output (excerpt):
<?xml version="1.0" encoding="UTF-8"?>
<schematron-output
xmlns="http://tu-dresden.de/vlp/schematron/hierarchical-svrl"
title="Regelbasis für PlanPro-PlaZ" schemaVersion="ISO19757-3"><!--
This is a restructured SVRL document, which does not
comply with ISO 19757-3 Annex D grammar!
-->
<ns-prefix-in-attribute-values
uri="http://www.plan-pro.org/regeln/struktur" prefix="planpro"/>
<ns-prefix-in-attribute-values
uri="http://tu-dresden.de/vlp/schematron/functions" prefix="fx"/>
<ns-prefix-in-attribute-values
uri="http://www.w3.org/2001/XMLSchema-instance" prefix="xsi"/>
<active-pattern xmlns:planpro="http://www.plan-pro.org/regeln/struktur"
document="file:/C:/Users/xyz/PlaZ/PlanPro-samples/Testdateien/Bezeichnertest2.xml"
id="ID123"
name="test rule"
fpi="12345678-9ABC-DEF1-2345-6789ABCDEF12"
see="test"
planpro:workpackage="BASISOBJEKTE"
planpro:version="1.10.0.1">
<text>
<planpro:description>
Human readable (sometimes lengthy) description of the
specific rule, to be applied to the whole input XML file
</planpro:description>
<planpro:comment/>
<planpro:test>
<planpro:success>human readable success
message</planpro:success>
<planpro:error>human readable error message</planpro:error>
</planpro:test>
<planpro:output>PlanPro object type</planpro:output>
</text>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error">
<failed-assert test="false()"
location="/*:PlanPro_Schnittstelle[namespace-uri()='http://www.plan-pro.org/modell/PlanPro/1.10.0.1'][1]/LST_Planung[1]/Fachdaten[1]/Ausgabe_Fachdaten[1]/LST_Zustand_Ziel[1]/Container[1]/Anhang[1]">
<text>Es ist ein Fehler aufgetreten.</text>
<diagnostic-reference
diagnostic="guid">317691e7-6b55-428d-925b-9107f72b9bc0</diagnostic-reference>
<diagnostic-reference
diagnostic="typ">Anhang</diagnostic-reference>
<diagnostic-reference
diagnostic="bereich">Betrachtung</diagnostic-reference>
<diagnostic-reference
diagnostic="aufbau">00</diagnostic-reference>
<diagnostic-reference
diagnostic="s1">Anhang</diagnostic-reference>
<diagnostic-reference diagnostic="s2">file
name</diagnostic-reference>
<diagnostic-reference diagnostic="s3"/>
<diagnostic-reference diagnostic="s4"/>
<diagnostic-reference diagnostic="s5"/>
<diagnostic-reference diagnostic="s6"/>
<diagnostic-reference diagnostic="s7"/>
</failed-assert>
</fired-rule>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error"/>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error"/>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error"/>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error"/>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error"/>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error"/>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error"/>
<fired-rule context="(LST_Zustand|LST_Zustand_Ziel)/Container/*"
role="error">
<failed-assert test="false()"
location="/*:PlanPro_Schnittstelle[namespace-uri()='http://www.plan-pro.org/modell/PlanPro/1.10.0.1'][1]/LST_Planung[1]/Fachdaten[1]/Ausgabe_Fachdaten[1]/LST_Zustand_Ziel[1]/Container[1]/Aussenelementansteuerung[1]">
<text>Es ist ein Fehler aufgetreten.</text>
<diagnostic-reference
diagnostic="guid">bc2efe9a-a70b-4249-9c84-80636c08b093</diagnostic-reference>
<diagnostic-reference
diagnostic="typ">Aussenelementansteuerung</diagnostic-reference>
<diagnostic-reference
diagnostic="bereich">Betrachtung</diagnostic-reference>
<diagnostic-reference
diagnostic="aufbau">01</diagnostic-reference>
<diagnostic-reference
diagnostic="s1">Außenelementansteuerung</diagnostic-reference>
<diagnostic-reference
diagnostic="s2">Gleisfreimelde-Innenanlage</diagnostic-reference>
<diagnostic-reference diagnostic="s3">AEA
blah</diagnostic-reference>
<diagnostic-reference diagnostic="s4"/>
<diagnostic-reference diagnostic="s5"/>
<diagnostic-reference diagnostic="s6"/>
<diagnostic-reference diagnostic="s7"/>
</failed-assert>
</fired-rule>
</active-pattern>
</schematron-output>
In short: every fired-rule has to swallow its following failed-assert /
successful-report siblings up to the next fired-rule, and every
active-pattern has to swallow its following fired-rule group up to the
next active-pattern.
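For comparison, the rule just stated ("a report belongs to its nearest preceding fired-rule, a fired-rule to its nearest preceding active-pattern") maps directly onto xsl:key. The sketch below is untested against the 40 MB sample, and the key names `rules-by-pattern` and `reports-by-rule` are hypothetical. It adds two top-level declarations and changes only the select expressions of the two hot templates. Because key() returns exactly the nodes that the count(... | current()) = 1 filters accept, in document order, the output should stay identical, while each lookup becomes an index hit instead of a rescan of all following siblings. Building the index still walks back to the nearest active-pattern once per fired-rule, so patterns with very many fired-rules remain comparatively expensive.

```xml
<!-- Top level, next to xsl:output. Key names are hypothetical. -->
<!-- Each fired-rule is indexed by the id of its nearest preceding active-pattern. -->
<xsl:key name="rules-by-pattern" match="svrl:fired-rule"
    use="generate-id(preceding-sibling::svrl:active-pattern[1])"/>
<!-- Each report is indexed by the id of its nearest preceding fired-rule. -->
<xsl:key name="reports-by-rule"
    match="svrl:failed-assert | svrl:successful-report"
    use="generate-id(preceding-sibling::svrl:fired-rule[1])"/>

<xsl:template match="svrl:active-pattern" mode="restructure">
  <xsl:element name="{local-name()}"
      namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
    <xsl:apply-templates select="@*|comment()" mode="#current"/>
    <xsl:apply-templates select="*" mode="#current"/>
    <!-- replaces following-sibling::svrl:fired-rule[count(...) = 1] -->
    <xsl:apply-templates select="key('rules-by-pattern', generate-id())"
        mode="#current"/>
  </xsl:element>
</xsl:template>

<xsl:template match="svrl:fired-rule" mode="restructure">
  <xsl:element name="{local-name()}"
      namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
    <xsl:apply-templates select="@*|comment()" mode="#current"/>
    <!-- unchanged from the original template -->
    <xsl:variable name="next-element"
        select="parent::*/following-sibling::*[1]"/>
    <xsl:if test="$next-element/not(svrl:*)">
      <xsl:apply-templates select="$next-element" mode="#current"/>
    </xsl:if>
    <!-- replaces the two following-sibling::...[count(...) = 1] branches -->
    <xsl:apply-templates select="key('reports-by-rule', generate-id())"
        mode="#current"/>
  </xsl:element>
</xsl:template>
```

The other templates (including the `@flag = 'ignore'` ones) are left untouched, since dispatch still goes through xsl:apply-templates.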
Is there an XSLT 2.0 way that produces identical output but avoids the
apparent n^2 cost of the sibling-axis idioms above?
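A grouping-based alternative avoids the backward walks altogether: xsl:for-each-group with group-starting-with partitions the flat children of svrl:schematron-output in one forward pass. The following is a sketch, not verified on real data, of replacements for three templates in HierarchicalSVRL.xsl; the parameter names `members` and `reports` are hypothetical. Dispatch still goes through xsl:apply-templates, so the existing `@flag = 'ignore'` templates keep matching (in XSLT 2.0 an undeclared parameter passed by xsl:apply-templates is simply ignored). It assumes every failed-assert / successful-report follows its fired-rule within the same active-pattern, as SVRL normally does; a report sitting directly after an active-pattern, before any fired-rule, would be dropped here, whereas the original attaches it to the previous pattern's last fired-rule.

```xml
<xsl:template match="svrl:schematron-output" mode="restructure">
  <xsl:element name="{local-name()}"
      namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
    <xsl:apply-templates select="@*" mode="#current"/>
    <!-- keep this text exactly as in the current template:
         its whitespace is copied to the output -->
    <xsl:comment>
This is a restructured SVRL document, which does not
comply with ISO 19757-3 Annex D grammar!
</xsl:comment>
    <xsl:apply-templates select="comment()" mode="#current"/>
    <xsl:apply-templates select="svrl:text" mode="#current"/>
    <xsl:apply-templates select="svrl:ns-prefix-in-attribute-values"
        mode="#current"/>
    <!-- One forward pass: each group starts with an active-pattern and holds
         the fired-rules and reports up to the next active-pattern. -->
    <xsl:for-each-group
        select="svrl:active-pattern | svrl:fired-rule
              | svrl:failed-assert | svrl:successful-report"
        group-starting-with="svrl:active-pattern">
      <!-- as in the original, anything before the first active-pattern is skipped -->
      <xsl:if test="self::svrl:active-pattern">
        <xsl:apply-templates select="." mode="#current">
          <xsl:with-param name="members"
              select="current-group()[position() gt 1]"/>
        </xsl:apply-templates>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:element>
</xsl:template>

<xsl:template match="svrl:active-pattern" mode="restructure">
  <!-- hypothetical parameter: the fired-rules and reports of this pattern -->
  <xsl:param name="members" as="element()*" select="()"/>
  <xsl:element name="{local-name()}"
      namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
    <xsl:apply-templates select="@*|comment()" mode="#current"/>
    <xsl:apply-templates select="*" mode="#current"/>
    <!-- second level: each fired-rule takes the reports that follow it -->
    <xsl:for-each-group select="$members" group-starting-with="svrl:fired-rule">
      <xsl:if test="self::svrl:fired-rule">
        <xsl:apply-templates select="." mode="#current">
          <xsl:with-param name="reports"
              select="current-group()[position() gt 1]"/>
        </xsl:apply-templates>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:element>
</xsl:template>

<xsl:template match="svrl:fired-rule" mode="restructure">
  <!-- hypothetical parameter: the failed-asserts / successful-reports of this rule -->
  <xsl:param name="reports" as="element()*" select="()"/>
  <xsl:element name="{local-name()}"
      namespace="http://tu-dresden.de/vlp/schematron/hierarchical-svrl">
    <xsl:apply-templates select="@*|comment()" mode="#current"/>
    <xsl:variable name="next-element"
        select="parent::*/following-sibling::*[1]"/>
    <xsl:if test="$next-element/not(svrl:*)">
      <xsl:apply-templates select="$next-element" mode="#current"/>
    </xsl:if>
    <xsl:apply-templates select="$reports" mode="#current"/>
  </xsl:element>
</xsl:template>
```

Passing the groups down as parameters, rather than building the output inside xsl:for-each-group directly, is what keeps the existing template rules in play.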
Any pointer, code sketch, or "you are doing this wrong because..." is
highly welcome.
Thanks in advance, and best regards,
Susanne