Re: [xsl] The entity was referenced, but not declared.

Subject: Re: [xsl] The entity was referenced, but not declared.
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 13 Jun 2023 06:35:56 -0000
On 6/13/2023 12:48 AM, Manuel Souto Pico terminolator@xxxxxxxxx wrote:


I'm trying to convert a collection of XLIFF files into TMX. The files
contain some HTML named entities, which makes my stylesheet choke:



My question is: Is there any way I can avoid or fix this problem from
the XSLT stylesheet without having to modify the input XLIFF files?

The example above is with ndash but I believe there must be many HTML
named entities in the files.


David Carlisle wrote an HTML tag soup parser in XSLT 2
(https://github.com/davidcarlisle/web-xslt/blob/main/htmlparse/htmlparse.xsl)
that knows all the named entities. It can also be used as an XML parser
that knows those entities, so if you import his stylesheet and call its
function instead of relying on normal XML parsing, as in


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:d="data:,dpc"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:import href="https://raw.githubusercontent.com/davidcarlisle/web-xslt/main/htmlparse/htmlparse.xsl"/>

  <xsl:param name="xml-uri" as="xs:string" select="'sample1.xml'"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template name="xsl:initial-template">
    <xsl:apply-templates select="unparsed-text($xml-uri) => d:htmlparse('', false())"/>
  </xsl:template>

</xsl:stylesheet>

the named entity references should be parsed into the corresponding
characters, and you can then process all nodes by adding whatever
templates you need to transform the XML. The above assumes you run it
with e.g. Saxon 9.8 or later, using `-it` to start at the initial template.
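
As a rough sketch of what such an added template might look like, the
fragment below would go inside the xsl:stylesheet element. It is only an
illustration under assumptions: the element and attribute names
(trans-unit, source, target, file/@source-language, tu, tuv, seg) are
taken from XLIFF 1.2 and TMX and may not match your files, and the
`*:` wildcards are there because I have not checked which namespace the
parsed elements end up in.

```xml
<!-- Hypothetical template: maps one XLIFF trans-unit to one TMX tu.
     Names are assumed from XLIFF 1.2 / TMX; adjust to your data. -->
<xsl:template match="*:trans-unit">
  <tu tuid="{@id}">
    <tuv xml:lang="{ancestor::*:file[1]/@source-language}">
      <!-- expand-text="yes" on the stylesheet lets {...} work in text -->
      <seg>{*:source}</seg>
    </tuv>
    <tuv xml:lang="{ancestor::*:file[1]/@target-language}">
      <seg>{*:target}</seg>
    </tuv>
  </tu>
</xsl:template>
```

Everything not matched by a template like this is still copied through
by the shallow-copy mode, so a real conversion would need further
templates for the surrounding XLIFF structure.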
