Subject: Re: [xsl] The entity was referenced, but not declared.
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 13 Jun 2023 06:35:56 -0000
I'm trying to convert a collection of XLIFF files into TMX. The files contain some HTML named entities, which makes my stylesheet choke:
My question is: Is there any way I can avoid or fix this problem from the XSLT stylesheet without having to modify the input XLIFF files?
The example above uses ndash, but I believe there are many other HTML named entities in the files.
David Carlisle wrote an HTML tag soup parser in XSLT 2 (https://github.com/davidcarlisle/web-xslt/blob/main/htmlparse/htmlparse.xsl) that knows all the named entities. It can also be used as an XML parser that knows those entities, so if you import his stylesheet and use its function instead of normal XML parsing, as in
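<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:d="data:,dpc"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:import href="https://raw.githubusercontent.com/davidcarlisle/web-xslt/main/htmlparse/htmlparse.xsl"/>

  <!-- URI of the XLIFF file, read as plain text. The snippet references $xml-uri
       without showing its declaration, so this parameter is an assumption. -->
  <xsl:param name="xml-uri" as="xs:string"/>

  <!-- Parse the text with d:htmlparse (empty namespace, HTML mode off)
       instead of the normal XML parser, then process the resulting nodes. -->
  <xsl:template name="xsl:initial-template">
    <xsl:apply-templates select="unparsed-text($xml-uri) => d:htmlparse('', false())"/>
  </xsl:template>

</xsl:stylesheet>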
the named entity references should be parsed into the corresponding characters, and you can process all nodes by adding whatever templates you need to transform the XML. The above assumes an XSLT 3 processor, e.g. Saxon 9.8 or later, started with `-it` so that processing begins at the initial template.
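As a first check that the entity references really are resolved, it may help to copy the parsed tree to the output unchanged before writing any TMX-specific templates. The following is only a minimal sketch under that assumption: the `xml-uri` parameter name is taken from the snippet above (its declaration is assumed), and the `xsl:mode` and `xsl:output` declarations are illustrative additions, not part of David Carlisle's stylesheet.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:d="data:,dpc"
  exclude-result-prefixes="#all">

  <!-- xsl:import has to precede the other declarations -->
  <xsl:import href="https://raw.githubusercontent.com/davidcarlisle/web-xslt/main/htmlparse/htmlparse.xsl"/>

  <!-- Assumed parameter: URI of the XLIFF file to read as text -->
  <xsl:param name="xml-uri" as="xs:string"/>

  <!-- Illustrative: copy every node for which no more specific template matches -->
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="xsl:initial-template">
    <xsl:apply-templates select="unparsed-text($xml-uri) => d:htmlparse('', false())"/>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon, something like `-it -xsl:sheet.xsl xml-uri=input.xlf` (both file names hypothetical) should run it. Once the output shows the expected characters where the entity references were, the templates that build the TMX can be added alongside the shallow-copy mode or replace it.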