Re: [xsl] The entity was referenced, but not declared.

Subject: Re: [xsl] The entity was referenced, but not declared.
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 14 Jun 2023 11:56:49 -0000
Hello XSL-List,

If the source files in question contain HTML named entities there is a fair
chance they also have a DOCTYPE declaration, and depending on what that is,
it might be possible to use it to provide a set of entity declarations to
the parser as a kind of "DTD stub", enabling the parse.
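
To make the "DTD stub" idea concrete, here is a minimal sketch in Python
(standard library only). The root element name, the two entities and the
sample content are assumptions for illustration, not taken from the actual
XLIFF files; a real stub would declare whatever entities the files use.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD stub: declares only the named entities the files are
# assumed to use, each mapped to a numeric character reference.
DTD_STUB = """<!DOCTYPE xliff [
  <!ENTITY ndash "&#8211;">
  <!ENTITY nbsp  "&#160;">
]>
"""

# Hypothetical XLIFF-like fragment; the real files will look different.
body = "<xliff><source>2019&ndash;2023</source></xliff>"

# With the stub in front, the parser can expand &ndash; instead of
# rejecting it as an undeclared entity.
root = ET.fromstring(DTD_STUB + body)
source_text = root.find("source").text
print(source_text)
```

Parsing `body` on its own, without the stub, should fail with the
undeclared-entity error that started this thread.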

But I like the idea of using an HTML parse first to normalize. Not only is
there DC's XSLT Tag Soup parser, as Martin mentioned, but quite a few
libraries also offer Tag Soup parsers. Resolving the entities can then be
treated as a discrete preparation step (i.e. a 'preprocess').
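
As one possible shape for such a preparation step, the sketch below
rewrites HTML named entity references as numeric character references, so
that an ordinary XML parser accepts the result. It uses Python's standard
`html.entities` table rather than a Tag Soup parser; the function name
`resolve_html_entities` is made up for this example. Note that the regular
expression does not distinguish CDATA sections or comments from ordinary
text, so treat it as a starting point only.

```python
import re
from html.entities import name2codepoint

# The five entities XML predefines must stay untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches references such as &ndash; or &nbsp;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_html_entities(text):
    """Replace HTML named entity references with numeric references."""

    def replace(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names exactly as they are.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#{};".format(name2codepoint[name])

    return ENTITY_REF.sub(replace, text)


print(resolve_html_entities("a&ndash;b &amp; c&nbsp;d"))
```

The output of this step can be written to a temporary file (or piped) and
handed to the existing stylesheet unchanged, which keeps the entity problem
out of the XSLT entirely.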

Cheers, Wendell


On Tue, Jun 13, 2023 at 2:36 AM Martin Honnen martin.honnen@xxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>
> On 6/13/2023 12:48 AM, Manuel Souto Pico terminolator@xxxxxxxxx wrote:
> >
> >
> > I'm trying to convert a collection of XLIFF files into TMX. The files
> > contain some HTML named entities, which makes my stylesheet choke:
> >
> >
> >
> > My question is: Is there any way I can avoid or fix this problem from
> > the XSLT stylesheet without having to modify the input XLIFF files?
> >
> > The example above is with ndash but I believe there must be many HTML
> > named entities in the files.
> >
>
> David Carlisle wrote an HTML tag soup parser in XSLT 2
> (
> https://github.com/davidcarlisle/web-xslt/blob/main/htmlparse/htmlparse.xsl
> )
> that knows all the named entities and can also be used as an XML parser
> knowing those entities so if you use/import his stylesheet and use its
> function instead of normal XML parsing, as in
>
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>    version="3.0"
>    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>    xmlns:d="data:,dpc"
>    exclude-result-prefixes="#all"
>    expand-text="yes">
>
>    <xsl:import
>       href="https://raw.githubusercontent.com/davidcarlisle/web-xslt/main/htmlparse/htmlparse.xsl"/>
>
>    <xsl:param name="xml-uri" as="xs:string" select="'sample1.xml'"/>
>
>    <xsl:mode on-no-match="shallow-copy"/>
>
>    <xsl:template name="xsl:initial-template">
>      <xsl:apply-templates select="unparsed-text($xml-uri) =>
> d:htmlparse('', false())"/>
>    </xsl:template>
>
> </xsl:stylesheet>
>
> the named entity references should be parsed into the corresponding
> characters (and you can process all nodes by adding any templates you
> need to transform the XML). The above assumes running e.g. Saxon 9.8
> or later with `-it` to start at the initial template.
>
>
>

--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
