Subject: [xsl] accessing the input XML's doctype
From: "James Sulak" <jsulak@xxxxxxxxxxxxxxxx>
Date: Wed, 16 Jul 2008 14:40:17 -0500
Hello All,

I'm trying to write a transform that gives the output XML file the same document type as the input XML file. (Specifically, it's a transform to remove Arbortext Editor's change-tracking markup.) I'm not happy with the method I'm using now, namely regexing the input XML as an unparsed document to extract the public and system identifiers from the doctype declaration.

I have a fairly limited knowledge of how an XSLT processor (we're using Saxon) interacts with the XML parser. But my understanding is that the parser reads in the XML, resolves any default attribute values, and then passes the document tree to the XSLT processor. The XSLT processor itself doesn't know or care about the doctype information. Is this correct?

If it is, that would seem to imply that what I'm asking is impossible without writing an extension function. Is this the case? Since our implementation is already dependent on several Saxon extension functions, that's an acceptable solution.

Has anyone attempted anything like this, or have any suggestions on how to proceed? Could I call Xerces (or another parser) from an extension function and get the public and system identifiers?
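To make the parser-level idea concrete: SAX reports the doctype declaration to a LexicalHandler through its startDTD callback, so the identifiers can be read without a regex. What follows is only a rough sketch, assuming plain Java with the JDK's bundled SAX parser; the class name DoctypeSniffer and its sniff method are made up for illustration, and wiring it up as a Saxon extension function is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical helper: reads the DOCTYPE name, public ID and system ID of a document. */
final class DoctypeSniffer {

    /** Thrown on purpose to end the parse as soon as the prolog has been seen. */
    private static final class Done extends SAXException {
        Done() { super("prolog read"); }
    }

    /** DefaultHandler2 implements ContentHandler, LexicalHandler and EntityResolver2. */
    private static final class Handler extends DefaultHandler2 {
        String name, publicId, systemId;

        @Override
        public void startDTD(String name, String publicId, String systemId) throws SAXException {
            // The identifiers are passed as declared; the system ID is not resolved.
            this.name = name;
            this.publicId = publicId;
            this.systemId = systemId;
            throw new Done();
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            // Reaching the root element means the document has no DOCTYPE.
            throw new Done();
        }

        @Override
        public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
            // Safety net: never fetch an external DTD; hand back an empty one instead.
            return new InputSource(new StringReader(""));
        }
    }

    /** Returns {name, publicId, systemId}; all three are null if there is no DOCTYPE. */
    static String[] sniff(InputSource source)
            throws SAXException, IOException, ParserConfigurationException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Handler handler = new Handler();
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        try {
            reader.parse(source);
        } catch (Done expected) {
            // Normal exit: the handler stopped the parse deliberately.
        }
        return new String[] { handler.name, handler.publicId, handler.systemId };
    }
}
```

Calling DoctypeSniffer.sniff(new InputSource(uri)) would return the three values, which could then be passed to the stylesheet as parameters. Stopping the parse early is a design choice so that large documents are not read twice in full.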
Here's the relevant part of my current method:

    <xsl:param name="doctype.public" select="f:input-doctype(document-uri(.))[1]"/>
    <xsl:param name="doctype.system" select="f:input-doctype(document-uri(.))[2]"/>

    <xsl:function name="f:input-doctype">
      <xsl:param name="document-uri"/>
      <xsl:variable name="unparsed-document" select="unparsed-text($document-uri)"/>
      <xsl:variable name="regex">
        <xsl:text>DOCTYPE [\s]* ([a-zA-Z0-9]+) [\s]* PUBLIC [\s]* "(.+)" [\s]* "([0-9a-zA-Z/]+\.dtd)" </xsl:text>
      </xsl:variable>
      <xsl:analyze-string select="$unparsed-document" regex="{$regex}" flags="msx">
        <xsl:matching-substring>
          <xsl:sequence select="regex-group(2), regex-group(3)"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:function>

    <xsl:output method="xml" version="1.0" encoding="utf-8"/>

    <xsl:template match="/">
      <xsl:result-document doctype-public="{$doctype.public}" doctype-system="{$doctype.system}">
        <xsl:apply-templates/>
      </xsl:result-document>
    </xsl:template>

Thanks,

-James

-----
James Sulak
Electronic Publishing Developer
Jones McClure Publishing