Subject: Re: [xsl] accessing the input XML's doctype
From: "Darcy Parker" <darcyparker@xxxxxxxxx>
Date: Thu, 17 Jul 2008 11:07:14 -0400
Can anyone point to a modified XML parser that works with Saxon, similar to the one in this article?

http://www.xml.com/pub/a/2000/08/09/xslt/xslt.html

A modified XML parser seems like a good solution, and one of general interest to a wide audience. So I am hoping that someone has already created one, compiled it, and shared it freely on the Internet, with instructions on how to use it with Saxon. Unfortunately, I can't find one like the article mentions.

Or is the custom SAX filter that Michael suggested a better approach?

Darcy

On Thu, Jul 17, 2008 at 10:50 AM, James Sulak <jsulak@xxxxxxxxxxxxxxxx> wrote:
>
> Thanks everyone for your responses.
>
> Darcy - Fortunately, I have the meat of the transform working (accepting
> splits and joins, too). The article looks interesting.
>
> David - I like the idea of default attributes, but ideally I want the
> transform to be truly universal. Maybe the transform could first check
> for those attributes, and if they don't exist, use my current
> plain-text parsing method.
>
> Michael - Writing a custom SAX filter is a bit beyond my current
> abilities, but it would be a good learning project when I have time.
>
> If I ever get anything more sophisticated or elegant working, I'll post
> it to the list.
>
> Thanks,
>
> -James
>
> -----Original Message-----
> From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
> Sent: Wednesday, July 16, 2008 6:08 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] accessing the input XML's doctype
>
> One thing you could try doing - I've had it in mind for years - is to write
> a filter between the XML parser and the XSLT processor, using SAX
> interfaces, that gets notification of the DTD events from the parser and
> translates them into things the XSLT processor understands, like elements
> and attributes in some special namespace.
>
> This seems much cleaner architecturally than reading the document as
> unparsed text and trying to parse it yourself.
>
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: James Sulak [mailto:jsulak@xxxxxxxxxxxxxxxx]
> > Sent: 16 July 2008 20:40
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: [xsl] accessing the input XML's doctype
> >
> > Hello All,
> >
> > I'm trying to write a transform that gives the output XML
> > file the same document type as the input XML file.
> > (Specifically, it's a transform to remove Arbortext Editor's
> > change-tracking markup). I'm not happy with the method I'm
> > using now, namely regexing the input XML as an unparsed
> > document to extract the public and system identifiers from
> > the doctype declaration.
> >
> > I have a fairly limited knowledge of how an XSLT processor (we're using
> > Saxon) interacts with the XML parser. But my understanding
> > is that the parser reads in the XML, resolves any default
> > attribute values, and then passes the document tree to the
> > XSLT processor. The XSLT processor itself doesn't know or
> > care about the doctype information. Is this correct?
> >
> > If it is, that would seem to imply that what I'm asking is
> > impossible without writing an extension function. Is this
> > the case? Since our implementation is already dependent on
> > several Saxon extension functions, that's an acceptable
> > solution. Has anyone attempted anything like this, or have
> > any suggestions on how to proceed? Could I call Xerces (or
> > another parser) from an extension function and get the public
> > and system identifiers?
> >
> > Here's the relevant part of my current method:
> >
> > <xsl:param name="doctype.public"
> >            select="f:input-doctype(document-uri(.))[1]"/>
> > <xsl:param name="doctype.system"
> >            select="f:input-doctype(document-uri(.))[2]"/>
> >
> > <xsl:function name="f:input-doctype">
> >   <xsl:param name="document-uri"/>
> >   <xsl:variable name="unparsed-document"
> >                 select="unparsed-text($document-uri)"/>
> >   <xsl:variable name="regex">
> >     <xsl:text>DOCTYPE
> >       [\s]*
> >       ([a-zA-Z0-9]+)
> >       [\s]*
> >       PUBLIC
> >       [\s]*
> >       "(.+)"
> >       [\s]*
> >       "([0-9a-zA-Z/]+\.dtd)"
> >     </xsl:text>
> >   </xsl:variable>
> >   <xsl:analyze-string select="$unparsed-document" regex="{$regex}"
> >                       flags="msx">
> >     <xsl:matching-substring>
> >       <xsl:sequence select="regex-group(2), regex-group(3)"/>
> >     </xsl:matching-substring>
> >   </xsl:analyze-string>
> > </xsl:function>
> >
> > <xsl:output method="xml" version="1.0" encoding="utf-8"/>
> >
> > <xsl:template match="/">
> >   <xsl:result-document doctype-public="{$doctype.public}"
> >                        doctype-system="{$doctype.system}">
> >     <xsl:apply-templates/>
> >   </xsl:result-document>
> > </xsl:template>
> >
> > Thanks,
> >
> > -James
> >
> > -----
> > James Sulak
> > Electronic Publishing Developer
> > Jones McClure Publishing
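
For anyone weighing the SAX route discussed in this thread: the parser already reports the DOCTYPE's identifiers through the standard `LexicalHandler.startDTD` callback, so no regex over unparsed text is needed to get at them. Below is a minimal sketch in Java, assuming the JDK's built-in SAX parser. The class name `DoctypeSniffer`, its nested `Doctype` holder, and the `read` method are hypothetical names chosen for illustration. It only captures the identifiers; it is not the full filter Michael describes, which would also forward events to the XSLT processor.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: reports the DOCTYPE name and identifiers of a document.
class DoctypeSniffer {

    /** Values the parser passed to startDTD; publicId/systemId may be null. */
    static final class Doctype {
        final String name;
        final String publicId;
        final String systemId;

        Doctype(String name, String publicId, String systemId) {
            this.name = name;
            this.publicId = publicId;
            this.systemId = systemId;
        }
    }

    /** Parses the source and returns its doctype, or null if none is declared. */
    static Doctype read(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final Doctype[] found = new Doctype[1];
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                // Called once, when the parser reaches the DOCTYPE declaration.
                found[0] = new Doctype(name, publicId, systemId);
            }

            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                // Assumption for this sketch: we only want the identifiers, so
                // every external entity (including the DTD) is replaced by an
                // empty one. Default attribute values are therefore NOT applied.
                return new InputSource(new StringReader(""));
            }
        };

        reader.setEntityResolver(handler);
        // Standard SAX2 property for registering a LexicalHandler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(source);
        return found[0];
    }
}
```

The two values could then be handed to the stylesheet as the `doctype.public` and `doctype.system` parameters in place of the `f:input-doctype` calls. A real filter would keep the normal entity resolution so that DTD defaults still reach the transform.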