[xsl] Re: Turning escaped mixed content back to XML

Subject: [xsl] Re: Turning escaped mixed content back to XML
From: Martin Holmes <mholmes@xxxxxxx>
Date: Fri, 28 Mar 2014 14:06:30 -0700
I spoke too soon. Passing this:

contains a single TEI-conformant document, comprising a TEI header and a text, either in isolation or as part of a &lt;gi&gt;teiCorpus&lt;/gi&gt; element.

into parse-xml-fragment() gets this fatal error:

FODC0006: First argument to parse-xml-fragment() is not a well-formed and namespace-well-formed XML fragment. XML parser reported: I/O error reported by XML parser processing file:/home/mholmes/Documents/tei/council/translation/new_translations_into_specs.xsl: 404 Not Found for: http://www.saxonica.com/parse-xml-fragment/actual.xml

This is with Saxon 9.1.5.3 PE.

I must be missing something here. The default namespace is tei, the xpath-default-namespace is tei, and all the other namespaces have defined prefixes (tei has tei: too).

Cheers,
Martin

On 14-03-28 12:09 PM, Martin Holmes wrote:
That's what I needed: parse-xml-fragment(). This seems to work:

<xsl:template match="text:p">

  <!--  Earlier attempt with the saxon:parse() extension:
        <xsl:variable name="unparsed" select="concat('&lt;p&gt;', string(.), '&lt;/p&gt;')"/>
        <xsl:variable name="parsed" select="saxon:parse($unparsed)"/>
        <xsl:copy-of select="$parsed"/>  -->
  <xsl:if test="string-length(.) gt 0">
    <tei:p>
      <!-- string(.) is the text of this text:p only (//text() would join
           every text node in the whole document); xsl:copy-of keeps the
           parsed elements, where xsl:value-of would flatten them to text. -->
      <xsl:copy-of select="parse-xml-fragment(string(.))"/>
    </tei:p>
  </xsl:if>
</xsl:template>

for most cases. I do have some horrible edge cases, though:

<text:p>a start-tag, with delimiters &lt; and &gt; is intended</text:p>

I should be able to pre-process the input text to find angle brackets
surrounded by spaces and temporarily swap them out for something else, though.
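A minimal sketch of that pre-processing step in XSLT 3.0, assuming that a `<` followed by whitespace is always a literal character (in well-formed XML a `<` in content can only begin markup, so it is never followed by a space). Instead of swapping in a temporary placeholder, it re-escapes those characters as `&lt;`, which parse-xml-fragment() then turns back into a literal `<`; a bare `>` is already legal in character content and is left alone. The variable name `escaped` is illustrative, and the `xs`, `text` and `tei` prefixes are assumed to be declared on the stylesheet.

```xml
<xsl:template match="text:p">
  <xsl:if test="string-length(.) gt 0">
    <!-- Assumption: '<' followed by whitespace is literal text, never a tag.
         In the attribute, '&lt;' is the regex '<' and '&amp;lt;$1' is the
         replacement string '&lt;' plus the captured whitespace character. -->
    <xsl:variable name="escaped" as="xs:string"
        select="replace(string(.), '&lt;(\s)', '&amp;lt;$1')"/>
    <tei:p>
      <!-- The parser now sees '&lt; ' and restores it as a literal '<'. -->
      <xsl:copy-of select="parse-xml-fragment($escaped)"/>
    </tei:p>
  </xsl:if>
</xsl:template>
```

A `<` with no following space (for example `x<3`) would still be rejected by the parser, so the pattern may need widening if other edge cases turn up.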

Thanks,
Martin

On 14-03-28 11:35 AM, Martin Honnen wrote:
Martin Holmes wrote:

I'm trying to process an ODS spreadsheet whose <text:p> nodes contain
embedded mixed-content markup in escaped form:

<text:p>indicates the amount by which this zone has been rotated
clockwise, with respect to the normal orientation of the parent
&lt;gi&gt;surface&lt;/gi&gt; element as implied by the dimensions given
in the &lt;gi&gt;msDesc&lt;/gi&gt; element or by the coordinates of the
&lt;gi&gt;surface&lt;/gi&gt; itself. The orientation is expressed in arc
degrees.</text:p>

I need to turn this back into parsed XML for insertion into XML
documents. I'm using Saxon 9.4 with XSLT 2 (and I can use 3 if
necessary).

I tried


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:text="http://example.com"
   xmlns:tei="http://example.com/tei"
   version="3.0">

<xsl:template match="text:p">
   <tei:p>
     <xsl:copy-of select="parse-xml-fragment(.)"/>
   </tei:p>
</xsl:template>

</xsl:stylesheet>

with Saxon 9.5 PE and got


<?xml version="1.0" encoding="UTF-8"?><tei:p xmlns:text="http://example.com" xmlns:tei="http://example.com/tei">indicates the amount by which this zone has been rotated clockwise, with respect to the normal orientation of the parent <gi>surface</gi> element as implied by the dimensions given in the <gi>msDesc</gi> element or by the coordinates of the <gi>surface</gi> itself. The orientation is expressed in arc degrees.</tei:p>

That has XML elements rather than escaped markup, so it should do. You will
need to change the namespaces and maybe use exclude-result-prefixes.
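A sketch of what changing the namespaces could look like, assuming the real OpenDocument text namespace and the TEI namespace in place of the example.com placeholders. Because the escaped string carries no namespace declarations of its own, parse-xml-fragment() returns elements such as `<gi>` in no namespace (as in the output above), so a hypothetical `to-tei` mode rebuilds them in the TEI namespace, and exclude-result-prefixes keeps the ODS namespace out of the result. Templates for the rest of the spreadsheet are omitted.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="text">

  <xsl:template match="text:p">
    <tei:p>
      <!-- Parse the escaped markup, then push the parsed nodes through
           the to-tei mode instead of copying them as they are. -->
      <xsl:apply-templates select="parse-xml-fragment(string(.))/node()"
                           mode="to-tei"/>
    </tei:p>
  </xsl:template>

  <!-- Rebuild each parsed (no-namespace) element with the same local name
       in the TEI namespace; attributes stay in no namespace. -->
  <xsl:template match="*" mode="to-tei">
    <xsl:element name="{local-name()}" namespace="http://www.tei-c.org/ns/1.0">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="to-tei"/>
    </xsl:element>
  </xsl:template>

  <!-- Text, comments and processing instructions are copied unchanged. -->
  <xsl:template match="text() | comment() | processing-instruction()" mode="to-tei">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

With the sample text:p above, the `<gi>` elements in the result should then be in the TEI namespace rather than in no namespace.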
