[xsl] collection() and uncommon file extensions

Subject: [xsl] collection() and uncommon file extensions
From: "Martin Holmes gtxxgm-xsl-list-2@xxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 15 Nov 2018 19:32:03 -0000

Hi all,

The XPath 3.1 functions specification (https://www.w3.org/TR/xpath-functions-31/#func-collection) allows the collection() function to return non-XML resources as well as XML documents. However, that change has broken some of my processes that retrieve XML documents with uncommon file extensions. For instance, where this:

collection('dir/?select=*.hocr')

used to happily retrieve and parse hOCR files (which are actually XHTML), Saxon now returns these files as xs:base64Binary items and won't parse them as XML, even though they have XML declarations.

I know that the recommended approach to dealing with this is to use a Saxon configuration file to register the file extension -- which I presume would be done like this:

<resources>
  <fileExtension extension="hocr" mediaType="text/xml"/>
</resources>

However, this doesn't seem to work for me -- do I have that syntax wrong?
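
For context, here is a sketch of where that element would sit in a complete configuration file. It assumes the standard Saxon configuration namespace and an edition of "HE" (adjust for PE/EE), and that the file is actually being passed to the processor, e.g. with -config:saxon-config.xml on the command line (the file name is just a placeholder). I can't say that any of these is the cause of the problem; it is only what I understand the full shape to be:

```xml
<!-- Sketch only: edition and file name are assumptions, not taken from a working setup. -->
<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="HE">
  <resources>
    <!-- Ask Saxon to treat *.hocr files found by collection() as XML. -->
    <fileExtension extension="hocr" mediaType="text/xml"/>
  </resources>
</configuration>
```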

Also, the configuration-file approach isn't easily portable, so I'm wondering whether there are any plans to allow the media type to be specified in the collection() call itself, or to be registered from within an XSLT stylesheet somehow?
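
One portable workaround that might be worth trying in the meantime is to ask only for the URIs with uri-collection() and parse each one explicitly with doc(), so that the file extension never enters into the media-type decision. This is a minimal sketch, not something I have confirmed against every Saxon release: it assumes that Saxon accepts its ?select= directory syntax in uri-collection() as it does in collection(), and the $dir parameter and the output element names are purely illustrative.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: directory holding the .hocr files,
       resolved against the stylesheet's base URI. -->
  <xsl:param name="dir" as="xs:string" select="'dir/'"/>

  <!-- uri-collection() returns URIs only; doc() then parses each one as XML,
       whatever its extension. -->
  <xsl:variable name="hocr-docs" as="document-node()*"
      select="uri-collection($dir || '?select=*.hocr') ! doc(.)"/>

  <!-- Run with the initial template (e.g. -it) to list what was parsed. -->
  <xsl:template name="xsl:initial-template">
    <files count="{count($hocr-docs)}">
      <xsl:for-each select="$hocr-docs">
        <file uri="{document-uri(.)}" root="{local-name(*)}"/>
      </xsl:for-each>
    </files>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off is that doc() raises an error on any file that is not well-formed, whereas collection() can be told how to handle such files.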

Cheers,
Martin
