Re: [xsl] Unparsed-text-available returns true for XML files, so how do I distinguish XML files?

From: Syd Bauman <Syd_Bauman@xxxxxxxxx>
Date: Sun, 3 Jan 2010 13:31:11 -0500
> I use base-uri(/) and the sub-string functions to distinguish file
> type.

While I suspect this answers the OP's question, since he referred to
file types in a way that suggests he is using the last part of the
file name to indicate the file type, what base-uri() really returns
is the file's URI. From that you can take a substring to get the
file's name, and part of that name is its extension.

But file extension and file type are *not* the same thing, although
many folks (and many pieces of software) keep them in synch by
convention. Just for starters, I have files with extensions of
  .xml
  .xhtml
  .rng
  .tei
  .odd
  .xsl
  .fo
  .xslt
  .dbk
that are all XML, and a few others that are only 1 automated step
away from being XML (.sgml, .rnc). Of course I can compare the
base-uri() to a list of recognizable extensions, but even this
has problems. There may be no extension, or it may be non-
standard, or it may be ambiguous. 
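
To make those problems concrete, here is a rough sketch of the
extension-list approach, written in Python rather than XSLT purely
for illustration. The function name and the set of extensions are
my own assumptions (the set is just the list above), not anything
a processor provides.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed list: only the extensions mentioned in this message.
XML_EXTENSIONS = {
    ".xml", ".xhtml", ".rng", ".tei", ".odd",
    ".xsl", ".fo", ".xslt", ".dbk",
}

def guess_xml_by_extension(uri):
    """Guess from the URI alone: True, False, or None if there is no extension."""
    # Take the path part of the URI, then the suffix of its last segment.
    suffix = PurePosixPath(urlparse(uri).path).suffix.lower()
    if not suffix:
        return None  # nothing to go on
    return suffix in XML_EXTENSIONS
```

A file with no extension gives no answer at all, and an XML file
whose extension is not on the list (say, .svg) is wrongly rejected,
which is exactly the weakness described above.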

So, is there a way to ask "is this XML" without considering the
file's name? I don't mind if it involves parsing the contents --
after all, if it's not well-formed, it's not XML. I'm not sure what I
think of testing whether or not the document starts with an XML
declaration (i.e., "<?xml .."). In any given case, I probably
wouldn't mind the restriction "I won't recognize your data as XML
unless it has an XML declaration", but in the general case, that
seems like a bad idea, since the declaration is officially optional.
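
For comparison, the two tests might be sketched like this, again in
Python only as an illustration and not as a claim about how any
XSLT processor does it. Both function names are hypothetical, and
the declaration check is simplified: it assumes an ASCII-compatible
encoding and would miss UTF-16 input.

```python
import xml.etree.ElementTree as ET

UTF8_BOM = b"\xef\xbb\xbf"

def is_well_formed_xml(data: bytes) -> bool:
    """Parse the content; well-formed means the parser raises no error."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True

def has_xml_declaration(data: bytes) -> bool:
    """Simplified check: does the content begin with '<?xml '?"""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]  # skip a UTF-8 byte order mark
    return data.startswith(b"<?xml ")
```

Given b"<doc/>", the parsing test says yes and the declaration test
says no, so the declaration test rejects a perfectly good document.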
