Re: [xsl] Use XSLT to check a bunch of XHTML files for well-formedness?

Subject: Re: [xsl] Use XSLT to check a bunch of XHTML files for well-formedness?
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 17 Feb 2021 16:51:52 -0000
Hi Folks,

Thank you for your recommendations on how to check a bunch of XHTML files for
well-formedness. Here's what I found:

1. I was unable to obtain an EXE for the xml parser that Richard Tobin
created, RXP. This page

http://www.cogsci.ed.ac.uk/~richard/rxp.html

has a link to an EXE of RXP:

ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.exe

However, that link does not work.

Anyone know where I can get the EXE of RXP?

2. Next, I tried xmlwf. It ships with Expat, so you must first download and
install Expat:

https://libexpat.github.io/

That results in downloading: expat-win32bin-2.2.10.exe

Double-click the installer and Expat will be installed on your system. Find
the folder where Expat was installed. In there is a bin folder, and in the bin
folder is xmlwf.exe

I ran xmlwf on a folder that contains 10,000 XHTML files. Wow! It checked all
of them in a couple of seconds. However, the error messages are terse. For
example, here is one of the error messages:

	xhtml\htmloutput10.xhtml:206:2: mismatched tag

Compare that to the error message I get when I run my super-simple XSLT
program on the XHTML file:

Error on line 206 column 3 of htmloutput10.xhtml:
  SXXP0003  Error reported by XML parser: The element type "input" must be terminated by the
  matching end-tag "</input>".

I find the latter error message to be more helpful.

Perhaps there is a flag that can be set in xmlwf to output more verbose/useful
error messages?
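
One possible workaround, offered only as a rough sketch and not as anything
suggested in this thread: Python's standard xml.parsers.expat module wraps the
same Expat library that xmlwf is built on, so a small script can keep Expat's
speed while adding context to the message. The function names
check_well_formed and check_folder are hypothetical, and the "*.xhtml" pattern
is an assumption about how the files are named. The sketch tracks the open
elements itself so that a "mismatched tag" error can also say which element
was still open.

```python
from pathlib import Path
from xml.parsers import expat


def check_well_formed(data):
    """Return None if `data` (bytes) is well-formed XML, else a message.

    Hypothetical helper: it adds the innermost open element to Expat's
    own error text, which on its own is as terse as xmlwf's output.
    """
    open_elements = []  # stack of (name, line) for elements not yet closed
    parser = expat.ParserCreate()

    def start(name, attrs):
        open_elements.append((name, parser.CurrentLineNumber))

    def end(name):
        open_elements.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end

    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError as err:
        # err.offset is the column as Expat counts it (first column is 0).
        message = "line {}, column {}: {}".format(
            err.lineno, err.offset, expat.ErrorString(err.code))
        if open_elements:
            name, line = open_elements[-1]
            message += ' (innermost open element is "{}", opened on line {})'.format(
                name, line)
        return message
    return None


def check_folder(folder, pattern="*.xhtml"):
    """Check every matching file in `folder`; return a list of problem lines."""
    problems = []
    for path in sorted(Path(folder).glob(pattern)):
        message = check_well_formed(path.read_bytes())
        if message is not None:
            problems.append("{}: {}".format(path, message))
    return problems
```

Calling check_folder("xhtml") and printing each returned line would give one
line per file that is not well-formed. I have not timed this against xmlwf on
10,000 files, so treat the speed as an open question.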

/Roger

-----Original Message-----
From: Liam R. E. Quin liam@xxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, February 16, 2021 8:52 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [EXT] Re: [xsl] Use XSLT to check a bunch of XHTML files
for well-formedness?

On Tue, 2021-02-16 at 21:42 +0000, Martin Honnen martin.honnen@xxxxxx
wrote:
> On 16.02.2021 22:10, Liam R. E. Quin liam@xxxxxxxxxxxxxxxxx wrote:
> > On Tue, 2021-02-16 at 21:04 +0000, Martin Honnen
> > martin.honnen@xxxxxx
> > wrote:
> > >
> > > In theory I think that should check with doc-available if the file
> > > is well-formed or not. Haven't tested however.
> >
> > It catches some problems, but will try to load the DTD.
>
> I thought Saxon has all the important W3C DTDs internalized.

It might, but last time I did this I was testing files with other DTDs,
including JATS (various different versions, too, each needing a different
catalogue file).

--
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text
Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
