Re: [jats-list] JPub3 Preview Stylesheets generating invalid XHTML

Subject: Re: [jats-list] JPub3 Preview Stylesheets generating invalid XHTML
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Fri, 7 Dec 2012 11:49:17 -0500

Hi Gerry,

You're correct: my article wouldn't have given you the solution.

Valid XHTML was never stipulated as a requirement for the preview
stylesheets. These are preview stylesheets, so "validating in the
application", i.e. opening and looking reasonable in the major
browsers, was considered to be sufficient; we didn't have a use case
requiring external validation of HTML outputs against an XHTML schema.
(If the docs don't say this, they should. But who reads the
documentation?)

In part, this is because (to be honest) meeting this requirement would
also have impeded other goals. We wanted the preview stylesheets (at
least the basic standalone HTML stylesheet) to be in XSLT 1.0 so that
it would work in the major browsers out of the box. Some requirements
for valid XHTML -- in particular, that paragraphs not include divs or
lists -- were not really achievable in XSLT 1.0 that was intended to
be extended and maintained without guru-level XSLT skills. (JATS of
course allows lists and some other block-level structures to appear
inside paragraphs, while valid XHTML doesn't. So a simple mapping of
element to element generates invalid results. The best XSLT 1.0
technique to get around this, sibling recursion, is something of a
bear. :-)
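
For readers who haven't met the technique, here is a minimal sketch of sibling recursion in XSLT 1.0, splitting a paragraph around block-level children. It is an illustration only, not code from the preview stylesheets. It assumes the elements are in no namespace (as the unprefixed names in Gerry's keys suggest), and the list of block elements (div, ul, ol) and the mode names "split" and "run" are my own choices for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform for everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A paragraph with block children: start walking at its first child. -->
  <xsl:template match="p[div | ul | ol]">
    <xsl:apply-templates select="node()[1]" mode="split"/>
  </xsl:template>

  <!-- A block: emit it outside any p, then step to the next sibling. -->
  <xsl:template match="div | ul | ol" mode="split">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="following-sibling::node()[1]" mode="split"/>
  </xsl:template>

  <!-- Anything else starts a run of inline content: wrap the run in a new
       p, then resume at the next block sibling, if there is one. -->
  <xsl:template match="node()" mode="split">
    <p>
      <xsl:apply-templates select="." mode="run"/>
    </p>
    <xsl:apply-templates mode="split"
        select="following-sibling::*[self::div or self::ul or self::ol][1]"/>
  </xsl:template>

  <!-- Copy one node of the run, then recurse to the next sibling unless
       that sibling is a block. -->
  <xsl:template match="node()" mode="run">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates mode="run"
        select="following-sibling::node()[1][not(self::div or self::ul or self::ol)]"/>
  </xsl:template>

</xsl:stylesheet>
```

Even this small version has rough edges: it drops the original paragraph's attributes, and it will wrap a whitespace-only run in an empty-looking p. Handling those cases cleanly is where the maintenance burden comes from.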

Another thing to bear in mind is that there could be a big difference
between addressing this problem for a particular system (making it
work for the data you have, then maintaining it for the new cases that
come along), and addressing it in the general case (accounting
comprehensively for all possible invalid XHTML outputs from the
preview stylesheet). Without doing the analysis I can't assess how
difficult the latter will actually be.

All this having been said, the requirement is certainly important for
many uses. If you're not happy with using Tidy (which is a good
expediency but also introduces new variables -- after all, the
requirement is actually "generate valid XHTML without butchering
the data", not just validity as such), I'd suggest applying an XHTML
remediation stylesheet in your pipeline, a post-process that will take
the invalid XHTML emitted by the basic preview stylesheet and fix it.
Use XSLT 2.0 to make the hard stuff more tractable.

It would of course do things like get rid of the old-fashioned @name
attributes (these were simply inherited from earlier versions of the
stylesheet and would have been fixed, had valid XHTML been a design
goal), split paragraphs around divs and lists, and so forth. (I'm
actually curious as to what Tidy will do about the latter issue -- I
guess I should try it and see.)
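
As a starting point, a remediation stylesheet along these lines might look something like the sketch below. It is not tested against actual output from the preview stylesheets and covers only the two fixes just mentioned. Assumptions: the *: wildcards match by local name, so it should not matter whether the input is in the XHTML namespace; the list of block-level elements (div, ul, ol, dl, table) is illustrative and may need extending; and the @name values are assumed to be usable as ids without further cleanup.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy anything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace a/@name with @id, unless the anchor already has an @id. -->
  <xsl:template match="*:a/@name">
    <xsl:if test="not(../@id)">
      <xsl:attribute name="id" select="."/>
    </xsl:if>
  </xsl:template>

  <!-- Split any paragraph that contains block-level children. -->
  <xsl:template match="*:p[*:div | *:ul | *:ol | *:dl | *:table]">
    <xsl:variable name="p" select="."/>
    <!-- Adjacent children are grouped by whether they are blocks. -->
    <xsl:for-each-group select="node()"
        group-adjacent="exists(self::*:div | self::*:ul | self::*:ol
                               | self::*:dl | self::*:table)">
      <xsl:choose>
        <!-- A run of blocks: emit them outside any paragraph. -->
        <xsl:when test="current-grouping-key()">
          <xsl:apply-templates select="current-group()"/>
        </xsl:when>
        <!-- A run of inline content: wrap it in a new paragraph,
             skipping runs that are only whitespace. -->
        <xsl:when test="current-group()[self::* or normalize-space()]">
          <xsl:element name="{name($p)}" namespace="{namespace-uri($p)}">
            <xsl:copy-of select="$p/@* except $p/@id"/>
            <!-- Keep @id on the first piece only, to avoid duplicate ids. -->
            <xsl:if test="position() = 1">
              <xsl:copy-of select="$p/@id"/>
            </xsl:if>
            <xsl:apply-templates select="current-group()"/>
          </xsl:element>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

One limitation to be aware of: if a paragraph starts with a block, its @id (if any) is dropped rather than moved, so links pointing at it would need separate handling.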

I hope this helps. Feel free to write me off list for any discussion
you don't feel would be welcome here.

Best regards,
Wendell

On Thu, Dec 6, 2012 at 11:17 AM, Gerry King
<g.king@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> <newbie alert /> I hadn't seen Wendell Piez's article on Fitting the Journal
> Publishing 3.0 Preview Stylesheets to your needs
> (http://www.ncbi.nlm.nih.gov/books/NBK47104/#piez-pipelining-methods) when I
> started my descent into hell however I don't think it would have provided a
> solution...
>
> Using the preview XSLT pipeline and then adding one last transform to create
> the desired xhtml fragment seemed easy enough. I thought I had it working.
>
> Unfortunately I had forgotten about the warnings from TextMate+HTMLTidy and
> spent the past week chasing my tail trying to work out why my XSL worked on
> a sample XHTML that had been tidied but failed to generate the desired
> output when I tried a batch <sigh>
>
> I was surprised that the output from jpub3-PMCcit-xhtml.xsl is invalid (my
> original  sample has 104 errors according to http://validator.w3.org/check).
> The source of my woes are <a href>s that have name attributes but not id's;
> my XSL uses these in key()s.
>
> <xsl:key name="figslist" match="//div[@class='fig panel']" use="concat('#',
> a[1]/@id)"/>
> <xsl:key name="tableslist" match="//div[starts-with(@class, 'table-wrap')]"
> use="concat('#', a[1]/@id)"/>
>
> Fixing the problem in the jpub3-PMCcit-xhtml.xsl pipeline is daunting so I
> guess I will use a Python script and pass the xhtml through HTMLTidy before
> running my XSL for now.
>
> I am surprised nobody else has had issues with the xhtml output before
> (searching this list before posting didn't find any hits). Are there any
> plans to make the tool generate valid xhtml?
>
> Gerry King
> Spandidos Publications
>



-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
