
Subject: Re: [jats-list] [ANN] HoBoTS: a Relax NG based BITS customization
From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Sat, 09 Mar 2013 21:59:26 +0100
(replying to the list, with Nikos' consent)

Hi Nikos,

Thanks for your comments. The changes in HoBoTS with respect to BITS reflect some of the (rather minuscule) pain points that Hogrefe and I had with BITS, as described in my message to this list on Jan. 16:
http://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/201301/msg00029.html


The changes particularly address the following points:

- allow nested tables (this is more a side effect of the following point)

- allow block-level content ('para-level-minus-x') in td, as an alternative to inline content (I discussed this in the message cited above)

- support semi-generated ToCs (allow ToCs that only have a title and a depth attribute, see the other message)

Besides that, we allow formatting (CSSa) and semantic (RDFa) markup in attributes.

RDFa is intended for marking up multiple-choice tests and the like. We'd prefer not to use JATS-style content-type attributes for that, because RDFa is more expressive and we can use the same vocabulary in HoBoTS and in HTML (where we'll probably have JavaScript widgets that turn RDFa-enriched books into interactive test applications).

We use CSSa to convey the original InDesign style information and local formatting overrides: we translate them into CSS properties and attach them as XML attributes. One reason for using CSSa is that we want to be able to express formatting that should appear the same way in every rendering. For example, we can simply pass through the properties for handwriting fonts (css:font-family="cursive"), extra spacing after paragraphs (css:margin-bottom="12pt"), font color (css:color="device-cmyk(0,1,1,0)"), or table cell backgrounds (css:background-color="device-cmyk(0,0,0,0.2)"). They will find their way into the HTML and EPUB renderings almost unaltered, except that they'll be translated into CSS rules or HTML style attributes, and color values will be converted to RGB.
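A rough sketch of that pass-through step, in Python purely for illustration (the actual conversion is XSLT-based and may differ). It gathers the css:* attributes of an element into an HTML style attribute and converts device-cmyk() values to rgb() with the naive formula, without ICC color management. The CSSa namespace URI and the function names are assumptions, not taken from the HoBoTS schema.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the CSSa namespace URI; check hobots.rng for the real one.
CSS_NS = "http://www.w3.org/1996/css"

# Matches device-cmyk(c,m,y,k) with components in the range 0..1.
CMYK_RE = re.compile(
    r"device-cmyk\(\s*([^,]+),\s*([^,]+),\s*([^,]+),\s*([^)]+)\)")


def cmyk_to_rgb(match: re.Match) -> str:
    """Naive CMYK to RGB conversion, without any ICC color management."""
    c, m, y, k = (float(v) for v in match.groups())
    r, g, b = (round(255 * (1 - x) * (1 - k)) for x in (c, m, y))
    return f"rgb({r},{g},{b})"


def cssa_to_style(elem: ET.Element) -> None:
    """Move the css:* attributes of elem into one HTML-style 'style' attribute."""
    prefix = "{" + CSS_NS + "}"
    decls = []
    for name in list(elem.attrib):
        if name.startswith(prefix):
            prop = name[len(prefix):]
            value = CMYK_RE.sub(cmyk_to_rgb, elem.attrib.pop(name))
            decls.append(f"{prop}: {value}")
    if decls:
        elem.set("style", "; ".join(decls))
```

For a whole document you would call cssa_to_style() on every element from root.iter(); a production pipeline would more likely emit CSS rules keyed by class names than inline styles.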

The good thing about CSSa is that you can start converting your typeset or manuscript data to BITS quickly, without having to define mappings for every kind of formatting that may occur, and without having to define mappings from your content-type or style-type attributes to some CSS for rendering. You just pass it through.

You can later refine the conversion by recognizing some formatting as semantically significant and then up-converting the XML that you already have within the same schema (for example, styled-content with css attributes → named-content without css attributes).
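A minimal sketch of such an up-conversion, assuming a hypothetical rule table that maps one CSSa property/value pair to a content-type. The rule, the "handwriting" value and the CSSa namespace URI are illustrative assumptions; real rules would come from the publisher's style analysis.

```python
import xml.etree.ElementTree as ET

CSS_NS = "http://www.w3.org/1996/css"  # assumed CSSa namespace URI

# Hypothetical rules: (CSS property, value) -> content-type of named-content.
RULES = {
    ("font-family", "cursive"): "handwriting",
}


def upconvert(root: ET.Element) -> None:
    """Turn styled-content into named-content when its CSSa matches a rule."""
    for elem in root.iter("styled-content"):
        for (prop, value), content_type in RULES.items():
            key = "{" + CSS_NS + "}" + prop
            if elem.get(key) == value:
                elem.tag = "named-content"
                # The formatting is now implied by the semantic markup.
                del elem.attrib[key]
                elem.set("content-type", content_type)
                break
```

Elements that match no rule keep their styled-content tag and CSSa attributes, so the document stays valid against the same schema after each refinement step.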


In most cases, a HoBoTS document can easily be transformed into a BITS document:


If you have paragraphs in table cells, you can unwrap their contents and place a break in between. You won't be able to preserve indentation and vertical spacing, but this is acceptable.
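A possible sketch of that unwrapping step, assuming a cell that contains nothing but p elements (mixed cells are left untouched). The function name is hypothetical; whitespace between the paragraphs is dropped.

```python
import xml.etree.ElementTree as ET


def unwrap_cell_paras(td: ET.Element) -> None:
    """Replace the <p> children of a cell by their content, separated by <break/>."""
    paras = list(td)
    if not paras or any(p.tag != "p" for p in paras):
        return  # only handle cells that consist solely of paragraphs
    for p in paras:
        td.remove(p)
    td.text = paras[0].text
    for i, p in enumerate(paras):
        if i > 0:
            # The text at the start of each later paragraph follows the break.
            ET.SubElement(td, "break").tail = p.text
        td.extend(list(p))  # inline children keep their own tails
```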

If you have tables in table cells, you can wrap them in named-content.
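A sketch of the wrapping, assuming nested tables appear as direct table children of td or th. The "nested-table" content-type value is a hypothetical placeholder, and a table-wrap inside a cell would need the same treatment.

```python
import xml.etree.ElementTree as ET


def wrap_nested_tables(root: ET.Element, content_type: str = "nested-table") -> None:
    """Wrap every table that is a direct child of td/th in named-content."""
    cells = [e for e in root.iter() if e.tag in ("td", "th")]  # snapshot first
    for cell in cells:
        for i, child in enumerate(list(cell)):
            if child.tag == "table":
                wrapper = ET.Element("named-content", {"content-type": content_type})
                # Text after the table now follows the wrapper instead.
                wrapper.tail, child.tail = child.tail, None
                cell[i] = wrapper
                wrapper.append(child)
```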

If you have a toc like this (which is permitted in HoBoTS)
<toc depth="3">
  <title-group>
    <title>Inhaltsverzeichnis</title>
  </title-group>
</toc>
you can render the headings to a full-blown BITS toc, or you can remove it.
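One way to render such a toc, sketched under assumptions: it walks only a sec hierarchy (book-part levels are ignored) and uses toc-entry, title and nav-pointer as found in later BITS versions, which may not match BITS 0.2 exactly. The title-group of the original toc is left as it is.

```python
import xml.etree.ElementTree as ET


def expand_toc(toc: ET.Element, body: ET.Element) -> None:
    """Fill a HoBoTS depth-only toc with nested toc-entry elements."""
    depth = int(toc.attrib.pop("depth", "1"))  # BITS has no depth attribute

    def walk(container: ET.Element, parent: ET.Element, level: int) -> None:
        if level > depth:
            return
        for sec in container.findall("sec"):
            entry = ET.SubElement(parent, "toc-entry")
            title = sec.find("title")
            # Inline markup in the heading is flattened to plain text.
            ET.SubElement(entry, "title").text = (
                "".join(title.itertext()) if title is not None else "")
            if sec.get("id"):
                ET.SubElement(entry, "nav-pointer", rid=sec.get("id"))
            walk(sec, entry, level + 1)

    walk(body, toc, 1)
```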

You can remove all CSSa and RDFa attributes (maybe after mapping them to appropriate *-type attributes).
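A sketch of the attribute cleanup. It assumes the RDFa attributes appear without a namespace, as in HTML (href and src are deliberately left alone), and that copying typeof into content-type is an acceptable mapping; not every BITS element allows content-type, so the result should be validated.

```python
import xml.etree.ElementTree as ET

CSS_NS = "http://www.w3.org/1996/css"  # assumed CSSa namespace URI
# RDFa 1.1 attributes, assumed to be un-namespaced.
RDFA_ATTRS = {"about", "content", "datatype", "inlist", "prefix", "property",
              "rel", "resource", "rev", "typeof", "vocab"}


def strip_cssa_rdfa(root: ET.Element, map_typeof: bool = True) -> None:
    """Remove CSSa and RDFa attributes, optionally keeping typeof as content-type."""
    prefix = "{" + CSS_NS + "}"
    for elem in root.iter():
        if map_typeof and "typeof" in elem.attrib and "content-type" not in elem.attrib:
            elem.set("content-type", elem.get("typeof"))
        for name in list(elem.attrib):
            if name.startswith(prefix) or name in RDFA_ATTRS:
                del elem.attrib[name]
```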

So there will be some rather simple XSLT that transforms HoBoTS into BITS, should the need arise. A high degree of BITS compatibility was one of HoBoTS' design aims (without sacrificing compatibility with the content structure of Hogrefe's books or with their initial strategy of combining conventional typesetting with a sophisticated checking/conversion infrastructure).


Let me finally make another remark regarding consistent naming and structure. When writing XSLT that converts BITS to HTML, I found that I frequently (and perhaps unnecessarily) had to distinguish cases: Is it a content division whose title is in a title group, or just a plain title? How many variants of title groups are there (book-title-group, book-part-meta/title-group, …)? How many different body elements are there (named-book-part-body, book-part/body, book-body)? How many type attribute names are there (book-part-type, style-type, content-type, …)?
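To illustrate the case distinctions, here is a hypothetical set of Python helpers that a converter ends up needing. The path and name lists cover only the variants mentioned above (plus sec-type) and are not an exhaustive inventory of BITS.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Places where a BITS division may keep its title (not exhaustive).
TITLE_PATHS = (
    "title",
    "title-group/title",
    "book-part-meta/title-group/title",
    "book-meta/book-title-group/book-title",
)
BODY_TAGS = {"body", "book-body", "named-book-part-body"}
TYPE_ATTRS = ("book-part-type", "sec-type", "content-type", "style-type")


def division_title(elem: ET.Element) -> Optional[str]:
    """Return the title text of a division, wherever it is stored."""
    for path in TITLE_PATHS:
        found = elem.find(path)
        if found is not None:
            return "".join(found.itertext())
    return None


def division_body(elem: ET.Element) -> Optional[ET.Element]:
    """Return the first child that is one of the various body elements."""
    return next((child for child in elem if child.tag in BODY_TAGS), None)


def division_type(elem: ET.Element) -> Optional[str]:
    """Return the value of whichever *-type attribute is present first."""
    return next((elem.get(a) for a in TYPE_ATTRS if a in elem.attrib), None)
```

With a uniform info block, a single body name and a single type attribute, each helper would shrink to one lookup.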


I think book-parts, prefaces, etc. could be structurally the same as sections (sec may also carry metadata, alt-titles, etc.). I like the DocBook 5 approach of allowing a metadata block with the uniform name info on every document-structure element (and also on paragraphs). The only thing that isn't that straightforward in DocBook is that an element's title is allowed either standalone or within an info block.

And people who develop XSLT conversions, who explore documents via XPath or who select from a collection using XQuery will all benefit if the number of *-type attribute names is radically reduced.

Why did the schema designers opt for this kind of redundancy? If you're in a styled-content element, there is only one permitted *-type attribute. Why not call it type instead of style-type?

It just came to my mind that some of these naming and content model decisions may be due to limitations of the original schema language (DTD). But I think DTD allows more uniform metadata modeling and naming than currently found in BITS.


After this final remark, another one regarding schema languages: I chose RNG for extending, restricting, and redefining parts of the original content model. This was particularly convenient when I discovered that there are no hooks for allowing RDFa and CSSa. I wrote an XSLT stylesheet that enhanced the attlists of all elements that were previously allowed to carry xml:lang, abbr, or display-as attributes with something like this:
<define name="th-attlist" combine="interleave">
<ref name="css_attributes"/>
<ref name="Rdfa.attrib"/>
</define>
Of course I could have patched the DTD itself by inserting a placeholder parameter entity for additional global attributes, or I could have gone to the committee and asked them to include such a global-attributes parameter entity in the first place.


But one of the beauties of RNG is that I didn't have to do string processing of the DTD or some kind of DTD→XML transformation first. I could use XSLT (and it's simpler than patching XSD, by the way). Another beauty of RNG is that, apart from an automatic trang DTD→RNG conversion, I didn't have to touch the original schema in any way, even though it lacked the extension hooks that I needed.
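The stylesheet described above was XSLT. The following Python sketch shows the same idea under stated assumptions: trang output uses `<attribute name="...">` directly inside the defines, and css_attributes and Rdfa.attrib are defined elsewhere in the customization. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
TRIGGERS = {"xml:lang", "abbr", "display-as"}
EXTRA_REFS = ("css_attributes", "Rdfa.attrib")


def extension_defines(grammar: ET.Element) -> ET.Element:
    """Build combine="interleave" defines that add CSSa/RDFa refs to attlists."""
    rng = "{" + RNG_NS + "}"
    out = ET.Element(rng + "grammar")
    seen = set()
    for define in grammar.iter(rng + "define"):
        name = define.get("name")
        attrs = {a.get("name") for a in define.iter(rng + "attribute")}
        if name not in seen and attrs & TRIGGERS:
            seen.add(name)
            ext = ET.SubElement(out, rng + "define", name=name, combine="interleave")
            for ref in EXTRA_REFS:
                ET.SubElement(ext, rng + "ref", name=ref)
    return out
```

The generated grammar would then be combined with the converted BITS grammar in a driver schema, so the trang output itself stays untouched.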

But what was intended as a brief message that mostly refers to my other post has become quite lengthy. To the readers who bore with me this far: thanks for your patience.

Gerrit





On 08.03.2013 18:07, Nikos Markantonatos wrote:
Hi Gerrit,

Thanks for this useful pointer. Have you considered submitting some or
even all of your suggested extensions to the BITS reviewing committee?
What is typically required is a brief description of each extension, the
reason that prompted you to adopt it, in what way it makes your book
encoding better and a description of what content or application may
benefit from each extension.

If you think you have a use case that others may benefit from, you
should probably suggest it, and it is possible that some of these
extensions will find their way, in one form or another, into the BITS
standard soon. A review of the extensions suggested over the past five
months will take place later this month. This is a good opportunity to
contribute your extensions, should you wish to.

Best regards,
Nikos Markantonatos
Atypon


On 03/08/2013 06:13 PM, Imsieke, Gerrit, le-tex wrote:
Dear List,

We've developed a BITS customization for the Hogrefe group of
publishers. Hogrefe agreed that we make this customization publicly
available (the main ingredients are free and open anyway).

We converted BITS 0.2 to Relax NG and enriched it with RDFa and CSSa
(CSS as XML attributes), see
http://archive.xmlprague.cz/2013/presentations/Conveying_Layout_Information_with_CSSa/CSSa_xmlprague_gimsieke.html#/step-1



There's some documentation in the schema,
http://hobots.hogrefe.com/schema/hobots.rng (just view it in a browser).

There's a small sample document that gives a hint as to how CSSa
comes into play: http://hobots.hogrefe.com/schema/hobots_sample.xml
You may open it in oXygen and should immediately see a validation error
against CSSa and against an embedded Schematron rule of hobots.rng.

The schema files and the sample files are included in a zip file,
http://hobots.hogrefe.com/schema/hobots.zip

We might eventually move the schema to a part of Hogrefe's svn repo that
is publicly readable, or move it to github.

I'm looking forward to your feedback.

Gerrit




-- 
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
