[jats-list] Data URLs in JATS

Subject: [jats-list] Data URLs in JATS
From: "Maloney, Christopher (NIH/NLM/NCBI) [C] maloneyc@xxxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 9 Mar 2016 21:51:05 -0000
Gareth Oakes wrote:

> Hi Chris,
>
> Interesting thought about the use of data URLs, is this something in active
> use in the JATS community? (I've not come across it outside of HTML yet)

No, it's just something I thought of this morning. But the more I think about
it, the more I like it, because it seems like a clean way to separate concerns:

1. JATS declares the high-level semantics of this object: "this is an
   equation".
2. The URL is completely opaque to JATS, so JATS just acts as the transport.
3. It could work the same way for alternatives for equations or images,
   in any format: gif, jpg, svg, plain text, html, or even mathml. Of those,
   only mathml is officially embeddable in JATS.

Yes, escaped markup, or even binary (with base-64, for example) would work
too, but there's no standard way of declaring its content type. Whereas, with
data URLs, you can use the IANA media type values to declare the type, and
that's standardized.
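To make the point about declared content types concrete, here is a minimal Python sketch of the consumer side: splitting a data URL into its media type, parameters, and decoded bytes, following the `data:[<mediatype>][;base64],<data>` form from RFC 2397. The function name `parse_data_url` is hypothetical and this is not a full validator. Note that a bare `utf8` token, as in the quoted example further down, is not a `key=value` parameter and is simply ignored here; the standard spelling is `charset=utf-8`.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_url(url: str) -> tuple[str, dict[str, str], bytes]:
    """Split a data URL into (media type, parameters, decoded payload).

    Follows the RFC 2397 form data:[<mediatype>][;base64],<data>.
    """
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, data = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no ',' separator")
    parts = header.split(";")
    # RFC 2397: an omitted media type means text/plain;charset=US-ASCII.
    media_type = parts[0].strip().lower() or "text/plain"
    params: dict[str, str] = {}
    is_base64 = False
    for part in parts[1:]:
        if part.strip().lower() == "base64":
            is_base64 = True
        elif "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip()
        # Anything else (e.g. a bare "utf8" token) is ignored.
    if not parts[0].strip() and "charset" not in params:
        params["charset"] = "US-ASCII"
    raw = unquote_to_bytes(data)  # undo %xx escapes first
    payload = base64.b64decode(raw) if is_base64 else raw
    return media_type, params, payload


media_type, params, payload = parse_data_url(
    "data:text/html;charset=utf-8,%3Ch1%3EThe%20Sun!%3C%2Fh1%3E"
)
print(media_type, params, payload.decode(params.get("charset", "utf-8")))
```

A renderer could then dispatch on the returned media type (say, `image/svg+xml` vs `text/html`); that step depends on the output format and is left out.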

Also, it works well in HTML, so there's a clear precedent.

> I guess there are two comments on that approach: (1) certain characters will
> need escaping in attribute values;

Yes, the data would have to be escaped with URL-escaping (i.e. `%xx`).
I just checked my jsfiddle again, and see that it is not well-formed XML,
because it uses `<` signs inside the attribute value. Here's another
iteration, with everything escaped:
https://jsfiddle.net/klortho/tmk3rzse/2/.
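A sketch of the producer side, assuming Python's standard library: percent-encode the payload with `urllib.parse.quote` so the URL itself contains no `<`, `>`, `&` or quotes, and build the element with `xml.etree.ElementTree` rather than by string concatenation, so the output is well-formed either way. The helper names are hypothetical; the element names follow the quoted example below.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

XLINK_NS = "http://www.w3.org/1999/xlink"
# Serialize the xlink namespace with its conventional prefix.
ET.register_namespace("xlink", XLINK_NS)


def make_data_url(payload: str, media_type: str = "text/html") -> str:
    """Build a UTF-8 data URL whose data part is fully percent-encoded."""
    # safe="" means even '/' is encoded; only unreserved characters stay literal.
    return f"data:{media_type};charset=utf-8," + quote(payload, safe="")


def inline_formula_with_data_url(payload: str, media_type: str = "text/html") -> str:
    """Return <inline-formula><alternatives><inline-graphic xlink:href=...> as text."""
    formula = ET.Element("inline-formula")
    alternatives = ET.SubElement(formula, "alternatives")
    graphic = ET.SubElement(alternatives, "inline-graphic")
    graphic.set(f"{{{XLINK_NS}}}href", make_data_url(payload, media_type))
    return ET.tostring(formula, encoding="unicode")


print(inline_formula_with_data_url("<h1>The Sun!</h1>"))
```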

> (2) I've seen XML processors in the past that have fixed limits on the
> length of attribute values. Obviously such XML processors should be fixed,
> but in production scenarios it is sometimes difficult to effect such
> changes. Just something to be aware of.

Either fixed, or the team should pick a new processor. I'd be very surprised
to find any tool that couldn't handle long attribute values -- within reason,
of course (under 1M, say).
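If you'd rather check a particular processor than take it on faith, a quick test is to parse a synthetic document with one very long attribute. This sketch only exercises Python's bundled expat-based parser; the 1,000,000-character default mirrors the "under 1M" figure above and is otherwise arbitrary, and the plain `href` attribute is used just to keep the document minimal. The same idea can be pointed at a production tool.

```python
import xml.etree.ElementTree as ET


def parser_accepts_long_attribute(length: int = 1_000_000) -> bool:
    """Parse a document with one attribute of `length` characters.

    Returns True only if parsing succeeds and the value comes back intact.
    """
    value = "x" * length
    document = f'<inline-graphic href="{value}"/>'
    try:
        element = ET.fromstring(document)
    except ET.ParseError:
        return False
    return element.get("href") == value


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(n, parser_accepts_long_attribute(n))
```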


I'd be interested to hear others' thoughts on this idea -- so I'm sending this
with a new heading. Its use wouldn't be covered by the JATS standard, so it's
really a "recommended practices" question, and maybe people have some
practical concerns I haven't thought of.



> Maloney, Christopher  wrote:
>
> > There is another alternative: you could use data URLs. They are pretty
> > common on the web nowadays, often for CSS background images and that sort
> > of thing. I don't see any reason why they couldn't be used in JATS; they
> > are basically a way of embedding an external resource into a document.
> > Something like this:
> >
> >         <inline-formula>
> >           <alternatives>
> >             <inline-graphic xlink:href="data:text/html;utf8,<h1>The Sun!</h1>"></inline-graphic>
> >           </alternatives>
> >         </inline-formula>
> >
> >
> > Renderers would have to know what to do with this, though, and it would
> > depend on the output format. Here's a jsfiddle showing data URLs being
> > used in html, to include html and svg:
> > https://jsfiddle.net/klortho/tmk3rzse/
> >
> > The question of CDATA vs entity references is really a question about the
> > lexical layer of XML, and your XML tools and libraries should take care of
> > that, *hopefully*. In my opinion, CDATA is a broken concept, and should
> > be avoided. The problem is that people tend to use it to produce XML
> > documents with tools that don't understand XML, and just write unescaped
> > markup into it, assuming it will parse. But problems ensue if the
> > unescaped markup itself contains CDATA, like this:
> >
> > <textual-form><![CDATA[
> >   Here's some unescaped markup: <![CDATA[Happy gardens forever!]]>
> > ]]></textual-form>
> >
> >
> > It happens!
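A small Python sketch (standard library only) of why this matters: pasting the markup into a CDATA section without inspecting it does not survive a round trip, because the inner `]]>` ends the section early and the leftover `]]>` is not allowed in ordinary character data. Two ways around it are shown: splitting every `]]>` across two CDATA sections (a general XML workaround, not something proposed in this thread), and simply letting the XML library escape the text, which is the approach argued for above. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The unescaped markup from the example above, which itself contains CDATA.
payload = "Here's some unescaped markup: <![CDATA[Happy gardens forever!]]>"


def round_trips(document: str, expected: str) -> bool:
    """True if the document parses and its root element's text equals expected."""
    try:
        return ET.fromstring(document).text == expected
    except ET.ParseError:
        return False


def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' so it cannot close the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# 1. Naive: paste the markup into a CDATA section without looking at it.
naive = f"<textual-form><![CDATA[{payload}]]></textual-form>"

# 2. Split trick: still CDATA, but the inner ']]>' is broken across two sections.
split = f"<textual-form>{wrap_in_cdata(payload)}</textual-form>"

# 3. Let the XML library escape the text ('<' becomes '&lt;', and so on).
element = ET.Element("textual-form")
element.text = payload
escaped = ET.tostring(element, encoding="unicode")

for name, doc in (("naive", naive), ("split", split), ("escaped", escaped)):
    print(name, round_trips(doc, payload))
```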
> >
> >
> > "Alexander Schwarzman aschwarzman@xxxxxxxxx" wrote:
> >
> > > An HTML fragment could be tagged with either <textual-form> or <code>
> > > -- and thus it would be nice if the Tag Library provided guidance on
> > > the use of <textual-form> vs. <code>, especially within
> > > <alternatives>. Also, whether it is <textual-form> or <code>, in order
> > > to represent angle brackets one could use escaped characters &lt;
> > > and &gt; or the CDATA section instead, as Gareth has suggested. The
> > > <code> examples in the Tag Library use the escaped characters, but it
> > > is unclear if the use of CDATA is deprecated or not.
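On the CDATA-vs-escaping question: once parsed, the two forms are indistinguishable, because the difference lives only in the lexical layer. This Python sketch parses both spellings of a `<code>` element and compares the resulting text; it also shows that `ElementTree` does not preserve CDATA sections, so re-serializing yields the escaped form. Other serializers may make different choices.

```python
import xml.etree.ElementTree as ET

# The same <span> fragment, written once with escaped characters and once with CDATA.
escaped = '<code>&lt;span class="ABC"&gt;text&lt;/span&gt;</code>'
with_cdata = '<code><![CDATA[<span class="ABC">text</span>]]></code>'

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(with_cdata).text

# After parsing, both give the identical string: the difference was purely lexical.
print(escaped_text == cdata_text)  # True
print(cdata_text)  # <span class="ABC">text</span>

# Re-serializing the CDATA version produces the escaped spelling.
print(ET.tostring(ET.fromstring(with_cdata), encoding="unicode"))
```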
> > >
> > > On Wed, Mar 9, 2016 at 4:28 AM, Peter Krautzberger wrote:
> > >
> > > > Hi Gareth,
> > > >
> > > > Thanks for the quick reply!
> > > >
> > > > Option 1) sounds good -- I didn't think of (ab)using it this way.
> > > >
> > > > Option 2) is good to know. I don't think it's necessary for me as I'll
> > > > always have MathML (which the HTML is created from).
> > > >
> > > > Best regards,
> > > > Peter.
> > > >
> > > > Gareth Oakes  wrote:
> > > >
> > > > > Hi Peter,
> > > > >
> > > > > The JATS doctype doesn't include XHTML so definitely no way to store
> > > > > HTML fragments as-is. You do have a number of options but it depends
> > > > > on the various users of your data as to what makes sense. I see most
> > > > > options as falling into one of two categories.
> > > > >
> > > > > 1. Most simply you wrap everything up as CDATA:
> > > > > <disp-formula><alternatives><textual-form><![CDATA[<span
> > > > > class="ABC">text</span>]]></textual-form></alternatives></disp-formula>
> > > > >
> > > > > 2. Otherwise you translate the HTML to something JATS-y (carefully
> > > > > capturing all attributes):
> > > > > <disp-formula><alternatives><textual-form><styled-content
> > > > > style-type="ABC">text</styled-content></textual-form></alternatives></disp-formula>
> > > > >
> > > > > First option is quick and easy. Second option lets you do more with
> > > > > the content when it is in JATS format.
> > > > >
> > > > > I hope the thought process, at least, helps.
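Gareth's second option could be automated along these lines. This is a hedged Python sketch that assumes the HTML fragment is well-formed XML (XHTML-style nested `<span>` elements, no HTML-only entities such as `&nbsp;`). The attribute mapping (`class` to `style-type`, `style` to `style`) is a hypothetical choice; `role`, `aria-label` and other attributes without an obvious JATS counterpart are collected and returned rather than silently dropped. Whether `<styled-content>` is allowed inside `<textual-form>` depends on the JATS version and tag set you validate against; this sketch does not check that.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from HTML attributes to <styled-content> attributes.
ATTRIBUTE_MAP = {"class": "style-type", "style": "style"}


def span_to_styled_content(span: ET.Element, unmapped: list[tuple[str, str]]) -> ET.Element:
    """Recursively convert nested <span> elements to <styled-content>."""
    if span.tag != "span":
        raise ValueError(f"unexpected element <{span.tag}>")
    styled = ET.Element("styled-content")
    for name, value in span.attrib.items():
        if name in ATTRIBUTE_MAP:
            styled.set(ATTRIBUTE_MAP[name], value)
        else:
            unmapped.append((name, value))  # e.g. role, aria-label
    styled.text = span.text
    for child in span:
        converted = span_to_styled_content(child, unmapped)
        converted.tail = child.tail  # keep text that follows the child span
        styled.append(converted)
    return styled


def html_to_disp_formula(fragment: str) -> tuple[str, list[tuple[str, str]]]:
    """Wrap a span fragment as disp-formula/alternatives/textual-form."""
    unmapped: list[tuple[str, str]] = []
    formula = ET.Element("disp-formula")
    alternatives = ET.SubElement(formula, "alternatives")
    textual = ET.SubElement(alternatives, "textual-form")
    textual.append(span_to_styled_content(ET.fromstring(fragment), unmapped))
    return ET.tostring(formula, encoding="unicode"), unmapped


xml_text, leftovers = html_to_disp_formula('<span class="ABC" role="math">text</span>')
print(xml_text)
print("unmapped:", leftovers)
```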
> > > > >
> > > > > > From: "Peter Krautzberger peter.krautzberger@xxxxxxxxxxx"
> > > > > >
> > > > > > Dear list members,
> > > > > >
> > > > > > I feel I have to apologize in advance. This is my first posting
> > > > > > and it was difficult to search the archives for such a
> > > > > > generic-sounding question. I'm sorry if I missed any earlier
> > > > > > discussions on the topic.
> > > > > >
> > > > > > I'm wondering if there is any way to include (x)HTML fragments in
> > > > > > a JATS document.
> > > > > >
> > > > > > More precisely, I'm looking to include such fragments as (an
> > > > > > alternative within) inline/display-formulas.
> > > > > >
> > > > > > The HTML fragments are just a number of nested <span> elements
> > > > > > with typical HTML attributes (class, style, role, aria-label etc).
> > > > > >
> > > > > > I'm relatively certain that this is not possible (in a valid way)
> > > > > > but I wanted to make sure I didn't miss anything.
> > > > > >
> > > > > > Thanks in advance for any pointers!
> > > > > >
> > > > > > Best regards,
> > > > > > Peter Krautzberger.

Chris Maloney
