Re: [jats-list] html fragments and JATS

Subject: Re: [jats-list] html fragments and JATS
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 23 Mar 2016 20:50:35 -0000
Hi,

I think there's a difference between embedding HTML fragments as
'functional' code i.e. something that's expected to be processed
downstream, and presenting HTML code-as-code, for example if you were
writing, in JATS, a book about HTML.

Chris's suggestion is a good solution to the former problem -- which
is considerably the more difficult one, for a number of reasons (on
which I am tempted to discourse). Of course, note that while he
succeeds in representing the HTML, what happens to it in processing is
a different matter.

If all you expect to do is present the code as string literals, then
CDATA is much more thinkable. So something like

<code language="html" code-type="sgml"><![CDATA[ <html>Boo!</html> ]]></code>

(I know that HTML ain't valid, or is it. ;-)

NB you can also escape the markup using character entity references if
you prefer instead of using CDATA, which is apt to confuse people.
(But then so are character references and even the fact that we have
to escape it somehow, so choose your poison.)
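
To make the equivalence concrete, here is a minimal sketch in Python using the
standard library's ElementTree parser. The two sample strings are made up for
illustration, and the language attribute is just carried over from the example
above. It only shows that a conforming parser hands back the same character
data for both spellings.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same <code> element: one uses a CDATA section,
# the other escapes the markup characters with entity references.
cdata_form = '<code language="html"><![CDATA[<html>Boo!</html>]]></code>'
escaped_form = '<code language="html">&lt;html&gt;Boo!&lt;/html&gt;</code>'

cdata_text = ET.fromstring(cdata_form).text
escaped_text = ET.fromstring(escaped_form).text

# The parser reports identical character data for both forms.
print(cdata_text == escaped_text)
print(cdata_text)
```

Since the parsed result is the same, the choice between the two is about what
authors and editors find readable, not about what downstream tools receive.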

I wouldn't recommend the escape-the-markup "hide it in CDATA" trick as
a solution to the first problem. The method has been tried, and found
wanting - it is brittle and apt to break at bad times - typically when
the knowledge necessary to correct it is far away.
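
One concrete failure mode, offered as an illustration and not necessarily the
one meant above: a fragment that happens to contain "]]>" ends the CDATA
section early. The sketch below (Python, standard library only; the script
fragment is hypothetical) shows the naive wrapper failing to parse while the
escaped form round-trips.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical HTML fragment whose script happens to contain "]]>".
fragment = '<script>if (a[b[0]]>1) go();</script>'

# Naive wrapping: the first "]]>" ends the CDATA section early, and the
# remainder is then parsed as markup.
naive = '<code><![CDATA[' + fragment + ']]></code>'
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# Escaping the markup characters has no such terminator to trip over.
escaped = '<code>' + escape(fragment) + '</code>'
round_trip = ET.fromstring(escaped).text

print(naive_ok)
print(round_trip == fragment)
```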

Now it could be that what is actually wanted is something in between
-- all the information in the HTML, including what is signified by the
markup, except with all the advantages of being in a JATS system. :->

This means either a hybrid HTML/JATS system as above (in which case
you must control the boundary somehow plus maintain both wings), or a
transformation from the HTML (recognizing whatever 'semantics' you
think it reliably has) into JATS.
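
A minimal sketch of the second route, in Python with the standard library. It
assumes the "HTML" is well-formed XML made only of nested <span> elements, and
that class maps to style-type on <styled-content> as in Gareth's example. The
function name and the decision to reject anything outside that subset are
choices made for this illustration, not an established convention.

```python
import xml.etree.ElementTree as ET

# Attributes this sketch knows how to carry over (an assumption).
ALLOWED_ATTRS = {"class", "style"}

def span_to_styled_content(span):
    """Map an XHTML <span> (and nested spans) to JATS <styled-content>."""
    if span.tag != "span":
        raise ValueError(f"element outside the agreed subset: {span.tag}")
    unknown = set(span.attrib) - ALLOWED_ATTRS
    if unknown:
        raise ValueError(f"no mapping defined for attributes: {sorted(unknown)}")
    out = ET.Element("styled-content")
    if "class" in span.attrib:
        out.set("style-type", span.attrib["class"])
    if "style" in span.attrib:
        out.set("style", span.attrib["style"])
    out.text = span.text
    for child in span:
        converted = span_to_styled_content(child)
        converted.tail = child.tail  # keep text that follows the child
        out.append(converted)
    return out

html = '<span class="ABC">x<span class="sup" style="color: red">2</span> + 1</span>'
jats = span_to_styled_content(ET.fromstring(html))
print(ET.tostring(jats, encoding="unicode"))
```

Refusing unknown elements and attributes is one way of controlling the
boundary: anything the mapping does not cover (role, aria-label and so on)
surfaces as an error instead of being silently dropped.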

Both of these are possible, subject to some definition of (and
probably controls on) what you expect from the "HTML".

My $0.02, as always interested in others' experience.

Cheers, Wendell



On Wed, Mar 9, 2016 at 4:01 PM, Alexander Schwarzman
aschwarzman@xxxxxxxxx <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
> Hi, Gareth:
>
> Maybe I wasn't clear enough: what I wanted the clarification on was
> whether or not it is a good idea to use the <![CDATA[]]> construct
> within the <code> and <textual-form> elements.
>
> Of course, the Tag Library does not endorse best practices, so this
> may be a question for the JATS4R group.
>
> Alexander ('Sasha') Schwarzman
>
> On Wed, Mar 9, 2016 at 3:35 PM, Gareth Oakes goakes@xxxxxxx
> <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi Alexander,
>>
>> Just a comment regarding CDATA. The <![CDATA[]]> construct is a core
>> feature of XML 1.0. How can it be deprecated? Or do you mean its use
>> should be discouraged for JATS? I'd have to argue against that; CDATA is
>> functionally identical to escaping all markup characters in PCDATA and
>> can be a nice convenience (e.g. for capturing source code).
>>
>> I do understand that CDATA has been misused in the past. Almost every
>> junior XML developer is tempted at one point or another to wrap up vast
>> swathes of markup as unparsed CDATA text. Perhaps guidance is needed on
>> use vs abuse!
>>
>> // Gareth Oakes
>> // Chief Architect, GPSL
>> // www.gpsl.co
>>
>> On 10/03/2016, 00:05, "Alexander Schwarzman aschwarzman@xxxxxxxxx"
>> <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>>An HTML fragment could be tagged with either <textual-form> or <code>
>>>-- and thus it would be nice if the Tag Library provided guidance on
>>>the use of <textual-form> vs. <code>, especially within
>>><alternatives>. Also, whether it is <textual-form> or <code>, in order
>>>to represent angle brackets one could use escaped characters &lt;
>>>and &gt; or the CDATA section instead, as Gareth has suggested. The
>>><code> examples in the Tag Library use the escaped characters, but it
>>>is unclear if the use of CDATA is deprecated or not.
>>>
>>>Alexander ('Sasha') Schwarzman, Content Technology Architect
>>>phone: +1.202.416.1979 | e-mail: aschwarzman@xxxxxxx
>>>
>>>The Optical Society (OSA)
>>>2010 Massachusetts Ave., NW
>>>Washington, DC 20036 USA
>>>www.osa.org
>>>
>>>On Wed, Mar 9, 2016 at 4:28 AM, Peter Krautzberger
>>>peter.krautzberger@xxxxxxxxxxx
>>><jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> Hi Gareth,
>>>>
>>>> Thanks for the quick reply!
>>>>
>>>> Option 1) sounds good -- I didn't think of (ab)using it this way.
>>>>
>>>> Option 2) is good to know. I don't think it's necessary for me as I'll
>>>> always have MathML (which the HTML is created from).
>>>>
>>>> Best regards,
>>>> Peter.
>>>>
>>>> On Wed, Mar 9, 2016 at 10:23 AM, Gareth Oakes goakes@xxxxxxx
>>>> <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> The JATS doctype doesn't include XHTML so definitely no way to store
>>>>> HTML fragments as-is. You do have a number of options but it depends
>>>>> on the various users of your data as to what makes sense. I see most
>>>>> options as falling into one of two categories.
>>>>>
>>>>> 1. Most simply you wrap everything up as CDATA:
>>>>> <disp-formula><alternatives><textual-form><![CDATA[<span
>>>>> class="ABC">text</span>]]></textual-form>…</disp-formula>
>>>>>
>>>>> 2. Otherwise you translate the HTML to something JATS-y (carefully
>>>>> capturing all attributes):
>>>>> <disp-formula><alternatives><textual-form><styled-content
>>>>> style-type="ABC">text</styled-content></textual-form>…</disp-formula>
>>>>>
>>>>> First option is quick and easy. Second option lets you do more with the
>>>>> content when it is in JATS format.
>>>>>
>>>>> I hope the thought process, at least, helps.
>>>>>
>>>>> // Gareth Oakes
>>>>> // Chief Architect, GPSL
>>>>> // www.gpsl.co
>>>>>
>>>>> From: "Peter Krautzberger peter.krautzberger@xxxxxxxxxxx"
>>>>> <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
>>>>> Reply-To: "jats-list@xxxxxxxxxxxxxxxxxxxxxx"
>>>>> <jats-list@xxxxxxxxxxxxxxxxxxxxxx>
>>>>> Date: Wednesday, 9 March 2016 at 19:00
>>>>> To: "jats-list@xxxxxxxxxxxxxxxxxxxxxx"
>>>>> <jats-list@xxxxxxxxxxxxxxxxxxxxxx>
>>>>> Subject: [jats-list] html fragments and JATS
>>>>>
>>>>> Dear list members,
>>>>>
>>>>> I feel I have to apologize in advance. This is my first posting and it
>>>>> was difficult to search the archives for such a generic-sounding
>>>>> question. I'm sorry if I missed any earlier discussions on the topic.
>>>>>
>>>>> I'm wondering if there is any way to include (x)HTML-fragments in a
>>>>> JATS document.
>>>>>
>>>>> More precisely, I'm looking to include such fragments as (an
>>>>> alternative within) inline/display-formulas.
>>>>>
>>>>> The HTML fragments are just a number of nested <span> elements with
>>>>> typical HTML attributes (class, style, role, aria-label etc).
>>>>>
>>>>> I'm relatively certain that this is not possible (in a valid way) but I
>>>>> wanted to make sure I didn't miss anything.
>>>>>
>>>>> Thanks in advance for any pointers!
>>>>>
>>>>> Best regards,
>>>>> Peter Krautzberger.



--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
