RE: [jats-list] Markup for linguistics (glossed text)

From: "Maloney, Christopher (NIH/NLM/NCBI) [C]" <maloneyc@xxxxxxxxxxxxxxxx>
Date: Fri, 22 Nov 2013 21:14:05 +0000
I would suggest ruby over a custom markup schema, following the principle
"don't reinvent the wheel".  :)

Ruby is an established standard, designed specifically for use cases very
similar to this one.  And, IMO, it's still a good idea to separate semantics
from presentation -- that's why I think it would be preferable to tables
(even if you could get close to the presentation you want with tables).  A
<ruby> element with a @specific-use attribute identifying it as what it is
(linguistic or grammatical annotation of some sort) would be a good way
(again, IMO) to mark this up in the article XML.
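
As a rough illustration of the ruby suggestion (not taken from the thread),
the Python sketch below builds one <ruby> element per word/gloss pair with
the standard library's xml.etree.ElementTree. The @specific-use value
"interlinear-gloss", the function name gloss_to_ruby, and the sample words
are all assumptions made for illustration; the element and attribute names
should be checked against the JATS version actually in use.

```python
import xml.etree.ElementTree as ET


def gloss_to_ruby(pairs, specific_use="interlinear-gloss"):
    """Return a list of <ruby> elements, one per (base, gloss) pair.

    The specific-use value is a hypothetical label, not one defined by JATS.
    """
    rubies = []
    for base, gloss in pairs:
        ruby = ET.Element("ruby", {"specific-use": specific_use})
        ET.SubElement(ruby, "rb").text = base   # the word being glossed
        ET.SubElement(ruby, "rt").text = gloss  # its annotation
        rubies.append(ruby)
    return rubies


# Made-up sample data: three words with simplified glosses.
pairs = [("los", "the.PL"), ("gatos", "cat.PL"), ("duermen", "sleep.3PL")]
para = ET.Element("p")
para.extend(gloss_to_ruby(pairs))
markup = ET.tostring(para, encoding="unicode")
print(markup)
```

This only covers a single layer of glossing per word; examples with two or
three gloss layers would need testing, or a different structure.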

Presentation is another matter, of course, as I said before.  For
presentation on a web page, you could wrap each "chunk" in a div that is
displayed inline-block.  Here's a quick test page that gets pretty close:
http://www.ncbi.nlm.nih.gov/staff/maloneyc/linguistic-annotation/try.html
(Make the browser window narrow to see each chunk wrap.)
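
For readers who cannot reach the test page, here is a minimal sketch of the
inline-block idea, written independently of that page. Each chunk becomes an
inline-block <div> whose layers are stacked in inner <div>s, so chunks wrap
as units when the window narrows. The class name "chunk", the inline styles,
and the sample data are assumptions for illustration only.

```python
from html import escape


def render_chunks(chunks):
    """Render each chunk (a tuple of stacked lines) as an inline-block <div>."""
    blocks = []
    for lines in chunks:
        # One inner <div> per layer, so each gloss sits directly under its word.
        inner = "".join(f"<div>{escape(line)}</div>" for line in lines)
        blocks.append(
            '<div class="chunk" style="display: inline-block; '
            f'margin-right: 1em; vertical-align: top">{inner}</div>'
        )
    return "\n".join(blocks)


# Made-up sample data: (word, gloss) layers for three chunks.
chunks = [("los", "the.PL"), ("gatos", "cat.PL"), ("duermen", "sleep.3PL")]
html_fragment = render_chunks(chunks)
print(html_fragment)
```

Because each chunk is a tuple of any length, the same function accepts
chunks with more than two layers.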

Cheers!
Chris Maloney


________________________________________
From: Wendell Piez [wapiez@xxxxxxxxxxxxxxx]
Sent: Friday, November 22, 2013 3:20 PM
To: jats-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [jats-list] Markup for linguistics (glossed text)

Hi again,

Also, I'd prefer plain-old tables (however ornate) to 'ruby' following
the "Principle of Least Surprise".

Cheers, Wendell

Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^


On Fri, Nov 22, 2013 at 2:56 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> My nominations for alternatives:
>
> (1) If there are a lot of these, and real benefit to be gained, then
> design and use a little markup language for them. Then, format as you
> like, probably via tables.
>
> Disadvantage: time and expertise required. Dependence on specialists'
> knowhow. (But that could be an advantage.)
>
> (2) Custom-designed tables, validated via Schematron. JATS provides
> @content-type
> Just as much work, and you'd be doing all the same work as (1), but
> they could be made to validate as JATS without extending it.
>
> Advantage: relatively quick and dirty to get something started.
>
> Disadvantage: the XML would be relatively hard to maintain compared to
> (1). Also, this is schema design without a schema, so relatively
> fragile and not scalable to complexity.
>
> (Such a table could also be used to represent (1) in JATS when
> interfacing with JATS-based systems.)
>
> (3) SVG. Similar disadvantages, many advantages of its own. They could
> be very pretty. :-)
>
> It sounds like graphics made from SVGs might be the preferred choice
> of your vendor (and I don't blame them). But as Debbie points out,
> they're not searchable. (If the SVGs were available they'd be sort of
> searchable.)
>
>
> What my choice would be would depend on my goals, long-term and
> short-term resources, and the frequency with which it occurs or number
> of them. Having a finite number of these things (i.e. I'd never expect
> to see more of these than I already have) or having them very
> infrequently would argue for (2) or (3). The more of these there are
> and the more interesting/important the semantics they could expose,
> the more I'd do (1).
>
> Designing and specifying a well-controlled, clean descriptive format
> (1) would also be really fun. (2) and (3) are also natural spin-offs
> for (1), not exclusive of it -- although you could also skip to them
> directly (and specialists in CSS and SVG might prefer to do so).
>
> Cheers, Wendell
>
>
>
>
>
> Wendell Piez | http://www.wendellpiez.com
> XML | XSLT | electronic publishing
> Eat Your Vegetables
> _____oo_________o_o___ooooo____ooooooo_^
>
>
> On Thu, Nov 21, 2013 at 5:01 PM, Michael Boudreau
> <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> For what it's worth, our hosting platform informs me that the only way to
>> get these images to display at a consistent size is to submit the
>> <graphic> element as a child of <disp-formula>. They were not sympathetic
>> to my pointing out that these are not math.
>>
>> --
>> Michael R. Boudreau
>> Electronic Publishing Technology Manager
>> The University of Chicago Press
>> 1427 E. 60th Street
>> Chicago, IL 60637
>> (773) 753-3298
>> www.journals.uchicago.edu
>>
>>
>>
>>
>>
>> On 11/20/13, 10:56 AM, "Michael Boudreau" <mboudreau@xxxxxxxxxxxxxxxxxx>
>> wrote:
>>
>>>Thanks, everyone, for these comments. I should have mentioned that we're
>>>currently using graphics, like so (highly simplified):
>>>
>>>   <p>Some text precedes an example:</p>
>>>   <p><graphic href="example1.tiff"/></p>
>>>   <p>And the text continues.</p>
>>>
>>>This can be converted by our host to a readable HTML presentation. The
>>>down-side is that the content of the graphic is not searchable by the
>>>user's browser (though the site's search engine can build its index from
>>>the PDF version, so all is not lost), and the graphic's visual quality is
>>>relatively low, particularly on mobile devices.
>>>
>>>To answer Nikos's question, I don't have a current project that requires a
>>>particular type of markup for such examples, but the examples in their
>>>context just don't strike me as "tabular"--but I'm not a linguist and
>>>would defer to the journal editors if they deemed table markup
>>>appropriate. I think <ruby> is closer to the mark; I'd have to do
>>>extensive testing to see if it could handle examples with multiple layers
>>>of glossing on the base text (sometimes there are 2 or 3 or more). (I
>>>tremble to think what it would take to train our typesetting vendors to
>>>apply either <table> or <ruby> markup to these examples.)
>>>
>>>I hadn't thought of <array>, which actually might help solve a processing
>>>problem on our vendor's side even while still using <graphic>.
>>>
>>>
>>>--
>>>Michael R. Boudreau
>>>Electronic Publishing Technology Manager
>>>The University of Chicago Press
>>>1427 E. 60th Street
>>>Chicago, IL 60637
>>>(773) 753-3298
>>>www.journals.uchicago.edu
>>>
>>>
>>>
>>>
>>>
>>>On 11/20/13, 9:14 AM, "Alexander Schwarzman" <aschwarzman@xxxxxxxxx>
>>>wrote:
>>>
>>>>Or, perhaps, use <array>, with either <graphic>, as Nikos suggested,
>>>>or with <tbody> inside...
>>>>
>>>>--Sasha
>>>>
>>>>Alexander ('Sasha') Schwarzman, Content Technology Architect
>>>>phone: +1.202.416.1979 | e-mail: aschwarzman@xxxxxxx
>>>>
>>>>The Optical Society (OSA)
>>>>2010 Massachusetts Ave., NW
>>>>Washington, DC 20036 USA
>>>>www.osa.org
>>>>
>>>>
>>>>On Wed, Nov 20, 2013 at 5:01 AM, Nikos Markantonatos <nikos@xxxxxxxxxx>
>>>>wrote:
>>>>> Hi Michael,
>>>>>
>>>>> The question that arises of course out of the "semantically reasonable"
>>>>> encoding of such difficult pieces of text is why you need it. Are you
>>>>> planning to draw some logic across different types of such linguistic
>>>>> representations? In that case, JATS alone will hardly offer you a
>>>>> solution.
>>>>> JATS often resorts to other known standards for the representation of
>>>>> "tough" textual pieces, such as mathematical equations (MathML) and
>>>>> tables (XHTML, OASIS). If there was a corresponding XML encoding
>>>>> standard for linguistic representations, one could make the case for
>>>>> embedding it into JATS.
>>>>>
>>>>> Otherwise, you are left to choose between the encoding options
>>>>> suggested by Debbie, or to capture it as an image (my favorite option),
>>>>> or even attempt to represent it in TeX/LaTeX or MathML.
>>>>>
>>>>> Best regards,
>>>>> Nikos Markantonatos
>>>>> Atypon
>>>>>
>>>>>
>>>>> On 11/19/2013 11:47 PM, Debbie Lapeyre wrote:
>>>>>>
>>>>>> Dear Michael--
>>>>>>
>>>>>> Ouch! No you are not overlooking anything obvious. The problem
>>>>>> is that, although you ask for "semantically reasonable", you
>>>>>> really want presentation markup. JATS does not do presentation,
>>>>>> by design or very well.
>>>>>>
>>>>>>   - My first thought is a table, which this certainly looks like
>>>>>>     to me. But I do see your problem.
>>>>>>
>>>>>>   - If it has to present EXACTLY this way, another obvious
>>>>>>     (but less than perfect) choice is <preformat>. That would
>>>>>>      - force this into a monofont (sorry about that)
>>>>>>      - preserve all your alignments and whitespace
>>>>>>      - let you include the italics, bold, and stuff.
>>>>>>
>>>>>>   - Another possibility (not in NLM 3.0, but in the brand new
>>>>>>     JATS 1.1d1) is using <ruby>, which has a base (<rb>) and a
>>>>>>     ruby text annotation (<rt>) traditionally displayed atop the
>>>>>>     base, or inside parentheses after the base for browsers
>>>>>>     that cannot handle Ruby. Ruby is part of HTML5, as well as
>>>>>>     part of JATS. Ruby markup is intended for textual annotation,
>>>>>>     and might fit this case very well.
>>>>>>
>>>>>> But I've got to tell you, I found this example incredibly hard to
>>>>>> human parse and be sure what went with what and why were these 2
>>>>>> clusters parallel and that one all alone? When the top line and the
>>>>>> bottom line both had values, I was fine, but sometimes... Whatever
>>>>>> you decide, a few horizontal lines or just more white space between
>>>>>> the lines and/or less between the line and its gloss, would help
>>>>>> me to separate.
>>>>>>
>>>>>> --Debbie
>>>>>>
>>>>>>
>>>>>> On Nov 19, 2013, at 4:17 PM, Michael Boudreau
>>>>>> <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Has anyone tackled the problem of marking up textual illustrations
>>>>>>> that require multiple points of vertical alignment--the sort of
>>>>>>> thing for which you'd set tab stops on a typewriter or word
>>>>>>> processor?
>>>>>>>
>>>>>>>
>>>>>>> I'm working on a linguistics journal that has lots of glossed text
>>>>>>> illustrations that are typeset like the items labeled (3) and (4) on
>>>>>>> this page image:
>>>>>>>
>>>>>>> http://mss.uchicago.edu:81/mrb/linguistics.png
>>>>>>>
>>>>>>> We're using the NLM Journal Publishing 3.0 DTD, and I'm at a loss for
>>>>>>> a markup solution that seems semantically reasonable and illustrates
>>>>>>> the relationships between the chunks of text that the typesetting
>>>>>>> makes obvious. I've considered table markup, but I don't want to
>>>>>>> break a single sentence or other unit of meaning into multiple table
>>>>>>> cells across a row. When I consider how our online host would
>>>>>>> convert XML into HTML, I see only the same bad option.
>>>>>>>
>>>>>>> Am I overlooking something obvious?
>>>>>>>
>>>>>>> --
>>>>>>> Michael R. Boudreau
>>>>>>> Electronic Publishing Technology Manager
>>>>>>> The University of Chicago Press
>>>>>>> 1427 E. 60th Street
>>>>>>> Chicago, IL 60637
>>>>>>> (773) 753-3298
>>>>>>> www.journals.uchicago.edu
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ================================================================
>>>>>> Deborah A Lapeyre              mailto:dalapeyre@xxxxxxxxxxxxxxxx
>>>>>> Mulberry Technologies, Inc.    http://www.mulberrytech.com
>>>>>> 17 West Jefferson Street       Phone: 301-315-9631 (USA)
>>>>>> Suite 207                      Fax:   301-315-8385
>>>>>> Rockville, MD 20850
>>>>>> ----------------------------------------------------------------
>>>>>> Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
>>>>>> ================================================================
>>>>
>>>
>>
>>
>>
>> --~------------------------------------------------------------------
>> JATS-List info and archive:  http://www.mulberrytech.com/JATS/JATS-List/
>> To unsubscribe, go to: http://lists.mulberrytech.com/jats-list/
>> or e-mail: <mailto:jats-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
>> --~--
