Re: [jats-list] Markup for linguistics (glossed text)

Subject: Re: [jats-list] Markup for linguistics (glossed text)
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Fri, 22 Nov 2013 14:56:40 -0500
Hi,

My nominations for alternatives:

(1) If there are a lot of these, and real benefit to be gained, then
design and use a little markup language for them. Then, format as you
like, probably via tables.

Disadvantage: time and expertise required. Dependence on specialists'
knowhow. (But that could be an advantage.)
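
As a rough illustration of (1), here is a minimal Python sketch. The
element names (gloss, word, src, g, trans), the content-type values, and
the German sample sentence are all invented for the example; none of them
come from JATS or from any existing standard. The sketch parses one glossed
example written in such a little language and renders it as a JATS-style
table, which is the "format via tables" step. Whether a given table element
accepts @content-type should be checked against the DTD version in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-language: each <word> pairs a source form <src> with
# its gloss <g>; <trans> holds the free translation of the whole line.
SAMPLE = """
<gloss>
  <word><src>Ich</src><g>I</g></word>
  <word><src>sehe</src><g>see.1SG</g></word>
  <word><src>den</src><g>the.ACC</g></word>
  <word><src>Hund</src><g>dog</g></word>
  <trans>I see the dog.</trans>
</gloss>
"""


def gloss_to_table(gloss):
    """Render one <gloss> as a JATS-style <table-wrap>.

    One row of source forms, one row of glosses (cell i of each row
    belongs to the same word), and one spanning row for the translation.
    """
    words = gloss.findall("word")
    wrap = ET.Element("table-wrap", {"content-type": "interlinear-gloss"})
    tbody = ET.SubElement(ET.SubElement(wrap, "table"), "tbody")

    # Aligned rows: same number of cells, one per word.
    for child_tag, row_type in (("src", "source"), ("g", "gloss")):
        tr = ET.SubElement(tbody, "tr", {"content-type": row_type})
        for word in words:
            ET.SubElement(tr, "td").text = word.findtext(child_tag)

    # The translation is not aligned word by word, so it spans all columns.
    translation = gloss.findtext("trans")
    if translation:
        tr = ET.SubElement(tbody, "tr", {"content-type": "translation"})
        cell = ET.SubElement(tr, "td", {"colspan": str(len(words))})
        cell.text = translation
    return wrap


gloss = ET.fromstring(SAMPLE)
table_xml = ET.tostring(gloss_to_table(gloss), encoding="unicode")
print(table_xml)
```

The sentence stays one unit in the source format; it is only split into
cells at rendering time.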

(2) Custom-designed tables, validated via Schematron. JATS provides
@content-type.
This is just as much work, since you'd be doing all the same work as (1),
but the tables could be made to validate as JATS without extending it.

Advantage: relatively quick and dirty to get something started.
Disadvantage: the XML would be relatively hard to maintain compared to
(1). Also, this is schema design without a schema, so relatively
fragile and not scalable to complexity.

(Such a table could also be used to represent (1) in JATS when
interfacing with JATS-based systems.)
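
The kind of constraint a Schematron rule would enforce on such tables can
be sketched in Python. This is a stand-in for Schematron, not Schematron
itself, and the content-type values ("interlinear-gloss", "source",
"gloss") are assumptions made up for the example. The check is simple: a
gloss table needs a source row, and all aligned rows must have the same
number of cells.

```python
import xml.etree.ElementTree as ET

# Two hand-written sample tables; the content-type values are hypothetical.
GOOD = """
<table-wrap content-type="interlinear-gloss">
  <table><tbody>
    <tr content-type="source"><td>den</td><td>Hund</td></tr>
    <tr content-type="gloss"><td>the.ACC</td><td>dog</td></tr>
  </tbody></table>
</table-wrap>
"""

BAD = """
<table-wrap content-type="interlinear-gloss">
  <table><tbody>
    <tr content-type="source"><td>den</td><td>Hund</td></tr>
    <tr content-type="gloss"><td>the.ACC dog</td></tr>
  </tbody></table>
</table-wrap>
"""


def check_gloss_table(wrap):
    """Return a list of error messages for one gloss <table-wrap>."""
    errors = []
    rows = wrap.findall("./table/tbody/tr")

    # Rule 1: there must be at least one row of source forms.
    if not any(r.get("content-type") == "source" for r in rows):
        errors.append("no row with content-type='source'")

    # Rule 2: source and gloss rows must line up cell for cell.
    aligned = [r for r in rows if r.get("content-type") in ("source", "gloss")]
    widths = {len(r.findall("td")) for r in aligned}
    if len(widths) > 1:
        errors.append(
            "aligned rows have different cell counts: %s" % sorted(widths)
        )
    return errors


good_errors = check_gloss_table(ET.fromstring(GOOD))
bad_errors = check_gloss_table(ET.fromstring(BAD))
print(good_errors)
print(bad_errors)
```

Every such rule has to be written and maintained by hand, which is the
"schema design without a schema" fragility mentioned above.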

(3) SVG. Similar disadvantages, many advantages of its own. They could
be very pretty. :-)

It sounds like graphics made from SVGs might be the preferred choice
of your vendor (and I don't blame them). But as Debbie points out,
they're not searchable. (If the SVGs were available they'd be sort of
searchable.)
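
The "sort of searchable" point can be illustrated with a small Python
sketch. When the words are kept as SVG text elements, rather than being
rasterized or converted to outlines, they remain characters that a program
can pull back out. The column width, line height, and sample sentence are
arbitrary assumptions; a real layout would need measured text widths.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
TEXT_TAG = "{%s}text" % SVG_NS


def gloss_to_svg(pairs, translation, col_width=90, line_height=20):
    """Lay out (source, gloss) pairs in fixed-width columns as SVG text."""
    ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix
    svg = ET.Element(
        "{%s}svg" % SVG_NS,
        {
            "width": str(col_width * len(pairs) + 20),
            "height": str(line_height * 4),
        },
    )
    for i, (source, gloss) in enumerate(pairs):
        x = str(10 + i * col_width)
        # Source form on the first line, its gloss directly beneath it.
        top = ET.SubElement(
            svg, TEXT_TAG, {"x": x, "y": str(line_height), "font-style": "italic"}
        )
        top.text = source
        below = ET.SubElement(svg, TEXT_TAG, {"x": x, "y": str(2 * line_height)})
        below.text = gloss
    # Free translation on the third line, flush left.
    last = ET.SubElement(svg, TEXT_TAG, {"x": "10", "y": str(3 * line_height)})
    last.text = translation
    return svg


def svg_words(svg):
    """Recover the text content of every <text> element, in document order."""
    return [t.text for t in svg.iter(TEXT_TAG)]


pairs = [("Ich", "I"), ("sehe", "see.1SG"), ("den", "the.ACC"), ("Hund", "dog")]
svg = gloss_to_svg(pairs, "I see the dog.")
svg_xml = ET.tostring(svg, encoding="unicode")
print(svg_xml)
print(svg_words(svg))
```

This only helps if the SVG source itself is delivered or indexed; once it
is flattened to a bitmap, the text is gone again.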

My choice would depend on my goals, my long-term and short-term
resources, and how often these examples occur or how many of them
there are. Having a finite number of these things (i.e. I'd never expect
to see more of these than I already have) or having them very
infrequently would argue for (2) or (3). The more of these there are
and the more interesting/important the semantics they could expose,
the more I'd do (1).

Designing and specifying a well-controlled, clean descriptive format
(1) would also be really fun. (2) and (3) are also natural spin-offs
for (1), not exclusive of it -- although you could also skip to them
directly (and specialists in CSS and SVG might prefer to do so).

Cheers, Wendell





Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^


On Thu, Nov 21, 2013 at 5:01 PM, Michael Boudreau
<mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
> For what it's worth, our hosting platform informs me that the only way to
> get these images to display at a consistent size is to submit the
> <graphic> element as a child of <disp-formula>. They were not sympathetic
> to my pointing out that these are not math.
>
> --
> Michael R. Boudreau
> Electronic Publishing Technology Manager
> The University of Chicago Press
> 1427 E. 60th Street
> Chicago, IL 60637
> (773) 753-3298
> www.journals.uchicago.edu
>
>
>
>
>
> On 11/20/13, 10:56 AM, "Michael Boudreau" <mboudreau@xxxxxxxxxxxxxxxxxx>
> wrote:
>
>>Thanks, everyone, for these comments. I should have mentioned that we're
>>currently using graphics, like so (highly simplified):
>>
>>   <p>Some text precedes an example:</p>
>>   <p><graphic href="example1.tiff"/></p>
>>   <p>And the text continues.</p>
>>
>>This can be converted by our host to a readable HTML presentation. The
>>down-side is that the content of the graphic is not searchable by the
>>user's browser (though the site's search engine can build its index from
>>the PDF version, so all is not lost), and the graphic's visual quality is
>>relatively low, particularly on mobile devices.
>>
>>To answer Nikos's question, I don't have a current project that requires a
>>particular type of markup for such examples, but the examples in their
>>context just don't strike me as "tabular"--but I'm not a linguist and
>>would defer to the journal editors if they deemed table markup
>>appropriate. I think <ruby> is closer to the mark; I'd have to do
>>extensive testing to see if it could handle examples with multiple layers
>>of glossing on the base text (sometimes there are 2 or 3 or more). (I
>>tremble to think what it would take to train our typesetting vendors to
>>apply either <table> or <ruby> markup to these examples.)
>>
>>I hadn't thought of <array>, which actually might help solve a processing
>>problem on our vendor's side even while still using <graphic>.
>>
>>
>>--
>>Michael R. Boudreau
>>Electronic Publishing Technology Manager
>>The University of Chicago Press
>>1427 E. 60th Street
>>Chicago, IL 60637
>>(773) 753-3298
>>www.journals.uchicago.edu
>>
>>
>>
>>
>>
>>On 11/20/13, 9:14 AM, "Alexander Schwarzman" <aschwarzman@xxxxxxxxx>
>>wrote:
>>
>>>Or, perhaps, use <array>, with either <graphic>, as Nikos suggested,
>>>or with <tbody> inside...
>>>
>>>--Sasha
>>>
>>>Alexander ('Sasha') Schwarzman, Content Technology Architect
>>>phone: +1.202.416.1979 | e-mail: aschwarzman@xxxxxxx
>>>
>>>The Optical Society (OSA)
>>>2010 Massachusetts Ave., NW
>>>Washington, DC 20036 USA
>>>www.osa.org
>>>
>>>
>>>On Wed, Nov 20, 2013 at 5:01 AM, Nikos Markantonatos <nikos@xxxxxxxxxx>
>>>wrote:
>>>> Hi Michael,
>>>>
>>>> The question that arises of course out of the "semantically reasonable"
>>>> encoding of such difficult pieces of text is why you need it. Are you
>>>> planning to draw some logic across different types of such linguistic
>>>> representations? In that case, JATS alone will hardly offer you a
>>>> solution. JATS often resorts to other known standards for the
>>>> representation of "tough" textual pieces, such as mathematical
>>>> equations (MathML) and tables (XHTML, OASIS). If there was a
>>>> corresponding XML encoding standard for linguistic representations,
>>>> one could make the case for embedding it into JATS.
>>>>
>>>> Otherwise, you are left to choose between the encoding options
>>>> suggested by Debbie, or to capture it as an image (my favorite
>>>> option), or even attempt to represent it in TeX/LaTeX or MathML.
>>>>
>>>> Best regards,
>>>> Nikos Markantonatos
>>>> Atypon
>>>>
>>>>
>>>> On 11/19/2013 11:47 PM, Debbie Lapeyre wrote:
>>>>>
>>>>> Dear Michael--
>>>>>
>>>>> Ouch! No you are not overlooking anything obvious. The problem
>>>>> is that, although you ask for "semantically reasonable", you
>>>>> really want presentation markup. JATS does not do presentation,
>>>>> by design or very well.
>>>>>
>>>>>   - My first thought is a table, which this certainly looks like
>>>>>     to me. But I do see your problem.
>>>>>
>>>>>   - If it has to present EXACTLY this way, another obvious
>>>>>     (but less than perfect) choice is <preformat>. That would
>>>>>      - force this into a monofont (sorry about that)
>>>>>      - preserve all your alignments and whitespace
>>>>>      - let you include the italics, bold, and stuff.
>>>>>
>>>>>   - Another possibility (not in NLM 3.0, but in the brand new
>>>>>     JATS 1.1d1) is using <ruby>, which has a base (<rb>) and a
>>>>>     ruby text annotation (<rt>) traditionally displayed atop the
>>>>>     base (<rb>), or inside parentheses after the base for browsers
>>>>>     that cannot handle Ruby. Ruby is part of HTML5, as well as
>>>>>     part of JATS. Ruby markup is intended for textual annotation,
>>>>>     and might fit this case very well.
>>>>>
>>>>> But I've got to tell you, I found this example incredibly hard to
>>>>> human parse and be sure what went with what and why were these 2
>>>>> clusters parallel and that one all alone? When the top line and the
>>>>> bottom line both had values, I was fine, but sometimes... Whatever
>>>>> you decide, a few horizontal lines or just more white space between
>>>>> the lines and/or less between the line and its gloss, would help
>>>>> me to separate.
>>>>>
>>>>> --Debbie
>>>>>
>>>>>
>>>>> On Nov 19, 2013, at 4:17 PM, Michael Boudreau
>>>>> <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> Has anyone tackled the problem of marking up textual illustrations
>>>>>> that require multiple points of vertical alignment--the sort of
>>>>>> thing for which you'd set tab stops on a typewriter or word
>>>>>> processor?
>>>>>>
>>>>>> I'm working on a linguistics journal that has lots of glossed text
>>>>>> illustrations that are typeset like the items labeled (3) and (4) on
>>>>>> this page image:
>>>>>>
>>>>>>    http://mss.uchicago.edu:81/mrb/linguistics.png
>>>>>>
>>>>>> We're using the NLM Journal Publishing 3.0 DTD, and I'm at a loss
>>>>>> for a markup solution that seems semantically reasonable and
>>>>>> illustrates the relationships between the chunks of text that the
>>>>>> typesetting makes obvious. I've considered table markup, but I
>>>>>> don't want to break a single sentence or other unit of meaning into
>>>>>> multiple table cells across a row. When I consider how our online
>>>>>> host would convert XML into HTML, I see only the same bad option.
>>>>>>
>>>>>> Am I overlooking something obvious?
>>>>>>
>>>>>> --
>>>>>> Michael R. Boudreau
>>>>>> Electronic Publishing Technology Manager
>>>>>> The University of Chicago Press
>>>>>> 1427 E. 60th Street
>>>>>> Chicago, IL 60637
>>>>>> (773) 753-3298
>>>>>> www.journals.uchicago.edu
>>>>>>
>>>>>
>>>>>
>>>>> ================================================================
>>>>> Deborah A Lapeyre              mailto:dalapeyre@xxxxxxxxxxxxxxxx
>>>>> Mulberry Technologies, Inc.      http://www.mulberrytech.com
>>>>> 17 West Jefferson Street         Phone: 301-315-9631 (USA)
>>>>> Suite 207                        Fax:   301-315-8385
>>>>> Rockville, MD 20850
>>>>> ----------------------------------------------------------------
>>>>> Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
>>>>> ================================================================
