Re: [jats-list] Markup for linguistics (glossed text)

From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Fri, 22 Nov 2013 15:44:03 -0500

Hi again,

Sorry, I take it back: since the line breaks in the samples appear to
be arbitrary, 'ruby' might be a better choice than tables after all
(although this is also a "creative" use of Ruby, which has generally
been for phonological transcription AFAIK). Still not as fun as your
own markup.

Cheers, Wendell

Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^


On Fri, Nov 22, 2013 at 3:20 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx> wrote:
> Hi again,
>
> Also, I'd prefer plain-old tables (however ornate) to 'ruby' following
> the "Principle of Least Surprise".
>
> Cheers, Wendell
>
>
> On Fri, Nov 22, 2013 at 2:56 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> My nominations for alternatives:
>>
>> (1) If there are a lot of these, and real benefit to be gained, then
>> design and use a little markup language for them. Then, format as you
>> like, probably via tables.
>>
>> Disadvantage: time and expertise required. Dependence on specialists'
>> knowhow. (But that could be an advantage.)
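>>
>> As a rough sketch of what such a little markup language might look
>> like (every element name here is hypothetical and belongs to no
>> existing standard; the German sentence is an invented stand-in for
>> the journal's examples):
>>
>>    <!-- Hypothetical vocabulary, for illustration only -->
>>    <gloss-example id="ex3">
>>      <line>
>>        <word><src>Ich</src><gl>I</gl></word>
>>        <word><src>sehe</src><gl>see.1SG</gl></word>
>>        <word><src>den</src><gl>the.ACC</gl></word>
>>        <word><src>Hund</src><gl>dog</gl></word>
>>      </line>
>>      <translation>I see the dog.</translation>
>>    </gloss-example>
>>
>> Because each source word travels with its gloss, a stylesheet could
>> presumably render this as a table, as ruby, or as SVG without any
>> change to the source.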
>>
>> (2) Custom-designed tables, validated via Schematron. JATS provides
>> @content-type. Just as much work, and you'd be doing all the same
>> work as (1), but they could be made to validate as JATS without
>> extending it.
>>
>> Advantage: relatively quick and dirty to get something started.
>> Disadvantage: the XML would be relatively hard to maintain compared to
>> (1). Also, this is schema design without a schema, so relatively
>> fragile and not scalable to complexity.
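>>
>> A minimal sketch of how this might look, assuming @content-type is
>> available on <table-wrap> and <tr> in the tag set version in use.
>> The content-type values ("interlinear-gloss", "source", "gloss",
>> "translation") are invented conventions, not part of JATS, and the
>> sentence is an invented stand-in:
>>
>>    <!-- content-type values are hypothetical local conventions -->
>>    <table-wrap content-type="interlinear-gloss">
>>      <table>
>>        <tbody>
>>          <tr content-type="source">
>>            <td>Ich</td><td>sehe</td><td>den</td><td>Hund</td>
>>          </tr>
>>          <tr content-type="gloss">
>>            <td>I</td><td>see.1SG</td><td>the.ACC</td><td>dog</td>
>>          </tr>
>>          <tr content-type="translation">
>>            <td colspan="4">'I see the dog.'</td>
>>          </tr>
>>        </tbody>
>>      </table>
>>    </table-wrap>
>>
>> A Schematron rule could then enforce the convention, for example
>> that every gloss row has as many cells as the nearest source row
>> above it:
>>
>>    <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
>>      <sch:pattern id="gloss-tables">
>>        <!-- Applies only to tables following the local convention -->
>>        <sch:rule context="table-wrap[@content-type='interlinear-gloss']
>>                           //tr[@content-type='gloss']">
>>          <sch:assert test="count(td) = count(
>>              preceding-sibling::tr[@content-type='source'][1]/td)">
>>            A gloss row must have one cell per cell in its source row.
>>          </sch:assert>
>>        </sch:rule>
>>      </sch:pattern>
>>    </sch:schema>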
>>
>> (Such a table could also be used to represent (1) in JATS when
>> interfacing with JATS-based systems.)
>>
>> (3) SVG. Similar disadvantages, many advantages of its own. They could
>> be very pretty. :-)
>>
>> It sounds like graphics made from SVGs might be the preferred choice
>> of your vendor (and I don't blame them). But as Debbie points out,
>> they're not searchable. (If the SVGs were available they'd be sort of
>> searchable.)
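>>
>> To illustrate the "sort of searchable" point, here is a small
>> hand-written sketch (coordinates, sizes, and the sentence are
>> invented for illustration). The words stay in the file as text, so
>> a text search can find them, while alignment comes from explicit
>> x positions:
>>
>>    <!-- Illustrative only: positions chosen by hand -->
>>    <svg xmlns="http://www.w3.org/2000/svg" width="320" height="70"
>>         font-family="serif" font-size="14">
>>      <!-- source line -->
>>      <text x="10" y="20" font-style="italic">Ich</text>
>>      <text x="60" y="20" font-style="italic">sehe</text>
>>      <text x="140" y="20" font-style="italic">den</text>
>>      <text x="220" y="20" font-style="italic">Hund</text>
>>      <!-- gloss line, aligned on the same x positions -->
>>      <text x="10" y="40">I</text>
>>      <text x="60" y="40">see.1SG</text>
>>      <text x="140" y="40">the.ACC</text>
>>      <text x="220" y="40">dog</text>
>>      <!-- free translation -->
>>      <text x="10" y="62">'I see the dog.'</text>
>>    </svg>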
>>
>> My choice would depend on my goals, my long-term and short-term
>> resources, and how often these things occur or how many of them
>> there are. Having a finite number of them (i.e. I'd never expect to
>> see more than I already have) or having them very infrequently would
>> argue for (2) or (3). The more of these there are and the more
>> interesting/important the semantics they could expose, the more I'd
>> do (1).
>>
>> Designing and specifying a well-controlled, clean descriptive format
>> (1) would also be really fun. (2) and (3) are also natural spin-offs
>> for (1), not exclusive of it -- although you could also skip to them
>> directly (and specialists in CSS and SVG might prefer to do so).
>>
>> Cheers, Wendell
>>
>>
>> On Thu, Nov 21, 2013 at 5:01 PM, Michael Boudreau
>> <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
>>> For what it's worth, our hosting platform informs me that the only way to
>>> get these images to display at a consistent size is to submit the
>>> <graphic> element as a child of <disp-formula>. They were not sympathetic
>>> to my pointing out that these are not math.
>>>
>>> --
>>> Michael R. Boudreau
>>> Electronic Publishing Technology Manager
>>> The University of Chicago Press
>>> 1427 E. 60th Street
>>> Chicago, IL 60637
>>> (773) 753-3298
>>> www.journals.uchicago.edu
>>>
>>>
>>>
>>>
>>>
>>> On 11/20/13, 10:56 AM, "Michael Boudreau" <mboudreau@xxxxxxxxxxxxxxxxxx>
>>> wrote:
>>>
>>>>Thanks, everyone, for these comments. I should have mentioned that we're
>>>>currently using graphics, like so (highly simplified):
>>>>
>>>>   <p>Some text precedes an example:</p>
>>>>   <p><graphic href="example1.tiff"/></p>
>>>>   <p>And the text continues.</p>
>>>>
>>>>This can be converted by our host to a readable HTML presentation. The
>>>>downside is that the content of the graphic is not searchable by the
>>>>user's browser (though the site's search engine can build its index from
>>>>the PDF version, so all is not lost), and the graphic's visual quality is
>>>>relatively low, particularly on mobile devices.
>>>>
>>>>To answer Nikos's question, I don't have a current project that
>>>>requires a particular type of markup for such examples, but the
>>>>examples in their context just don't strike me as "tabular"--but I'm
>>>>not a linguist and
>>>>would defer to the journal editors if they deemed table markup
>>>>appropriate. I think <ruby> is closer to the mark; I'd have to do
>>>>extensive testing to see if it could handle examples with multiple layers
>>>>of glossing on the base text (sometimes there are 2 or 3 or more). (I
>>>>tremble to think what it would take to train our typesetting vendors to
>>>>apply either <table> or <ruby> markup to these examples.)
>>>>
>>>>I hadn't thought of <array>, which actually might help solve a processing
>>>>problem on our vendor's side even while still using <graphic>.
>>>>
>>>>
>>>>On 11/20/13, 9:14 AM, "Alexander Schwarzman" <aschwarzman@xxxxxxxxx>
>>>>wrote:
>>>>
>>>>>Or, perhaps, use <array>, with either <graphic>, as Nikos suggested,
>>>>>or with <tbody> inside...
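>>>>>
>>>>>A minimal sketch of the two variants, for illustration only. The
>>>>>file name is taken from the simplified example elsewhere in this
>>>>>thread, the xlink prefix is assumed to be declared on the article
>>>>>root, and the table content is an invented stand-in:
>>>>>
>>>>>   <!-- Variant A: <array> wrapping the existing graphic -->
>>>>>   <array>
>>>>>     <graphic xlink:href="example1.tiff"/>
>>>>>   </array>
>>>>>
>>>>>   <!-- Variant B: <array> with table rows, no caption or label -->
>>>>>   <array>
>>>>>     <tbody>
>>>>>       <tr><td>Ich</td><td>sehe</td><td>den</td><td>Hund</td></tr>
>>>>>       <tr><td>I</td><td>see.1SG</td><td>the.ACC</td><td>dog</td></tr>
>>>>>     </tbody>
>>>>>   </array>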
>>>>>
>>>>>--Sasha
>>>>>
>>>>>Alexander ('Sasha') Schwarzman, Content Technology Architect
>>>>>phone: +1.202.416.1979 | e-mail: aschwarzman@xxxxxxx
>>>>>
>>>>>The Optical Society (OSA)
>>>>>2010 Massachusetts Ave., NW
>>>>>Washington, DC 20036 USA
>>>>>www.osa.org
>>>>>
>>>>>
>>>>>On Wed, Nov 20, 2013 at 5:01 AM, Nikos Markantonatos <nikos@xxxxxxxxxx>
>>>>>wrote:
>>>>>> Hi Michael,
>>>>>>
>>>>>> The question that arises of course out of the "semantically
>>>>>> reasonable" encoding of such difficult pieces of text is why you
>>>>>> need it. Are you planning to draw some logic across different types
>>>>>> of such linguistic representations? In that case, JATS alone will
>>>>>> hardly offer you a solution. JATS often resorts to other known
>>>>>> standards for the representation of "tough" textual pieces, such as
>>>>>> mathematical equations (MathML) and tables (XHTML, OASIS). If there
>>>>>> were a corresponding XML encoding standard for linguistic
>>>>>> representations, one could make the case for embedding it into
>>>>>> JATS.
>>>>>>
>>>>>> Otherwise, you are left to choose between the encoding options
>>>>>> suggested by Debbie, or to capture it as an image (my favorite
>>>>>> option), or even attempt to represent it in TeX/LaTeX or MathML.
>>>>>>
>>>>>> Best regards,
>>>>>> Nikos Markantonatos
>>>>>> Atypon
>>>>>>
>>>>>>
>>>>>> On 11/19/2013 11:47 PM, Debbie Lapeyre wrote:
>>>>>>>
>>>>>>> Dear Michael--
>>>>>>>
>>>>>>> Ouch! No, you are not overlooking anything obvious. The problem
>>>>>>> is that, although you ask for "semantically reasonable", you
>>>>>>> really want presentation markup. By design, JATS does not do
>>>>>>> presentation, or at least does not do it very well.
>>>>>>>
>>>>>>>   - My first thought is a table, which this certainly looks like
>>>>>>>     to me. But I do see your problem.
>>>>>>>
>>>>>>>   - If it has to present EXACTLY this way, another obvious
>>>>>>>     (but less than perfect) choice is <preformat>. That would
>>>>>>>      - force this into a monofont (sorry about that)
>>>>>>>      - preserve all your alignments and whitespace
>>>>>>>      - let you include the italics, bold, and stuff.
>>>>>>>
>>>>>>>   - Another possibility (not in NLM 3.0, but in the brand new
>>>>>>>     JATS 1.1d1) is using <ruby>, which has a base (<rb>) and a
>>>>>>>     ruby text annotation (<rt>) traditionally displayed atop the
>>>>>>>     base (<rb>), or inside parentheses after the base for browsers
>>>>>>>     that cannot handle Ruby. Ruby is part of HTML5, as well as
>>>>>>>     part of JATS. Ruby markup is intended for textual annotation,
>>>>>>>     and might fit this case very well.
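>>>>>>>
>>>>>>> A rough sketch of the <preformat> and <ruby> options, for
>>>>>>> illustration only. The German sentence is an invented stand-in,
>>>>>>> not one of the journal's examples, and the <ruby> version assumes
>>>>>>> a tag set that includes it (JATS 1.1d1 or later, not NLM 3.0):
>>>>>>>
>>>>>>>    <!-- <preformat>: alignment carried by literal spaces; in a
>>>>>>>         real file the lines would start at the left margin -->
>>>>>>>    <preformat>Ich  sehe     den      Hund
>>>>>>>    I    see.1SG  the.ACC  dog
>>>>>>>    'I see the dog.'</preformat>
>>>>>>>
>>>>>>>    <!-- <ruby>: each word paired with a single gloss layer -->
>>>>>>>    <p>
>>>>>>>      <ruby><rb>Ich</rb><rt>I</rt></ruby>
>>>>>>>      <ruby><rb>sehe</rb><rt>see.1SG</rt></ruby>
>>>>>>>      <ruby><rb>den</rb><rt>the.ACC</rt></ruby>
>>>>>>>      <ruby><rb>Hund</rb><rt>dog</rt></ruby>
>>>>>>>    </p>
>>>>>>>
>>>>>>> The <ruby> version lets the pairs wrap wherever the line breaks,
>>>>>>> whereas <preformat> fixes the line breaks along with the spacing.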
>>>>>>>
>>>>>>> But I've got to tell you, I found this example incredibly hard to
>>>>>>> parse as a human reader: I could not be sure what went with what,
>>>>>>> or why these 2 clusters were parallel and that one all alone. When
>>>>>>> the top line and the bottom line both had values, I was fine, but
>>>>>>> sometimes... Whatever you decide, a few horizontal lines, or just
>>>>>>> more white space between the lines and/or less between each line
>>>>>>> and its gloss, would help me to separate them.
>>>>>>>
>>>>>>> --Debbie
>>>>>>>
>>>>>>>
>>>>>>> On Nov 19, 2013, at 4:17 PM, Michael Boudreau
>>>>>>> <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> Has anyone tackled the problem of marking up textual illustrations
>>>>>>>> that require multiple points of vertical alignment--the sort of
>>>>>>>> thing for which you'd set tab stops on a typewriter or word
>>>>>>>> processor?
>>>>>>>>
>>>>>>>> I'm working on a linguistics journal that has lots of glossed text
>>>>>>>> illustrations that are typeset like the items labeled (3) and (4)
>>>>>>>> on this page image:
>>>>>>>>
>>>>>>>>    http://mss.uchicago.edu:81/mrb/linguistics.png
>>>>>>>>
>>>>>>>> We're using the NLM Journal Publishing 3.0 DTD, and I'm at a loss
>>>>>>>> for a markup solution that seems semantically reasonable and
>>>>>>>> illustrates the relationships between the chunks of text that the
>>>>>>>> typesetting makes obvious. I've considered table markup, but I
>>>>>>>> don't want to break a single sentence or other unit of meaning into
>>>>>>>> multiple table cells across a row. When I consider how our online
>>>>>>>> host would convert XML into HTML, I see only the same bad option.
>>>>>>>>
>>>>>>>> Am I overlooking something obvious?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ================================================================
>>>>>>> Deborah A Lapeyre              mailto:dalapeyre@xxxxxxxxxxxxxxxx
>>>>>>> Mulberry Technologies, Inc.      http://www.mulberrytech.com
>>>>>>> 17 West Jefferson Street         Phone: 301-315-9631 (USA)
>>>>>>> Suite 207                        Fax:   301-315-8385
>>>>>>> Rockville, MD 20850
>>>>>>> ----------------------------------------------------------------
>>>>>>> Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
>>>>>>> ================================================================
