Re: [jats-list] Markup for linguistics (glossed text)

Subject: Re: [jats-list] Markup for linguistics (glossed text)
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Fri, 22 Nov 2013 15:20:33 -0500

Hi again,

Also, I'd prefer plain-old tables (however ornate) to 'ruby' following
the "Principle of Least Surprise".

Cheers, Wendell

Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^


On Fri, Nov 22, 2013 at 2:56 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> My nominations for alternatives:
>
> (1) If there are a lot of these, and real benefit to be gained, then
> design and use a little markup language for them. Then, format as you
> like, probably via tables.
>
> Disadvantage: time and expertise required. Dependence on specialists'
> knowhow. (But that could be an advantage.)
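>
> A rough sketch of what such a little language might look like, purely
> to make the idea concrete. Every element name below is made up for
> illustration (none of it is JATS), and the German sentence is a toy
> example, not one taken from the page image:
>
>    <gloss-example id="ex3">
>      <gloss-word><src>Ich</src><gl>I</gl></gloss-word>
>      <gloss-word><src>seh-e</src><gl>see-1SG</gl></gloss-word>
>      <gloss-word><src>den</src><gl>the.ACC</gl></gloss-word>
>      <gloss-word><src>Hund</src><gl>dog</gl></gloss-word>
>      <translation>I see the dog.</translation>
>    </gloss-example>
>
> Each aligned unit is one element, so the sentence stays a sentence and
> the formatter decides how to line things up.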
>
> (2) Custom-designed tables, validated via Schematron. JATS provides
> @content-type.
> Just as much work, and you'd be doing all the same work as (1), but
> they could be made to validate as JATS without extending it.
>
> Advantage: relatively quick and dirty to get something started.
> Disadvantage: the XML would be relatively hard to maintain compared to
> (1). Also, this is schema design without a schema, so relatively
> fragile and not scalable to complexity.
>
> (Such a table could also be used to represent (1) in JATS when
> interfacing with JATS-based systems.)
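>
> For instance, something along these lines. The content-type values
> are a local convention assumed for this sketch (JATS does not define
> them), and it assumes the tag set in use allows @content-type on
> these elements:
>
>    <table-wrap content-type="interlinear-gloss">
>      <table>
>        <tbody>
>          <tr content-type="source">
>            <td>Ich</td><td>seh-e</td><td>den</td><td>Hund</td>
>          </tr>
>          <tr content-type="gloss">
>            <td>I</td><td>see-1SG</td><td>the.ACC</td><td>dog</td>
>          </tr>
>          <tr content-type="translation">
>            <td colspan="4">I see the dog.</td>
>          </tr>
>        </tbody>
>      </table>
>    </table-wrap>
>
> A Schematron rule could then hold the convention in place. This one
> checks only that a gloss row has as many cells as the nearest source
> row before it:
>
>    <sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
>      <sch:rule context="tr[@content-type='gloss']">
>        <sch:assert test="count(td) =
>            count(preceding-sibling::tr[@content-type='source'][1]/td)">
>          A gloss row needs one cell per cell in its source row.
>        </sch:assert>
>      </sch:rule>
>    </sch:pattern>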
>
> (3) SVG. Similar disadvantages, many advantages of its own. They could
> be very pretty. :-)
>
> It sounds like graphics made from SVGs might be the preferred choice
> of your vendor (and I don't blame them). But as Debbie points out,
> they're not searchable. (If the SVGs were available they'd be sort of
> searchable.)
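>
> One reading of "sort of searchable": as long as the words stay as
> <text> elements and are not converted to outlines, they remain plain
> character data in the SVG file. A hand-written sketch (the coordinates
> and the toy German sentence are invented for illustration):
>
>    <svg xmlns="http://www.w3.org/2000/svg" width="260" height="70">
>      <!-- source line: one x position per aligned word -->
>      <text x="10" y="20" font-style="italic">Ich</text>
>      <text x="50" y="20" font-style="italic">seh-e</text>
>      <text x="120" y="20" font-style="italic">den</text>
>      <text x="190" y="20" font-style="italic">Hund</text>
>      <!-- gloss line: same x positions, so the columns line up -->
>      <text x="10" y="40">I</text>
>      <text x="50" y="40">see-1SG</text>
>      <text x="120" y="40">the.ACC</text>
>      <text x="190" y="40">dog</text>
>      <!-- free translation -->
>      <text x="10" y="60">'I see the dog.'</text>
>    </svg>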
>
> My choice would depend on my goals, my long-term and short-term
> resources, and how often these examples occur or how many of them
> there are. Having a finite number of these things (i.e. I'd never expect
> to see more of these than I already have) or having them very
> infrequently would argue for (2) or (3). The more of these there are
> and the more interesting/important the semantics they could expose,
> the more I'd do (1).
>
> Designing and specifying a well-controlled, clean descriptive format
> (1) would also be really fun. (2) and (3) are also natural spin-offs
> for (1), not exclusive of it -- although you could also skip to them
> directly (and specialists in CSS and SVG might prefer to do so).
>
> Cheers, Wendell
>
>
> On Thu, Nov 21, 2013 at 5:01 PM, Michael Boudreau
> <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
>> For what it's worth, our hosting platform informs me that the only way to
>> get these images to display at a consistent size is to submit the
>> <graphic> element as a child of <disp-formula>. They were not sympathetic
>> to my pointing out that these are not math.
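>>
>> That is, something like the following (the file name is a
>> placeholder, and the xlink prefix is assumed to be declared on the
>> root element):
>>
>>    <disp-formula>
>>      <graphic xlink:href="example1.tiff"/>
>>    </disp-formula>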
>>
>> --
>> Michael R. Boudreau
>> Electronic Publishing Technology Manager
>> The University of Chicago Press
>> 1427 E. 60th Street
>> Chicago, IL 60637
>> (773) 753-3298
>> www.journals.uchicago.edu
>>
>>
>>
>>
>>
>> On 11/20/13, 10:56 AM, "Michael Boudreau" <mboudreau@xxxxxxxxxxxxxxxxxx>
>> wrote:
>>
>>>Thanks, everyone, for these comments. I should have mentioned that we're
>>>currently using graphics, like so (highly simplified):
>>>
>>>   <p>Some text precedes an example:</p>
>>>   <p><graphic xlink:href="example1.tiff"/></p>
>>>   <p>And the text continues.</p>
>>>
>>>This can be converted by our host to a readable HTML presentation. The
>>>down-side is that the content of the graphic is not searchable by the
>>>user's browser (though the site's search engine can build its index from
>>>the PDF version, so all is not lost), and the graphic's visual quality is
>>>relatively low, particularly on mobile devices.
>>>
>>>To answer Nikos's question, I don't have a current project that requires a
>>>particular type of markup for such examples, but the examples in their
>>>context just don't strike me as "tabular"--but I'm not a linguist and
>>>would defer to the journal editors if they deemed table markup
>>>appropriate. I think <ruby> is closer to the mark; I'd have to do
>>>extensive testing to see if it could handle examples with multiple layers
>>>of glossing on the base text (sometimes there are 2 or 3 or more). (I
>>>tremble to think what it would take to train our typesetting vendors to
>>>apply either <table> or <ruby> markup to these examples.)
>>>
>>>I hadn't thought of <array>, which actually might help solve a processing
>>>problem on our vendor's side even while still using <graphic>.
>>>
>>>
>>>On 11/20/13, 9:14 AM, "Alexander Schwarzman" <aschwarzman@xxxxxxxxx>
>>>wrote:
>>>
>>>>Or, perhaps, use <array>, with either <graphic>, as Nikos suggested,
>>>>or with <tbody> inside...
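>>>>
>>>>A sketch of both variants (the file name and the cell content are
>>>>placeholders, and the xlink prefix is assumed to be declared on the
>>>>root element):
>>>>
>>>>   <array>
>>>>     <graphic xlink:href="example1.tiff"/>
>>>>   </array>
>>>>
>>>>   <array>
>>>>     <tbody>
>>>>       <tr><td>Ich</td><td>seh-e</td><td>den</td><td>Hund</td></tr>
>>>>       <tr><td>I</td><td>see-1SG</td><td>the.ACC</td><td>dog</td></tr>
>>>>     </tbody>
>>>>   </array>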
>>>>
>>>>--Sasha
>>>>
>>>>Alexander ('Sasha') Schwarzman, Content Technology Architect
>>>>phone: +1.202.416.1979 | e-mail: aschwarzman@xxxxxxx
>>>>
>>>>The Optical Society (OSA)
>>>>2010 Massachusetts Ave., NW
>>>>Washington, DC 20036 USA
>>>>www.osa.org
>>>>
>>>>
>>>>On Wed, Nov 20, 2013 at 5:01 AM, Nikos Markantonatos <nikos@xxxxxxxxxx>
>>>>wrote:
>>>>> Hi Michael,
>>>>>
>>>>> The question that arises of course out of the "semantically reasonable"
>>>>> encoding of such difficult pieces of text is why you need it. Are you
>>>>> planning to draw some logic across different types of such linguistic
>>>>> representations? In that case, JATS alone will hardly offer you a
>>>>> solution. JATS often resorts to other known standards for the
>>>>> representation of "tough" textual pieces, such as mathematical
>>>>> equations (MathML) and tables (XHTML, OASIS). If there were a
>>>>> corresponding XML encoding standard for linguistic representations,
>>>>> one could make the case for embedding it into JATS.
>>>>>
>>>>> Otherwise, you are left to choose between the encoding options
>>>>> suggested by Debbie, or to capture it as an image (my favorite
>>>>> option), or even attempt to represent it in TeX/LaTeX or MathML.
>>>>>
>>>>> Best regards,
>>>>> Nikos Markantonatos
>>>>> Atypon
>>>>>
>>>>>
>>>>> On 11/19/2013 11:47 PM, Debbie Lapeyre wrote:
>>>>>>
>>>>>> Dear Michael--
>>>>>>
>>>>>> Ouch! No you are not overlooking anything obvious. The problem
>>>>>> is that, although you ask for "semantically reasonable", you
>>>>>> really want presentation markup. By design, JATS does not do
>>>>>> presentation, or at least not very well.
>>>>>>
>>>>>>   - My first thought is a table, which this certainly looks like
>>>>>>     to me. But I do see your problem.
>>>>>>
>>>>>>   - If it has to present EXACTLY this way, another obvious
>>>>>>     (but less than perfect) choice is <preformat>. That would
>>>>>>      - force this into a monofont (sorry about that)
>>>>>>      - preserve all your alignments and whitespace
>>>>>>      - let you include the italics, bold, and stuff.
>>>>>>
>>>>>>   - Another possibility (not in NLM 3.0, but in the brand new
>>>>>>     JATS 1.1d1) is using <ruby>, which has a base (<rb>) and a
>>>>>>     ruby text annotation (<rt>) traditionally displayed atop the
>>>>>>     base (<rb>), or inside parentheses after the base for browsers
>>>>>>     that cannot handle Ruby. Ruby is part of HTML5, as well as
>>>>>>     part of JATS. Ruby markup is intended for textual annotation,
>>>>>>     and might fit this case very well.
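>>>>>>
>>>>>> A sketch of the <ruby> option for a single layer of glossing,
>>>>>> with one <ruby> per aligned word (the German sentence is a toy
>>>>>> example, not taken from the page image):
>>>>>>
>>>>>>   <p>
>>>>>>     <ruby><rb>Ich</rb><rt>I</rt></ruby>
>>>>>>     <ruby><rb>seh-e</rb><rt>see-1SG</rt></ruby>
>>>>>>     <ruby><rb>den</rb><rt>the.ACC</rt></ruby>
>>>>>>     <ruby><rb>Hund</rb><rt>dog</rt></ruby>
>>>>>>   </p>
>>>>>>
>>>>>> This covers only one annotation line per base; examples with
>>>>>> two or more gloss layers would need testing.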
>>>>>>
>>>>>> But I've got to tell you, I found this example incredibly hard to
>>>>>> parse as a human reader: I could not be sure what went with what,
>>>>>> or why two clusters were parallel while a third stood all alone.
>>>>>> When the top line and the bottom line both had values, I was fine,
>>>>>> but sometimes... Whatever you decide, a few horizontal lines, or
>>>>>> just more white space between the lines and/or less between each
>>>>>> line and its gloss, would help me to separate them.
>>>>>>
>>>>>> --Debbie
>>>>>>
>>>>>>
>>>>>> On Nov 19, 2013, at 4:17 PM, Michael Boudreau
>>>>>> <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Has anyone tackled the problem of marking up textual illustrations
>>>>>>> that require multiple points of vertical alignment--the sort of
>>>>>>> thing for which you'd set tab stops on a typewriter or word
>>>>>>> processor?
>>>>>>>
>>>>>>> I'm working on a linguistics journal that has lots of glossed text
>>>>>>> illustrations that are typeset like the items labeled (3) and (4) on
>>>>>>> this page image:
>>>>>>>
>>>>>>>    http://mss.uchicago.edu:81/mrb/linguistics.png
>>>>>>>
>>>>>>> We're using the NLM Journal Publishing 3.0 DTD, and I'm at a loss for
>>>>>>> a markup solution that seems semantically reasonable and illustrates
>>>>>>> the relationships between the chunks of text that the typesetting
>>>>>>> makes obvious. I've considered table markup, but I don't want to
>>>>>>> break a single sentence or other unit of meaning into multiple table
>>>>>>> cells across a row. When I consider how our online host would convert
>>>>>>> XML into HTML, I see only the same bad option.
>>>>>>>
>>>>>>> Am I overlooking something obvious?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ================================================================
>>>>>> Deborah A Lapeyre              mailto:dalapeyre@xxxxxxxxxxxxxxxx
>>>>>> Mulberry Technologies, Inc.      http://www.mulberrytech.com
>>>>>> 17 West Jefferson Street         Phone: 301-315-9631 (USA)
>>>>>> Suite 207                        Fax:   301-315-8385
>>>>>> Rockville, MD 20850
>>>>>> ----------------------------------------------------------------
>>>>>> Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
>>>>>> ================================================================
