Re: [jats-list] Markup for linguistics (glossed text)

From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Sat, 23 Nov 2013 00:48:37 +0100
How would you do that in TEI?

Maybe there is no canonical way either, but at least there should be one or more recommended ways. Encoding stuff like that is TEI's core business, isn't it?

Do you encode the aligned segments in separate paragraphs, with links between the corresponding segments [1]? This could be either links from the base segment to its annotation or the other way round, or in a linkGrp [2]. Plus, add some semantic information about what is the base and what is the annotation.

I'm not sure, I'm a dabbler in TEI as much as in JATS. But if the JATS family of markup dialects [3] used this kind of correspondence linking, how would it translate to its own vocabulary?

Maybe something along these lines:

<p><named-content content-type="base" id="id1"><italic>Siu-ti</italic></named-content>
<named-content content-type="base" id="id2"><italic><bold>i</bold>-najyen-…</italic></named-content> …</p>
<p><named-content content-type="annot" rid="id1"><styled-content style-type="small-caps"><italic>syu</italic>-comp</styled-content></named-content>
<named-content content-type="annot" rid="id2"><bold>2O</bold>-…</named-content></p>

Having read Chris Maloney's recent message on this topic, I agree that there probably shouldn't be anything tabular in the markup.

Whether to use ruby or named-content is more a matter of taste then. Except when you have multiple levels of annotation. Then named-content, id, and rid are more versatile.
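
To make the id/rid idea a bit more concrete, here is a minimal Python sketch of how a processor might collect every annotation that points at a base segment, whatever its level. It assumes the named-content markup above; the wrapping <sec>, the second-level content-type "annot2", the function name collect_glosses, and the use of "..." for the elided morphemes are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample modelled on the markup above. The "annot2" paragraph is a
# hypothetical second annotation level; "..." stands for elided text.
SAMPLE = """<sec>
<p><named-content content-type="base" id="id1"><italic>Siu-ti</italic></named-content>
<named-content content-type="base" id="id2"><italic><bold>i</bold>-najyen-...</italic></named-content></p>
<p><named-content content-type="annot" rid="id1"><styled-content
style-type="small-caps"><italic>syu</italic>-comp</styled-content></named-content>
<named-content content-type="annot" rid="id2"><bold>2O</bold>-...</named-content></p>
<p><named-content content-type="annot2" rid="id1">...</named-content></p>
</sec>"""


def collect_glosses(xml_text):
    """Map each base segment id to its text and the annotations linked to it."""
    root = ET.fromstring(xml_text)
    bases = {}
    annotations = defaultdict(list)
    for nc in root.iter("named-content"):
        text = "".join(nc.itertext()).strip()
        kind = nc.get("content-type")
        if kind == "base":
            bases[nc.get("id")] = text
        elif nc.get("rid") is not None:
            # Any non-base segment carrying @rid counts as an annotation.
            annotations[nc.get("rid")].append((kind, text))
    return {
        base_id: {"base": text, "annotations": annotations.get(base_id, [])}
        for base_id, text in bases.items()
    }


if __name__ == "__main__":
    for base_id, entry in collect_glosses(SAMPLE).items():
        print(base_id, entry["base"], entry["annotations"])
```

Because each annotation names its target through rid, adding a third or fourth level only means adding more named-content elements; nothing about the base paragraph has to change.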

Gerrit

[1] http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SACS
[2] http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-linkGrp.html
[3] Linguists will probably dispute that NLM, BITS, JATS are *dialects*, and insist that they're something else within their ontology.


On 22.11.2013 21:44, Wendell Piez wrote:
Hi again,

Sorry, I take it back: since the line breaks in the samples appear to
be arbitrary, 'ruby' might be a better choice than tables after all
(although this is also a "creative" use of Ruby, which has generally
been for phonological transcription AFAIK). Still not as fun as
your own markup.

Cheers, Wendell

Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^


On Fri, Nov 22, 2013 at 3:20 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx> wrote:
Hi again,

Also, I'd prefer plain-old tables (however ornate) to 'ruby' following
the "Principle of Least Surprise".

Cheers, Wendell



On Fri, Nov 22, 2013 at 2:56 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx> wrote:
Hi,

My nominations for alternatives:

(1) If there are a lot of these, and real benefit to be gained, then
design and use a little markup language for them. Then, format as you
like, probably via tables.

Disadvantage: time and expertise required. Dependence on specialists'
knowhow. (But that could be an advantage.)

(2) Custom-designed tables, validated via Schematron. JATS provides
@content-type for labelling them. Just as much work, and you'd be doing
all the same work as (1), but they could be made to validate as JATS
without extending it.

Advantage: relatively quick and dirty to get something started.
Disadvantage: the XML would be relatively hard to maintain compared to
(1). Also, this is schema design without a schema, so relatively
fragile and not scalable to complexity.

(Such a table could also be used to represent (1) in JATS when
interfacing with JATS-based systems.)

(3) SVG. Similar disadvantages, many advantages of its own. They could
be very pretty. :-)

It sounds like graphics made from SVGs might be the preferred choice
of your vendor (and I don't blame them). But as Debbie points out,
they're not searchable. (If the SVGs were available they'd be sort of
searchable.)
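
As a rough illustration of the kind of rule option (2) would need, here is a minimal Python sketch that stands in for a Schematron check; it is not Schematron itself. The content-type value "glossed-text", the rule that every row must have the same number of cells, and the function name check_gloss_table are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def check_gloss_table(table_xml):
    """Return a list of problems found in a glossed-text table (empty if none)."""
    table = ET.fromstring(table_xml)
    problems = []
    # Hypothetical convention: gloss tables are flagged via @content-type.
    if table.get("content-type") != "glossed-text":
        problems.append("table is not marked content-type='glossed-text'")
    # Count the cells in each row, in document order.
    widths = [len(row.findall("td")) for row in table.iter("tr")]
    if len(widths) < 2:
        problems.append("expected a base row plus at least one gloss row")
    if len(set(widths)) > 1:
        problems.append(f"rows have differing cell counts: {widths}")
    return problems


if __name__ == "__main__":
    good = (
        '<table content-type="glossed-text"><tbody>'
        "<tr><td>Siu-ti</td><td>i-najyen</td></tr>"
        "<tr><td>syu-comp</td><td>2O</td></tr>"
        "</tbody></table>"
    )
    print(check_gloss_table(good))
```

The same constraints could be written as Schematron assertions; the point is only that the rules live outside the JATS schema and have to be maintained separately.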

What my choice would be would depend on my goals, long-term and
short-term resources, and the frequency with which it occurs or number
of them. Having a finite number of these things (i.e. I'd never expect
to see more of these than I already have) or having them very
infrequently would argue for (2) or (3). The more of these there are
and the more interesting/important the semantics they could expose,
the more I'd do (1).

Designing and specifying a well-controlled, clean descriptive format
(1) would also be really fun. (2) and (3) are also natural spin-offs
for (1), not exclusive of it -- although you could also skip to them
directly (and specialists in CSS and SVG might prefer to do so).

Cheers, Wendell







On Thu, Nov 21, 2013 at 5:01 PM, Michael Boudreau <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:
For what it's worth, our hosting platform informs me that the only way to
get these images to display at a consistent size is to submit the
<graphic> element as a child of <disp-formula>. They were not sympathetic
to my pointing out that these are not math.

--
Michael R. Boudreau
Electronic Publishing Technology Manager
The University of Chicago Press
1427 E. 60th Street
Chicago, IL 60637
(773) 753-3298
www.journals.uchicago.edu





On 11/20/13, 10:56 AM, "Michael Boudreau" <mboudreau@xxxxxxxxxxxxxxxxxx>
wrote:

Thanks, everyone, for these comments. I should have mentioned that we're
currently using graphics, like so (highly simplified):

   <p>Some text precedes an example:</p>
   <p><graphic href="example1.tiff"/></p>
   <p>And the text continues.</p>

This can be converted by our host to a readable HTML presentation. The
down-side is that the content of the graphic is not searchable by the
user's browser (though the site's search engine can build its index from
the PDF version, so all is not lost), and the graphic's visual quality is
relatively low, particularly on mobile devices.

To answer Nikos's question, I don't have a current project that requires a
particular type of markup for such examples, but the examples in their
context just don't strike me as "tabular"--but I'm not a linguist and
would defer to the journal editors if they deemed table markup
appropriate. I think <ruby> is closer to the mark; I'd have to do
extensive testing to see if it could handle examples with multiple layers
of glossing on the base text (sometimes there are 2 or 3 or more). (I
tremble to think what it would take to train our typesetting vendors to
apply either <table> or <ruby> markup to these examples.)

I hadn't thought of <array>, which actually might help solve a processing
problem on our vendor's side even while still using <graphic>.







On 11/20/13, 9:14 AM, "Alexander Schwarzman" <aschwarzman@xxxxxxxxx>
wrote:

Or, perhaps, use <array>, with either <graphic>, as Nikos suggested,
or with <tbody> inside...

--Sasha

Alexander ('Sasha') Schwarzman, Content Technology Architect
phone: +1.202.416.1979 | e-mail: aschwarzman@xxxxxxx

The Optical Society (OSA)
2010 Massachusetts Ave., NW
Washington, DC 20036 USA
www.osa.org


On Wed, Nov 20, 2013 at 5:01 AM, Nikos Markantonatos <nikos@xxxxxxxxxx> wrote:
Hi Michael,

The question that arises of course out of the "semantically reasonable"
encoding of such difficult pieces of text is why you need it. Are you
planning to draw some logic across different types of such linguistic
representations? In that case, JATS alone will hardly offer you a
solution. JATS often resorts to other known standards for the
representation of "tough" textual pieces, such as mathematical
equations (MathML) and tables (XHTML, OASIS). If there was a
corresponding XML encoding standard for linguistic representations,
one could make the case for embedding it into JATS.

Otherwise, you are left to choose between the encoding options
suggested by Debbie, or to capture it as an image (my favorite option),
or even attempt to represent it in TeX/LaTeX or MathML.

Best regards,
Nikos Markantonatos
Atypon


On 11/19/2013 11:47 PM, Debbie Lapeyre wrote:

Dear Michael--


Ouch! No you are not overlooking anything obvious. The problem
is that, although you ask for "semantically reasonable", you
really want presentation markup. JATS does not do presentation,
by design or very well.

   - My first thought is a table, which this certainly looks like
     to me. But I do see your problem.

   - If it has to present EXACTLY this way, another obvious
     (but less than perfect) choice is <preformat>. That would
      - force this into a monofont (sorry about that)
      - preserve all your alignments and whitespace
      - let you include the italics, bold, and stuff.

   - Another possibility (not in NLM 3.0, but in the brand new
     JATS 1.1d1) is using <ruby>, which has a base (<rb>) and a
     ruby text annotation (<rt>) traditionally displayed atop the
     base, or inside parentheses after the base for browsers
     that cannot handle Ruby. Ruby is part of HTML5, as well as
     part of JATS. Ruby markup is intended for textual annotation,
     and might fit this case very well.
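
A minimal Python sketch of the ruby option, assuming each glossed word becomes one <ruby> with an <rb> base and an <rt> annotation inside a <p>. The function names ruby_paragraph and fallback_text are hypothetical, and the word pair is borrowed from the thread purely as sample data; whether a given JATS version allows <ruby> at this point should be checked against its tag library.

```python
import xml.etree.ElementTree as ET


def ruby_paragraph(pairs):
    """Build a <p> holding one <ruby> (rb + rt) per (base, gloss) pair."""
    p = ET.Element("p")
    for index, (base, gloss) in enumerate(pairs):
        ruby = ET.SubElement(p, "ruby")
        ET.SubElement(ruby, "rb").text = base
        ET.SubElement(ruby, "rt").text = gloss
        if index < len(pairs) - 1:
            ruby.tail = " "  # keep a space between adjacent words
    return ET.tostring(p, encoding="unicode")


def fallback_text(pairs):
    """Render the parenthesised form used where Ruby display is unavailable."""
    return " ".join(f"{base} ({gloss})" for base, gloss in pairs)


if __name__ == "__main__":
    sample = [("Siu-ti", "syu-comp")]
    print(ruby_paragraph(sample))
    print(fallback_text(sample))
```

This covers a single layer of glossing only; examples with two or three gloss lines per base would need nested or repeated ruby, which is where testing would be required.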

But I've got to tell you, I found this example incredibly hard to
parse as a human: I couldn't be sure what went with what, or why
these 2 clusters were parallel and that one all alone. When the top
line and the bottom line both had values, I was fine, but
sometimes... Whatever you decide, a few horizontal lines, or just
more white space between the lines and/or less between the line and
its gloss, would help me to separate them.

--Debbie


On Nov 19, 2013, at 4:17 PM, Michael Boudreau <mboudreau@xxxxxxxxxxxxxxxxxx> wrote:

Greetings,

Has anyone tackled the problem of marking up textual illustrations that
require multiple points of vertical alignment--the sort of thing for
which you'd set tab stops on a typewriter or word processor?

I'm working on a linguistics journal that has lots of glossed text
illustrations that are typeset like the items labeled (3) and (4) on
this page image:

http://mss.uchicago.edu:81/mrb/linguistics.png

We're using the NLM Journal Publishing 3.0 DTD, and I'm at a loss for a
markup solution that seems semantically reasonable and illustrates the
relationships between the chunks of text that the typesetting makes
obvious. I've considered table markup, but I don't want to break a
single sentence or other unit of meaning into multiple table cells
across a row. When I consider how our online host would convert XML
into HTML, I see only the same bad option.

Am I overlooking something obvious?




================================================================
Deborah A Lapeyre              mailto:dalapeyre@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.      http://www.mulberrytech.com
17 West Jefferson Street         Phone: 301-315-9631 (USA)
Suite 207                        Fax:   301-315-8385
Rockville, MD 20850
----------------------------------------------------------------
Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
================================================================


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
