[jats-list] Re: Does Blue need a Lite version, to counter its creeping aquafication?

Subject: [jats-list] Re: Does Blue need a Lite version, to counter its creeping aquafication?
From: "nina_linn.reinhardt@xxxxxxxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 9 Mar 2021 13:21:53 -0000
Dear JATS community,

Thanks a lot to all participants who provided me with JATS articles and
HTML lists for my master's thesis.

I was hoping that the union of all tags and attributes used in the
collections would immediately reveal a "consensus customization" of the
most popular elements and attributes, with a vocabulary that is
significantly reduced compared to Blue. Although this "all" de-facto
customization omits 203 elements and 111 attributes compared to Blue, the
majority of these omissions are rarely used MathML elements or attributes.
As discussed in the "request for data donation" paper, disallowing parts
of MathML is not what I'm after.
Currently we are working on refining the statistics, e.g. regarding the
frequency of use of certain elements and attributes.
As you can see in the result table
(https://nreinhar.github.io/JATS_Customizing_Analysis_Data/all.xhtml), the
PMC collection alone comprises more than 400 items (that is, distinct
element or attribute names). Many of these elements or attributes only
appear in very few of the approx. 380,000 articles. The idea is to ignore
items whose frequency (defined as: number of articles in which an item
occurs divided by the total number of articles in the collection) is
significantly below average, compared to the same item's frequency in the
other collections. Hopefully that will slim down the consensus
collection's item count.
Also, the table now has a column that shows the deviations regarding the
Blue tag set.
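
The frequency measure described above could be sketched roughly as
follows. This is only an illustration, not the analysis code used for the
thesis: the function names, the `ratio` threshold and the tiny sample data
are assumptions, and "significantly below average" is read here as "below
a fixed fraction of the item's mean frequency in the other collections".

```python
from collections import Counter
from statistics import mean


def item_frequencies(articles):
    """Return {item: share of articles in which the item occurs}.

    `articles` is an iterable with one collection of item names
    (element or attribute names) per article.
    """
    counts = Counter()
    total = 0
    for items in articles:
        counts.update(set(items))  # count each item once per article
        total += 1
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def below_average_items(frequencies, collection, ratio=0.1):
    """Items of `collection` that are much rarer there than elsewhere.

    `frequencies` maps collection name -> result of item_frequencies().
    `ratio` is a hypothetical cut-off: an item is flagged if its
    frequency is below `ratio` times its mean frequency in the other
    collections (missing items count as frequency 0).
    """
    own = frequencies[collection]
    others = [f for name, f in frequencies.items() if name != collection]
    if not others:
        return set()
    flagged = set()
    for item, freq in own.items():
        average_elsewhere = mean(f.get(item, 0.0) for f in others)
        if freq < ratio * average_elsewhere:
            flagged.add(item)
    return flagged


# Made-up sample data: two small collections of per-article item sets.
collection_a = [{"p", "glyph-data"}] + [{"p"}] * 9
collection_b = [{"p", "glyph-data"}, {"p", "glyph-data"}]
frequencies = {
    "A": item_frequencies(collection_a),
    "B": item_frequencies(collection_b),
}
rare_in_a = below_average_items(frequencies, "A", ratio=0.5)
```

With this sample data, `glyph-data` occurs in 1 of 10 articles in A but in
every article in B, so it is the only item flagged for A. A suitable
cut-off for real collections is an open question.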

I'm looking forward to starting the analysis. And I hope to come up with
some interesting results in the next few months.

I'll keep you posted!

Best regards,
Nina

On Tue, 16.02.2021, 18:12, Imsieke, Gerrit, le-tex wrote:
> Dear JATS Community,
>
>
> As announced in a previous message to this list [1], Nina Reinhardt is
> currently working on her master's thesis in which she tries to find a
> consensus customization for the (estimated) 90% of JATS users that only
> need about half of Blue's available elements and attributes.
>
> My role in this is that I am co-supervising the thesis and that I came
> up with the idea after another discussion on this list last year, in which
> Tommie suggested that "a dozen different people (or small groups)
> each craft[ed] a 'JATS Lite' and we compare[d] them" [2].
>
> This was our first idea: To provide a form with a list of available
> elements and attributes, and people would be able to put together their
> favorite Lite customization interactively.
>
> But then we thought that we should also offer a way for people to upload
> representative JATS content from their production or repositories and treat
> these collections as expressions of tagging preferences, or as "de-facto
> customizations". And then she skipped the interactive form part and
> focused entirely on analyzing these collections and which metrics are
> applicable to them in order to identify consensus customizations.
>
> Nina has written a paper in which she describes her approach and what is
> needed to find this lean consensus customization (your data!):
> https://docs.google.com/document/d/1jYDT0TkYP9Tg31Ldd9gFmdwSiu98Q2mg_qOuhgnxpRc/
>
> You may skip most technical discussions for the time being and navigate
> right to the last section called "Data Collection". It is a call to action
> that asks you to donate some of your valuable JATS files to research. Or
> you can use some XSLT [3] in order to extract element/attribute name lists
> from the JATS files yourselves so you need not send potentially
> proprietary data to someone else.
>
> Please donate generously, and if possible do it by March 1st. Nina's
> thesis needs to be completed by June.
>
> You are allowed to add comments and suggestions to the Google doc, you
> may of course file issues and pull requests in the GitHub repo, and you can
> contact Nina and/or me via this list or direct email messages if you have
> questions or suggestions.
>
> On behalf of Nina (and myself),
>
>
> Gerrit
>
>
> [1]
> https://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/202009/msg00019.html
> [2]
> https://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/202004/msg00030.html
> [3] https://github.com/nreinhar/JATS_Customizing_Analysis/
>
>
> --
> Gerrit Imsieke
> Geschäftsführer / Managing Director
> le-tex publishing services GmbH Weissenfelser Str. 84, 04229 Leipzig,
> Germany
> Phone +49 341 355356 110, Fax +49 341 355356 510
> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>
> Registergericht / Commercial Register: Amtsgericht Leipzig
> Registernummer / Registration Number: HRB 24930
>
>
> Geschäftsführer / Managing Directors:
> Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
