Re: [jats-list] Does Blue need a Lite version, to counter its creeping aquafication?

Subject: Re: [jats-list] Does Blue need a Lite version, to counter its creeping aquafication?
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 22 Feb 2021 10:22:56 -0000
Hi Pieter,

Thank you very much for your efforts!

I have converted your list into the format that Nina's analysis expects (see https://github.com/nreinhar/JATS_Customizing_Analysis/pull/7).

(continued below)

On 21.02.2021 08:40, Pieter Lamers pieter.lamers@xxxxxxxxxxxx wrote:
Hi Gerrit,

That's a nice Sunday morning exercise. I wrote the following xquery to summarize the fulltext articles:

xquery version '3.1';

(: Full-text articles only: requiring 'body' leaves out metadata-only records :)
let $coll as item()+ := collection('/db/data/journals.benjamins.com/')[article/body]
let $article-count := count($coll)

return
  <articles count="{$article-count}">{
    (: one <elements> group per namespace prefix :)
    for $element-group in $coll//*
    group by $namespace := $element-group/node-name() => prefix-from-QName()
    return
      <elements>{
        if (exists($namespace)) then attribute prefix { $namespace } else (),
        (: one element per local name; its text value is the number of occurrences :)
        for $element in $element-group
        group by $element-name := $element/local-name()
        order by $element-name
        return
          element { $element-name } {
            (: one attribute per local name; its value is the number of occurrences :)
            for $attribute in $element/@*
            group by $attribute-name := $attribute/local-name()
            order by $attribute-name
            return
              attribute { $attribute-name } { count($attribute) },
            count($element)
          }
      }</elements>
  }</articles>
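
[Editorial sketch, not part of Pieter's message.] For readers without an XQuery processor at hand, roughly the same tally can be approximated in Python with only the standard library. This is a minimal sketch under stated assumptions: the function names `local_name` and `summarize` are hypothetical, the input is a list of XML strings rather than a database collection, and namespace prefixes are dropped entirely instead of being used as a grouping key as in the query above.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def local_name(name):
    """Strip the '{namespace-uri}' part that ElementTree puts in front of names."""
    return name.rsplit("}", 1)[-1]


def summarize(xml_documents):
    """Count element uses and, per element, attribute uses across documents.

    Assumption: each document is a JATS article as a string; records whose
    root has no <body> child are skipped as metadata-only.
    """
    element_counts = Counter()
    attribute_counts = defaultdict(Counter)
    for xml_text in xml_documents:
        root = ET.fromstring(xml_text)
        if root.find("body") is None:
            continue  # metadata-only record
        for element in root.iter():
            name = local_name(element.tag)
            element_counts[name] += 1
            for attribute in element.attrib:
                attribute_counts[name][local_name(attribute)] += 1
    return element_counts, attribute_counts
```

Calling `summarize` on a list of article strings returns two mappings: element name to number of occurrences, and element name to a per-attribute occurrence count.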

I added counts as the attribute/element text value because it shows the extent of use for each element/attribute. Please note that we use Green (1.1) rather than Blue because of Blue's ordering restrictions in references and other things we really needed (forgot which).

No problem at all. The Science Open input is also based on Green.

I abstracted away from namespace prefixes in attribute names. It results in the following:


…



I hope this is of help.

Yes, absolutely.


Thanks again,

Gerrit



All the best,
Pieter

On 18/02/2021 13:11, Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx wrote:
Dear Mark,

Thank you so much for taking the time to run the analysis and for filing the pull request.

We will try to reproduce, using the cache files that you sent, under which circumstances the division by zero occurs. Then we'll see whether there is something else that we should do about it or whether your fix addresses the problem without distorting the results.

To all others that submitted files to Nina already: Thank you, too!

To everyone else who sits on tons of JATS and hasn't sent anything yet: There are still 10 days left to put something together.

Gerrit

On 18.02.2021 12:07, DUNN, Mark wrote:
Dear Gerrit and Nina,

I am happy to try and help with this project and I wish you both every success.

OUP is unable to supply the JATS XML unfortunately, but I've been able to run the pipeline over a representative sample (with a small fix which I've put into my Git fork) to produce some statistics.

The output report and cache for 176 articles across our subject areas are attached. The articles are all from the last 2 years of publishing.

If you would like more, please let me know. OUP publishes in all the areas you are looking at (STEM, HUM, ECON) so if you need more from a particular area, I'll be happy to get some.

Kind regards,
Mark Dunn
Lead Content Architect, Oxford University Press



-----Original Message-----
From: Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: 16 February 2021 17:13
To: jats-list@xxxxxxxxxxxxxxxxxxxxxx
Cc: nina_linn.reinhardt@xxxxxxxxxxxxxxxxxxxx
Subject: [jats-list] Does Blue need a Lite version, to counter its creeping aquafication?


Dear JATS Community,

As announced in a previous message to this list [1], Nina Reinhardt is currently working on her master's thesis in which she tries to find a consensus customization for the (estimated) 90% of JATS users that only need about half of Blue's available elements and attributes.

My role in this is that I am co-supervising the thesis and that I came up with the idea after another discussion on this list last year, in which Tommie suggested that "a dozen different people (or small groups) each craft[ed] a 'JATS Lite' and we compare[d] them" [2].

This was our first idea: To provide a form with a list of available elements and attributes, and people would be able to put together their favorite Lite customization interactively.

But then we thought that we should also offer a way for people to upload representative JATS content from their production or repositories and treat these collections as expressions of tagging preferences, or as "de-facto customizations". And then she skipped the interactive form part and focused entirely on analyzing these collections and which metrics are applicable to them in order to identify consensus customizations.
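
[Editorial sketch.] To make the idea of "de-facto customizations" concrete: each collection can be reduced to the set of element and attribute names it actually uses, and a name's share across collections then becomes a candidate consensus measure. The following is a minimal sketch of one such measure, not the metric used in Nina's thesis; the function names `usage_share` and `consensus` and the 0.9 threshold are assumptions made for illustration.

```python
def usage_share(collections):
    """Map each name to the share of collections that use it at least once.

    `collections` maps a (hypothetical) contributor label to the set of
    element/attribute names found in that contributor's files.
    """
    total = len(collections)
    if total == 0:
        return {}  # nothing to compare; also avoids dividing by zero
    names = set().union(*collections.values())
    return {
        name: sum(name in used for used in collections.values()) / total
        for name in sorted(names)
    }


def consensus(collections, threshold=0.9):
    """Names used by at least `threshold` of the collections (assumed cut-off)."""
    shares = usage_share(collections)
    return sorted(name for name, share in shares.items() if share >= threshold)
```

With three collections that all use `p` and two of which use `sec`, `consensus(..., threshold=0.6)` would keep both names, while a name used by only one collection would drop out.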

Nina has written a paper in which she describes her approach and what is needed to find this lean consensus customization (your data!):
https://docs.google.com/document/d/1jYDT0TkYP9Tg31Ldd9gFmdwSiu98Q2mg_qOuhgnxpRc/



You may skip most technical discussions for the time being and navigate right to the last section called "Data Collection". It is a call to action that asks you to donate some of your valuable JATS files to research. Or you can use some XSLT [3] in order to extract element/attribute name lists from the JATS files yourselves so you need not send potentially proprietary data to someone else.
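
[Editorial sketch.] The XSLT in the repository [3] is the tool to use for this. Purely to illustrate what "extracting name lists" means, here is a minimal Python sketch that reduces one JATS file to a sorted list of names, so that no text content leaves the publisher. The function name `name_list` and the `element/@attribute` notation are assumptions for illustration and not the format the analysis pipeline expects.

```python
import xml.etree.ElementTree as ET


def name_list(xml_text):
    """Return the sorted, de-duplicated element and attribute names of a document.

    Namespace URIs are stripped, so only local names are reported; attribute
    names are written as 'element/@attribute'. No text content is retained.
    """
    names = set()
    for element in ET.fromstring(xml_text).iter():
        tag = element.tag.rsplit("}", 1)[-1]
        names.add(tag)
        for attribute in element.attrib:
            names.add(f"{tag}/@{attribute.rsplit('}', 1)[-1]}")
    return sorted(names)
```

The resulting list contains names only, which is the property that matters when the underlying articles are proprietary.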


Please donate generously, and if possible do it by March 1st. Nina's thesis needs to be completed by June.

You are allowed to add comments and suggestions to the Google doc, you may of course file issues and pull requests in the Github repo, and you can contact Nina and/or me via this list or direct email messages if you have questions or suggestions.

On behalf of Nina (and myself),

Gerrit

[1]
https://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/202009/msg00019.html


[2]
https://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/202004/msg00030.html


[3] https://github.com/nreinhar/JATS_Customizing_Analysis/



--
Pieter Lamers
John Benjamins Publishing Company
Postal Address: P.O. Box 36224, 1020 ME AMSTERDAM, The Netherlands
Visiting Address: Klaprozenweg 75G, 1033 NN AMSTERDAM, The Netherlands
Warehouse: Kelvinstraat 11-13, 1446 TK PURMEREND, The Netherlands
tel: +31 20 630 4747
web: www.benjamins.com

JATS-List info and archive <http://www.mulberrytech.com/JATS/JATS-List/>

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
