Re: [jats-list] aff in- or outside of contrib

Subject: Re: [jats-list] aff in- or outside of contrib
From: "Charles O'Connor coconnor@xxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 30 Oct 2020 21:28:22 -0000
Howdy,

I'm reminded that I haven't looked at the JATS list during the pandemic,
because this is an issue I had to make a call on when subsetting the DTD for
Aries workflows.

I went with #2 for much the same reason as Pieter, separating different
contributor types. This structure is especially useful in content that is
likely to have long lists of non-byline authors/affiliations that may not
actually be rendered, and if rendered, not at the beginning of the article.

--Charles

From: Pieter Lamers pieter.lamers@xxxxxxxxxxxx
<jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, May 27, 2020 7:01 AM
To: jats-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [jats-list] aff in- or outside of contrib

Hi all,
Thanks for your thoughts and pennies. I have a few remarks:
As for Debbie's #1, I can imagine Authoring being used in an authoring system,
but even there, one may be writing an article with co-authors from the same
institute, so I still feel pumpkin should not prohibit contrib/xref.
I have a potential use case for #2: we have a couple of translation sites
(e.g. https://benjamins.com/online/hts), where original articles are being
translated into various languages. The translators of those articles are
currently added to the contrib-group with their @contrib-type eq 'translator'.
The documentation suggests using a separate contrib-group with @content-type
'translators' for such contributors. When these translators also have their
own affiliations, and have them presented in their own list, it might make
sense to make aff/aff-alternatives children of contrib-group rather than
article-meta.
Apart from this consideration I also lean towards moving it all to #3.
Best,
Pieter

On 26/05/2020 18:41, Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxx wrote:
Hi,

Adding my $0.02 to Debbie's analysis and useful breakdown.

It is all about how much you wish to (pre) normalize the data, and for which
sorts of operations; the form of the normalization would presumably depend on
that.

Given this, I think Debbie's analysis of the tradeoffs is correct. For an
archival subsistence form given most real-world requirements, for its clarity
and parsimony I would prefer option (3).

However, I can also imagine a simple 'merge affiliations' transformation that
would render either of the other forms into form (3), making it possible to
use forms 1 or 2 at earlier stages. Making form (1) from form (3) is also a
fairly trivial operation in principle (i.e. subject to considerations of
defining 'identity', etc.). Even moving from forms (3) or (1) into form (2) is
also possible if planned ahead for. Like Debbie, however, I think form (2) is
probably optimized for the wrong thing (most of the time).
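
A minimal sketch of such a 'merge affiliations' step, going from form (1) to form (3), might look like the following. It is only an illustration: the sample fragment, the function name `merge_affiliations`, the generated `affN` ids, and the choice to treat two affiliations as identical when their whitespace-normalised text matches are all assumptions, not anything prescribed by JATS. Any existing @id on an <aff> is overwritten.

```python
import xml.etree.ElementTree as ET

# Hypothetical form (1) input: every <aff> sits inside its <contrib>.
SAMPLE_FORM_1 = """
<article-meta>
  <contrib-group>
    <contrib><name><surname>Alpha</surname></name><aff>Institute One, Germany</aff></contrib>
    <contrib><name><surname>Beta</surname></name><aff>Institute One,  Germany</aff><aff>Institute Two, Netherlands</aff></contrib>
  </contrib-group>
</article-meta>
"""

def merge_affiliations(meta):
    """Move contrib/aff up to article-meta, leaving an <xref> in its place."""
    seen = {}  # normalised affiliation text -> generated id
    groups = meta.findall("contrib-group")
    # New <aff>s go directly after the last <contrib-group>.
    insert_at = list(meta).index(groups[-1]) + 1
    for contrib in list(meta.iter("contrib")):
        for aff in contrib.findall("aff"):
            # Assumed notion of 'identity': same text after whitespace normalisation.
            key = " ".join("".join(aff.itertext()).split())
            contrib.remove(aff)
            if key not in seen:
                seen[key] = f"aff{len(seen) + 1}"
                aff.set("id", seen[key])
                aff.tail = None
                meta.insert(insert_at, aff)
                insert_at += 1
            ET.SubElement(contrib, "xref", {"ref-type": "aff", "rid": seen[key]})
    return meta

merged = merge_affiliations(ET.fromstring(SAMPLE_FORM_1))
print(ET.tostring(merged, encoding="unicode"))
```

The hard part in practice is the identity test, as noted above: real data rarely repeats an affiliation string exactly, so a production transform would need a better key (an institution identifier, for example) than normalised text.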

I am also not writing as a participant in JATS4R. Mainly I'm pitching in to
remind readers that transformations can ease the either/or problems with this
sort of thing, assuming data quality (sometimes a big assumption I know).

Cheers, Wendell



On Tue, May 26, 2020 at 5:58 AM Gareth Oakes mailto:goakes@xxxxxxx
<mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
I think if it was completely greenfields then Debbie's option #3 is the way
to go. Most of the JATS data we come across works that way, and it's not
hugely more difficult from an XML processing perspective. I think the point
about being consistent about doing it one way or the other in your backfile is a
very laudable idea.

Clearly there is no one-size-fits-all across publishers. I feel like a nice
approach for a publisher would be to have a Schematron acting as an overlay on
a base JATS schema. The overlay would impose the organization- and/or
product-specific validation rules such as how you are meant to tag up <aff>s.
I think that's a reasonably commonly used approach? Better than maintaining
a customized schema
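
To make the idea of an overlay rule concrete, here is a rough sketch of one such house rule ("<aff> must not appear inside <contrib> or <contrib-group>"). A real overlay would express this declaratively in Schematron; Python is used here only to illustrate the kind of constraint, and the function name, the sample fragment, and the rule itself are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical house rule: affiliations live at article-meta level only.
FORBIDDEN_PARENTS = {"contrib", "contrib-group"}
AFF_TAGS = {"aff", "aff-alternatives"}

def check_aff_placement(meta):
    """Return one message per affiliation element found under a forbidden parent."""
    problems = []
    for parent in meta.iter():
        if parent.tag not in FORBIDDEN_PARENTS:
            continue
        for child in parent:
            if child.tag in AFF_TAGS:
                problems.append(
                    f"<{child.tag}> inside <{parent.tag}>: "
                    "house rule expects it as a child of <article-meta>"
                )
    return problems

SAMPLE_MIXED = """
<article-meta>
  <contrib-group>
    <contrib><name><surname>Alpha</surname></name><aff>Institute One</aff></contrib>
    <contrib><name><surname>Beta</surname></name><xref ref-type="aff" rid="aff2"/></contrib>
  </contrib-group>
  <aff id="aff2">Institute Two</aff>
</article-meta>
"""

problems = check_aff_placement(ET.fromstring(SAMPLE_MIXED))
for message in problems:
    print(message)
```

The base JATS schema still accepts all three placements; only the organization-specific layer rejects the ones a given publisher has decided against.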

// Gareth Oakes
// Chief Architect, GPSL
// http://www.gpsl.co

From: "Melissa Harrison mailto:m.harrison@xxxxxxxxxxxxxxxxx"
<mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Reply to: "mailto:jats-list@xxxxxxxxxxxxxxxxxxxxxx"
<mailto:jats-list@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tuesday, 26 May 2020 at 18:26
To: "mailto:jats-list@xxxxxxxxxxxxxxxxxxxxxx"
<mailto:jats-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [jats-list] aff in- or outside of contrib

Hi there

On behalf of JATS4R

This working group thought very long and hard and had many discussions/heated
debates about this - people have different reasons for following the
different options and if they went for 1 option only in the recommendation,
this would alienate the people using the other option. Therefore, they had
to come up with a more flexible model to ensure JATS4R can help standardise
the standard as much as possible while making it accessible to everyone to
implement.

Not helpful, I appreciate, when you are willing to change your data model!

Cheers
Melissa

Melissa Harrison
Head of Production Operations
Tel: +44 1223 855340
http://elifesciences.org/

On Mon, May 25, 2020 at 11:44 PM Debbie Lapeyre
mailto:dalapeyre@xxxxxxxxxxxxxxxx
<mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
1) contrib/aff

IMO: JATS allows <aff> as a child of <contrib> precisely for the
single-author case (which is what Authoring was also designed for,
although almost nobody uses Authoring.)

In the modern STEM world, it is not unusual to have 100+ authors.
I do not favor contrib/aff, as it can lead to a lot of redundant data.

If, as is also common, a single author has 4 or 5 institutional
affiliations, the data proliferation gets even worse.

2) contrib-group/aff

IMO: This one was allowed, so that publishers could group authors
by institution, and only need to input the <aff> once, for the
whole group. Rare nowadays, I hope.

3) <aff>s all together AFTER last <contrib-group>

IMO: This is the cleanest. Each <aff> is only present once,
eliminating redundant data. Yes, you need to use an <xref> on
each author pointing to each applicable <aff>. But it is easy
for one author to have 5 affiliations and for 100 authors to
have only 6 between them.

I think this is cleanest for querying as well, as you can write
an XPath to find all the <aff>s with the characteristic you want
(all from one country or all NIH or whatever) and then get the
contributors who have the @rid on their <xref> that matches
the @id you found on the <aff> or <aff>s you wanted.
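
As a rough illustration of that two-step lookup, the sketch below uses Python's standard `xml.etree.ElementTree` on a made-up form (3) fragment. The sample data, the function name `contributors_by_country`, and the assumption that each <aff> carries a <country> child are hypothetical; @rid is split on whitespace because it may hold more than one id.

```python
import xml.etree.ElementTree as ET

# Hypothetical form (3) fragment: <aff>s follow the <contrib-group>.
SAMPLE_FORM_3 = """
<article-meta>
  <contrib-group>
    <contrib contrib-type="author">
      <name><surname>Alpha</surname></name>
      <xref ref-type="aff" rid="aff1"/>
      <xref ref-type="aff" rid="aff2"/>
    </contrib>
    <contrib contrib-type="author">
      <name><surname>Beta</surname></name>
      <xref ref-type="aff" rid="aff2"/>
    </contrib>
  </contrib-group>
  <aff id="aff1"><institution>Institute One</institution><country>Germany</country></aff>
  <aff id="aff2"><institution>Institute Two</institution><country>Netherlands</country></aff>
</article-meta>
"""

def contributors_by_country(meta, country):
    """Surnames of contributors linked to at least one <aff> in the given country."""
    # Step 1: collect the @id of every <aff> with the wanted characteristic.
    aff_ids = {
        aff.get("id") for aff in meta.iter("aff")
        if aff.findtext("country") == country
    }
    # Step 2: keep contributors whose aff <xref> @rid matches one of those ids.
    surnames = []
    for contrib in meta.iter("contrib"):
        rids = set()
        for xref in contrib.findall("xref"):
            if xref.get("ref-type") == "aff":
                rids.update(xref.get("rid", "").split())
        if rids & aff_ids:
            surnames.append(contrib.findtext("name/surname"))
    return surnames

form3 = ET.fromstring(SAMPLE_FORM_3)
print(contributors_by_country(form3, "Germany"))
```

Because every <aff> appears once at a known level, step 1 is a single pass regardless of how many contributors share the affiliation.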

This is a Lapeyre not a Mulberry opinion.
I do not work for or with JATS4R (good folks though).

--Debbie



> On May 25, 2020, at 5:38 PM, Pieter Lamers mailto:pieter.lamers@xxxxxxxxxxxx
<mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi All,
>
> We are looking into refactoring our article-meta structure with regard to
affiliations. We now have two practices:
>
> 1. <aff> is a child of <contrib>, no xref linking needed.
> 2. <aff> is a child of <article-meta> (or <contrib-group>), xref linking
needed between <contrib> and <aff>
>
> We are a bit in doubt as to what the preferred format should be.
>
> A quick check on http://jats4r.org
(https://jats4r.org/authors-and-affiliations) tells me that there is no
preferred format for the choice we are facing: "It is the content-provider's
choice which to use".
>
> Sometimes it is suggested to follow the strictest variant of JATS where
possible so we took a look at pumpkin (article-authoring). It appears that (2)
is not possible, as <aff> cannot be a child of <contrib-group> or
<article-meta>, even though the notes tell us that
>
> "The linkage from a contributor to an affiliation should be made using the
ID/IDREF mechanism. The @id attribute of an <aff> element will be pointed to
from one or more <contrib> elements."
(https://jats.nlm.nih.gov/articleauthoring/tag-library/1.3d1/element/aff.html)
>
> This means that moving away from pattern (1) is making the document less
compatible with pumpkin. Not that this is a compelling argument I guess. What
I am thinking is:
>
> a. having <aff> separate means less redundancy in the file (argument for
choosing (2) )
> b. having <aff> inside <contrib> is closer to the semantics as I perceive
them: affiliation is primarily a property of the author, not of the article
(argument in favor of (1) ).
>
> The demand for statistics of any kind is growing. The other day we were
asked to report numbers of articles with a first author affiliated with some
affiliation in a list of German institutes. I could report this from an SQL
copy of the data, but would like to see the JATS files with the flexible
nature of XML as the place to ask, so we are going to add ROR if we can find
it, and maybe other identifiers. This would save me from keeping all these
data in sync with SQL. But in such a case it would be nice to have a single
structure pattern to query and not multiple.
>
> Any thoughts, anyone?
>
> Best
> Pieter
>
>
> --
> Pieter Lamers
> John Benjamins Publishing Company
> Postal Address: P.O. Box 36224, 1020 ME AMSTERDAM, The Netherlands
> Visiting Address: Klaprozenweg 75G, 1033 NN AMSTERDAM, The Netherlands
> Warehouse: Kelvinstraat 11-13, 1446 TK PURMEREND, The Netherlands
> tel: +31 20 630 4747
> web: http://www.benjamins.com
>


================================================================
Deborah A Lapeyre              mailto:dalapeyre@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.    http://www.mulberrytech.com
17 West Jefferson Street       Phone: 301-315-9631 (USA)
Suite 207                      Fax:   301-315-8385
Rockville, MD 20850
----------------------------------------------------------------
Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
================================================================


http://www.mulberrytech.com/JATS/JATS-List/
http://lists.mulberrytech.com/unsub/jats-list/2708257 (by email)



--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...http://github.com/wendellpiez... ...gitlab.coko.foundation/wendell...