Re: [jats-list] aff in- or outside of contrib

Subject: Re: [jats-list] aff in- or outside of contrib
From: "Pieter Lamers pieter.lamers@xxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 27 May 2020 11:01:11 -0000
Hi all,

Thanks for your thoughts and pennies. I have a few remarks:

As for Debbie's #1, I can imagine Authoring (pumpkin) being used in an
authoring system, but even there one may be writing an article with
co-authors from the same institute, so I still feel pumpkin should not
prohibit contrib/xref linking to a shared <aff>.

I have a potential use case for #2: we have a couple of translation 
sites (e.g. https://benjamins.com/online/hts), where original articles 
are being translated into various languages. The translators of those 
articles are currently added to the contrib-group with their 
@contrib-type eq 'translator'. The documentation suggests using a 
separate contrib-group with @content-type 'translators' for such 
contributors. When these translators also have their own affiliations, 
and have them presented in their own list, it might make sense to make 
aff/aff-alternatives children of contrib-group rather than article-meta.

Apart from this consideration I also lean towards moving it all to #3.
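
Until the backfile is converted, mixed placements can still be queried through a single helper. Below is a minimal Python sketch of such a resolver, using only the standard library. The function name `affiliations_for` is hypothetical. So is one rule it applies: an <aff> that is a direct child of a <contrib-group> and is not pointed to by any xref is taken to apply to every contributor in that group who has no affiliation of their own. That is how the contrib-group/aff form (#2) is described in this thread, not a JATS rule.

```python
import xml.etree.ElementTree as ET


def affiliations_for(article_meta: ET.Element) -> list[tuple[ET.Element, list[ET.Element]]]:
    """Pair each <contrib> with the <aff> elements that apply to it.

    Covers the placements discussed in this thread:
      1. <aff> as a direct child of <contrib>
      2. <aff> as a direct child of <contrib-group> (assumed to apply to every
         contributor in that group who has no affiliation of their own)
      3. <aff> anywhere in article-meta, reached via <xref ref-type="aff" rid="...">
    """
    # Index every <aff> that carries an @id, wherever it sits.
    by_id = {aff.get("id"): aff for aff in article_meta.iter("aff") if aff.get("id")}
    pairs = []
    for group in article_meta.iter("contrib-group"):
        group_affs = group.findall("aff")
        for contrib in group.findall("contrib"):
            affs = contrib.findall("aff")  # form 1: nested in the contrib
            for xref in contrib.findall("xref[@ref-type='aff']"):
                # @rid is an IDREFS attribute, so it may list several ids.
                affs += [by_id[rid] for rid in xref.get("rid", "").split() if rid in by_id]
            # Fall back to group-level affiliations only when nothing else was found.
            pairs.append((contrib, affs or list(group_affs)))
    return pairs
```

With a helper like this, a query such as "first authors at German institutes" only has to be written once, whichever placement a given article uses.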

Best,

Pieter


On 26/05/2020 18:41, Wendell Piez wapiez@xxxxxxxxxxxxxxx wrote:
> Hi,
>
> Adding my $0.02 to Debbie's analysis and useful breakdown.
>
> It is all about how much you wish to (pre-)normalize the data, and for 
> which sorts of operations; the form of the normalization would 
> presumably depend on that.
>
> Given this, I think Debbie's analysis of the tradeoffs is correct. For 
> an archival subsistence form given most real-world requirements, for 
> its clarity and parsimony I would prefer option (3).
>
> However, I can also imagine a simple 'merge affiliations' 
> transformation that would render either of the other forms into form 
> (3), making it possible to use forms 1 or 2 at earlier stages. Making 
> form (1) from form (3) is also a fairly trivial operation in principle 
> (i.e. subject to considerations of defining 'identity', etc.). Even 
> moving from form (3) or (1) into form (2) is possible if planned 
> for in advance. Like Debbie, however, I think form (2) is probably 
> optimized for the wrong thing (most of the time).
>
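Below is a minimal Python sketch of such a 'merge affiliations' transformation, going from form (1) to form (3), using only the standard library. The function name, the aff1, aff2, ... id scheme, and the test for 'identity' (identical whitespace-normalized text) are all assumptions made for illustration. Where ROR or other identifiers are present, comparing those would be more robust.

```python
import xml.etree.ElementTree as ET


def merge_affiliations(article_meta: ET.Element) -> None:
    """Rewrite form (1), <aff> inside <contrib>, into form (3) in place:
    one shared <aff> per distinct affiliation after the last <contrib-group>,
    with an <xref ref-type="aff"> on each contributor.

    Assumption: two affiliations are 'identical' when their
    whitespace-normalized text matches.
    """
    used_ids = {e.get("id") for e in article_meta.iter() if e.get("id")}
    shared = {}  # normalized text -> shared <aff> element, in first-seen order

    def new_id() -> str:
        # Hypothetical id scheme: aff1, aff2, ... skipping ids already in use.
        n = 1
        while f"aff{n}" in used_ids:
            n += 1
        used_ids.add(f"aff{n}")
        return f"aff{n}"

    for contrib in list(article_meta.iter("contrib")):
        for aff in contrib.findall("aff"):
            key = " ".join("".join(aff.itertext()).split())
            if key not in shared:
                if not aff.get("id"):
                    aff.set("id", new_id())
                shared[key] = aff
            # Replace the local <aff> with a pointer to the shared one.
            xref = ET.Element("xref", {"ref-type": "aff", "rid": shared[key].get("id")})
            xref.tail = aff.tail
            index = list(contrib).index(aff)
            contrib.remove(aff)
            contrib.insert(index, xref)

    # Place the shared <aff>s directly after the last <contrib-group>.
    children = list(article_meta)
    groups = [i for i, child in enumerate(children) if child.tag == "contrib-group"]
    position = groups[-1] + 1 if groups else len(children)
    for offset, aff in enumerate(shared.values()):
        aff.tail = "\n"
        article_meta.insert(position + offset, aff)
```

Going from form (3) back to form (1) is the reverse walk: for each affiliation xref, copy the referenced <aff> into the <contrib>.
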
> I am also not writing as a participant in JATS4R. Mainly I'm pitching 
> in to remind readers that transformations can ease the either/or 
> problems with this sort of thing, assuming data quality (sometimes a 
> big assumption I know).
>
> Cheers, Wendell
>
>
>
> On Tue, May 26, 2020 at 5:58 AM Gareth Oakes goakes@xxxxxxx 
> <mailto:goakes@xxxxxxx> <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx 
> <mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>> wrote:
>
>     I think if it was completely greenfield, then Debbie's option #3
>     is the way to go. Most of the JATS data we come across works that
>     way, and it's not hugely more difficult from an XML processing
>     perspective. I think the point about being consistent, doing it
>     one way or the other across your backfile, is a very laudable idea.
>
>     Clearly there is no one-size-fits-all across publishers. I feel
>     like a nice approach for a publisher would be to have a Schematron
>     acting as an overlay on a base JATS schema. The overlay would
>     impose the organization- and/or product-specific validation rules
>     such as how you are meant to tag up <aff>s. I think that's a
>     reasonably commonly used approach? Better than maintaining a
>     customized schema p

>
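Such an overlay would normally be written as Schematron rules on top of the base JATS schema. To show the kind of checks involved without committing to a particular Schematron setup, here is the same logic as a Python sketch. The rules are a hypothetical house policy that allows only Debbie's option #3, not JATS4R recommendations: no <aff> inside <contrib> or <contrib-group>, every article-level <aff> has an @id, every affiliation xref resolves, and every <aff> is referenced. The function name is also hypothetical.

```python
import xml.etree.ElementTree as ET


def check_aff_rules(article_meta: ET.Element) -> list[str]:
    """Report violations of a hypothetical house rule set that allows only
    form (3): article-level <aff>s linked from contributors via <xref>."""
    problems = []
    # Rule 1: no <aff> directly inside <contrib> or <contrib-group>.
    for tag in ("contrib", "contrib-group"):
        for parent in article_meta.iter(tag):
            if parent.find("aff") is not None:
                problems.append(f"<aff> found inside <{tag}>")
    # Rule 2: every article-level <aff> needs an @id so it can be referenced.
    aff_ids = set()
    for aff in article_meta.findall("aff"):
        if aff.get("id"):
            aff_ids.add(aff.get("id"))
        else:
            problems.append("article-level <aff> without @id")
    # Rule 3: every affiliation xref must resolve to an article-level <aff>.
    referenced = set()
    for xref in article_meta.iter("xref"):
        if xref.get("ref-type") == "aff":
            for rid in xref.get("rid", "").split():
                referenced.add(rid)
                if rid not in aff_ids:
                    problems.append(f"xref points to unknown aff id '{rid}'")
    # Rule 4: flag article-level <aff>s that no contributor points to.
    for aff_id in sorted(aff_ids - referenced):
        problems.append(f"aff '{aff_id}' is never referenced")
    return problems
```

An empty list means the record follows the policy. Each message corresponds to what a Schematron assert or report would say.
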
>     // Gareth Oakes
>
>     // Chief Architect, GPSL
>
>     // www.gpsl.co <http://www.gpsl.co>
>
>     *From: *"Melissa Harrison m.harrison@xxxxxxxxxxxxxxxxx
>     <mailto:m.harrison@xxxxxxxxxxxxxxxxx>"
>     <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx
>     <mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>>
>     *Reply to: *"jats-list@xxxxxxxxxxxxxxxxxxxxxx
>     <mailto:jats-list@xxxxxxxxxxxxxxxxxxxxxx>"
>     <jats-list@xxxxxxxxxxxxxxxxxxxxxx
>     <mailto:jats-list@xxxxxxxxxxxxxxxxxxxxxx>>
>     *Date: *Tuesday, 26 May 2020 at 18:26
>     *To: *"jats-list@xxxxxxxxxxxxxxxxxxxxxx
>     <mailto:jats-list@xxxxxxxxxxxxxxxxxxxxxx>"
>     <jats-list@xxxxxxxxxxxxxxxxxxxxxx
>     <mailto:jats-list@xxxxxxxxxxxxxxxxxxxxxx>>
>     *Subject: *Re: [jats-list] aff in- or outside of contrib
>
>     Hi there
>
>     *On behalf of JATS4R*
>
>     This working group thought very long and hard and had many
>     discussions/heated debates about this - people have different
>     reasons for following the different options and if they went for 1
>     option only in the recommendation, this would alienate the people
>     using the other option. Therefore, they had to come up with a more
>     flexible model to ensure JATS4R can help standardise the standard
>     as much as possible while making it accessible to everyone to
>     implement.
>
>     Not helpful, I appreciate, when you are willing to change your
>     data model!
>
>     Cheers
>
>     Melissa
>
>
>     Melissa Harrison
>
>     Head of Production Operations
>
>     Tel: +44 1223 855340
>
>     http://elifesciences.org <http://elifesciences.org/>
>
>     On Mon, May 25, 2020 at 11:44 PM Debbie Lapeyre
>     dalapeyre@xxxxxxxxxxxxxxxx <mailto:dalapeyre@xxxxxxxxxxxxxxxx>
>     <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx
>     <mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>> wrote:
>
>         1) contrib/aff
>
>         IMO: JATS allows <aff> as a child of <contrib> precisely for the
>         single-author case (which is what Authoring was also designed for,
>         although almost nobody uses Authoring.)
>
>         In the modern STEM world, it is not unusual to have 100+ authors.
>         I do not favor contrib/aff, as it can lead to a lot of
>         redundant data.
>
>         If, as is also common, a single author has 4 or 5 institutional
>         affiliations, the data proliferation gets even worse.
>
>         2) contrib-group/aff
>
>         IMO: This one was allowed, so that publishers could group authors
>         by institution, and only need to input the <aff> once, for the
>         whole group. Rare nowadays, I hope.
>
>         3) <aff>s all together AFTER last <contrib-group>
>
>         IMO: This is the cleanest. Each <aff> is only present once,
>         eliminating redundant data. Yes, you need to use an <xref> on
>         each author pointing to each applicable <aff>. But it handles
>         one author with 5 affiliations just as easily as 100 authors
>         with only 6 between them.
>
>         I think this is cleanest for querying as well, as you can write
>         an XPath to find all the <aff>s with the characteristic you want
>         (all from one country or all NIH or whatever) and then get the
>         contributors who have the @rid on their <xref> that matches
>         the @id you found on the <aff> or <aff>s you wanted.
>
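Debbie's two-step query can be sketched with Python's standard `xml.etree.ElementTree`. That module supports only a subset of XPath, so the join between <aff>/@id and <xref>/@rid is done in Python. The function name is hypothetical, and the sketch assumes each <aff> carries a <country> element. Because @rid is an IDREFS attribute, one xref may point at several affiliations.

```python
import xml.etree.ElementTree as ET


def contributors_in_country(article_meta: ET.Element, country: str) -> list[ET.Element]:
    """Step 1: collect the @id of every <aff> whose <country> matches.
    Step 2: keep the <contrib>s with an affiliation xref whose @rid hits one of them."""
    wanted = {
        aff.get("id")
        for aff in article_meta.iter("aff")
        if aff.get("id") and aff.findtext(".//country", "").strip() == country
    }
    return [
        contrib
        for contrib in article_meta.iter("contrib")
        if any(
            rid in wanted
            for xref in contrib.findall("xref[@ref-type='aff']")
            for rid in xref.get("rid", "").split()  # @rid may hold several ids
        )
    ]
```

Swapping the country test for an institution name or an identifier gives the other queries Debbie mentions ("all NIH or whatever").
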
>         This is a Lapeyre not a Mulberry opinion.
>         I do not work for or with JATS4R (good folks though).
>
>         --Debbie
>
>
>
>         > On May 25, 2020, at 5:38 PM, Pieter Lamers
>         pieter.lamers@xxxxxxxxxxxx <mailto:pieter.lamers@xxxxxxxxxxxx>
>         <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx
>         <mailto:jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>> wrote:
>         >
>         > Hi All,
>         >
>         > We are looking into refactoring our article-meta structure
>         with regard to affiliations. We now have two practices:
>         >
>         > 1. <aff> is a child of <contrib>, no xref linking needed.
>         > 2. <aff> is a child of <article-meta> (or <contrib-group>),
>         xref linking needed between <contrib> and <aff>
>         >
>         > We are a bit in doubt as to what the preferred format should be.
>         >
>         > A quick check on jats4r.org <http://jats4r.org>
>         (https://jats4r.org/authors-and-affiliations) tells me that
>         there is no preferred format for the choice we are facing: "It
>         is the content-provider's choice which to use".
>         >
>         > Sometimes it is suggested to follow the strictest variant of
>         JATS where possible so we took a look at pumpkin
>         (article-authoring). It appears that (2) is not possible, as
>         <aff> cannot be a child of <contrib-group> or <article-meta>,
>         even though the notes tell us that
>         >
>         > "The linkage from a contributor to an affiliation should be
>         made using the ID/IDREF mechanism. The @id attribute of an
>         <aff> element will be pointed to from one or more <contrib>
>         elements."
>         (https://jats.nlm.nih.gov/articleauthoring/tag-library/1.3d1/element/aff.html)
>         >
>         > This means that moving away from pattern (1) is making the
>         document less compatible with pumpkin. Not that this is a
>         compelling argument I guess. What I am thinking is:
>         >
>         > a. having <aff> separate means less redundancy in the file
>         (argument for choosing (2) )
>         > b. having <aff> inside <contrib> is closer to the semantics
>         as I perceive them: affiliation is primarily a property of the
>         author, not of the article (argument in favor of (1) ).
>         >
>         > The demand for statistics of any kind is growing. The other
>         day we were asked to report the number of articles whose first
>         author is affiliated with one of the institutes on a list of
>         German institutes. I could report this from an SQL copy of the
>         data, but I would rather use the JATS files, with the flexible
>         nature of XML, as the place to ask, so we are going to add ROR
>         IDs where we can find them, and maybe other identifiers. This
>         would save me from keeping all these data in sync with SQL. But
>         in that case it would be nice to have a single structural
>         pattern to query, not several.
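
Assume the ROR IDs end up in <institution-id institution-id-type="ror"> inside each <aff>, wrapped in <institution-wrap>. Under that assumption, the first-author count could look like the Python sketch below. The function names are hypothetical. The sketch deliberately accepts both current practices (contrib/aff and xref-linked <aff>) so it can run before the backfile is normalized.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterable


def first_author_ror_ids(article: ET.Element) -> set[str]:
    """ROR ids of the first <contrib contrib-type="author">, whether its
    <aff>s are nested in the contrib (form 1) or linked via xref (form 2/3)."""
    meta = article.find(".//article-meta")
    if meta is None:
        return set()
    first = meta.find(".//contrib[@contrib-type='author']")  # document order
    if first is None:
        return set()
    by_id = {aff.get("id"): aff for aff in meta.iter("aff") if aff.get("id")}
    affs = first.findall("aff")
    for xref in first.findall("xref[@ref-type='aff']"):
        affs += [by_id[rid] for rid in xref.get("rid", "").split() if rid in by_id]
    return {
        (inst_id.text or "").strip()
        for aff in affs
        for inst_id in aff.iter("institution-id")
        if inst_id.get("institution-id-type") == "ror"
    }


def count_first_author_matches(articles: Iterable[ET.Element], target_rors: set[str]) -> int:
    """Number of articles whose first author has at least one affiliation in target_rors."""
    return sum(1 for article in articles if first_author_ror_ids(article) & target_rors)
```

For a directory of files, something like `count_first_author_matches((ET.parse(p).getroot() for p in paths), german_rors)` would do, with `paths` and `german_rors` supplied by the caller.
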
>         >
>         > Any thoughts, anyone?
>         >
>         > Best
>         > Pieter
>         >
>         >
>         > --
>         > Pieter Lamers
>         > John Benjamins Publishing Company
>         > Postal Address: P.O. Box 36224, 1020 ME AMSTERDAM, The
>         Netherlands
>         > Visiting Address: Klaprozenweg 75G, 1033 NN AMSTERDAM, The
>         Netherlands
>         > Warehouse: Kelvinstraat 11-13, 1446 TK PURMEREND, The
>         Netherlands
>         > tel: +31 20 630 4747
>         > web: www.benjamins.com <http://www.benjamins.com>
>         >
>
>
>         ================================================================
>         Deborah A Lapeyre
>         mailto:dalapeyre@xxxxxxxxxxxxxxxx
>         <mailto:dalapeyre@xxxxxxxxxxxxxxxx>
>         Mulberry Technologies, Inc. http://www.mulberrytech.com
>         <http://www.mulberrytech.com>
>         17 West Jefferson Street      Phone: 301-315-9631 (USA)
>         Suite 207                     Fax:   301-315-8385
>         Rockville, MD 20850
>         ----------------------------------------------------------------
>         Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
>         ================================================================
>
>
>
>     elifesciences.org <https://elifesciences.org>
>
>     eLife Sciences Publications, Ltd is a limited liability non-profit
>     non-stock corporation incorporated in the State of Delaware, USA,
>     with company number 5030732, and is registered in the UK with
>     company number FC030576 and branch number BR015634 at the address
>     Westbrook Centre, Milton Road, Cambridge, CB4 1YG.
>
>     JATS-List info and archive
>     <http://www.mulberrytech.com/JATS/JATS-List/>
>
>     EasyUnsubscribe
>     <http://lists.mulberrytech.com/unsub/jats-list/2708257> (by email)
>
>
>
> -- 
> ...Wendell Piez... ...wendell -at- nist -dot- gov...
> ...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
> ...github.com/wendellpiez...
> ...gitlab.coko.foundation/wendell...

