Re: [jats-list] aff in- or outside of contrib

Subject: Re: [jats-list] aff in- or outside of contrib
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 26 May 2020 16:40:58 -0000
Hi,

Adding my $0.02 to Debbie's analysis and useful breakdown.

It is all about how much you wish to (pre) normalize the data, and for
which sorts of operations; the form of the normalization would presumably
depend on that.

Given this, I think Debbie's analysis of the tradeoffs is correct. For an
archival subsistence form given most real-world requirements, for its
clarity and parsimony I would prefer option (3).

However, I can also imagine a simple 'merge affiliations' transformation
that would render either of the other forms into form (3), making it
possible to use forms 1 or 2 at earlier stages. Making form (1) from form
(3) is also a fairly trivial operation in principle (i.e. subject to
considerations of defining 'identity', etc.). Even moving from forms (3) or
(1) into form (2) is also possible if planned ahead for. Like Debbie,
however, I think form (2) is probably optimized for the wrong thing (most
of the time).
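
A minimal sketch of such a 'merge affiliations' step, moving form (1) to form (3), might look like the following. It uses Python's standard library rather than XSLT. The function name, the generated "affN" ids, and the identity test (whitespace-normalized text of the <aff>) are all assumptions made for illustration. A real pipeline would need its own definition of 'identity' and would have to respect any ids already present.

```python
import xml.etree.ElementTree as ET


def merge_affiliations(article_meta):
    """Turn contrib/aff (form 1) into shared <aff>s plus <xref>s (form 3).

    Assumption: two affiliations are the same when their
    whitespace-normalized text content is equal.
    """
    seen = {}    # normalized aff text -> generated id (hypothetical "affN" scheme)
    merged = []  # unique <aff> elements, in first-seen order

    for contrib in list(article_meta.iter("contrib")):
        for aff in contrib.findall("aff"):
            key = " ".join("".join(aff.itertext()).split())
            if key not in seen:
                seen[key] = f"aff{len(seen) + 1}"
                aff.set("id", seen[key])
                aff.tail = None
                merged.append(aff)
            # Replace the nested <aff> with a pointer to the shared one.
            contrib.remove(aff)
            ET.SubElement(contrib, "xref", {"ref-type": "aff", "rid": seen[key]})

    # Place the unique <aff>s directly after the last <contrib-group>,
    # or at the end of article_meta if there is no <contrib-group> child.
    groups = article_meta.findall("contrib-group")
    pos = list(article_meta).index(groups[-1]) + 1 if groups else len(article_meta)
    for offset, aff in enumerate(merged):
        article_meta.insert(pos + offset, aff)
    return article_meta
```

The reverse direction, from form (3) to form (1), would copy each referenced <aff> back into the <contrib> that points to it, subject to the same caveats about identity.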

I am also not writing as a participant in JATS4R. Mainly I'm pitching in to
remind readers that transformations can ease the either/or problems with
this sort of thing, assuming data quality (sometimes a big assumption I
know).

Cheers, Wendell



On Tue, May 26, 2020 at 5:58 AM Gareth Oakes goakes@xxxxxxx <
jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> I think if it was completely greenfields then Debbie's option #3 is the
> way to go. Most of the JATS data we come across works that way, and it's
> not hugely more difficult from an XML processing perspective. I think the
> point about being consistent in doing it one way or the other in your
> backfile is a very laudable idea.
>
>
>
> Clearly there is no one-size-fits-all across publishers. I feel like a
> nice approach for a publisher would be to have a Schematron acting as an
> overlay on a base JATS schema. The overlay would impose the organization-
> and/or product-specific validation rules such as how you are meant to tag
> up <aff>s. I think that's a reasonably commonly used approach? Better than
> maintaining a customized schema.

>
>
>
> // Gareth Oakes
>
> // Chief Architect, GPSL
>
> // www.gpsl.co
>
>
>
> *From: *"Melissa Harrison m.harrison@xxxxxxxxxxxxxxxxx" <
> jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
> *Reply to: *"jats-list@xxxxxxxxxxxxxxxxxxxxxx" <
> jats-list@xxxxxxxxxxxxxxxxxxxxxx>
> *Date: *Tuesday, 26 May 2020 at 18:26
> *To: *"jats-list@xxxxxxxxxxxxxxxxxxxxxx" <jats-list@xxxxxxxxxxxxxxxxxxxxxx
> >
> *Subject: *Re: [jats-list] aff in- or outside of contrib
>
>
>
> Hi there
>
>
>
> *On behalf of JATS4R*
>
>
>
> This working group thought very long and hard and had many
> discussions/heated debates about this - people have different reasons for
> following the different options and if they went for 1 option only in the
> recommendation, this would alienate the people using the other option.
> Therefore, they had to come up with a more flexible model to ensure JATS4R
> can help standardise the standard as much as possible while making it
> accessible to everyone to implement.
>
>
>
> Not helpful, I appreciate, when you are willing to change your data model!
>
>
>
> Cheers
>
> Melissa
>
>
>
>
>
>
> Melissa Harrison
>
> Head of Production Operations
>
> Tel: +44 1223 855340
>
> http://elifesciences.org
>
>
>
>
>
> On Mon, May 25, 2020 at 11:44 PM Debbie Lapeyre dalapeyre@xxxxxxxxxxxxxxxx
> <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> 1) contrib/aff
>
> IMO: JATS allows <aff> as a child of <contrib> precisely for the
> single-author case (which is what Authoring was also designed for,
> although almost nobody uses Authoring.)
>
> In the modern STEM world, it is not unusual to have 100+ authors.
> I do not favor contrib/aff, as it can lead to a lot of redundant data.
>
> If, as is also common, a single author has 4 or 5 institutional
> affiliations, the data proliferation gets even worse.
>
> 2) contrib-group/aff
>
> IMO: This one was allowed, so that publishers could group authors
> by institution, and only need to input the <aff> once, for the
> whole group. Rare nowadays, I hope.
>
> 3) <aff>s all together AFTER last <contrib-group>
>
> IMO: This is the cleanest. Each <aff> is only present once,
> eliminating redundant data. Yes, you need to use an <xref> on
> each author pointing to each applicable <aff>. But it is easy
> for one author to have 5 affiliations and for 100 authors to
> have only 6 between them.
>
> I think this is cleanest for querying as well, as you can write
> an XPath to find all the <aff>s with the characteristic you want
> (all from one country or all NIH or whatever) and then get the
> contributors who have the @rid on their <xref> that matches
> the @id you found on the <aff> or <aff>s you wanted.
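
The two-step query described above can be sketched as follows, in Python with the standard library rather than as a single XPath. The function names are hypothetical, and the sketch assumes form (3) markup in which each <aff> carries an @id and each <contrib> points to it with <xref ref-type="aff" rid="...">. The @rid value is split on whitespace in case it lists several ids.

```python
import xml.etree.ElementTree as ET


def contributors_by_aff(article_meta, aff_matches):
    """Return the <contrib> elements linked to any <aff> accepted by aff_matches."""
    # Step 1: find the ids of the <aff>s with the wanted characteristic.
    wanted = {
        aff.get("id")
        for aff in article_meta.iter("aff")
        if aff.get("id") and aff_matches(aff)
    }
    # Step 2: keep the contributors whose <xref> @rid matches one of those ids.
    hits = []
    for contrib in article_meta.iter("contrib"):
        rids = set()
        for xref in contrib.findall("xref[@ref-type='aff']"):
            rids.update(xref.get("rid", "").split())
        if rids & wanted:
            hits.append(contrib)
    return hits


def in_country(name):
    """Hypothetical predicate: the <aff> contains a <country> with this text."""
    return lambda aff: aff.findtext(".//country") == name
```

A call such as contributors_by_aff(meta, in_country("Germany")) would then return every contributor with at least one matching affiliation.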
>
> This is a Lapeyre not a Mulberry opinion.
> I do not work for or with JATS4R (good folks though).
>
> --Debbie
>
>
>
> > On May 25, 2020, at 5:38 PM, Pieter Lamers pieter.lamers@xxxxxxxxxxxx <
> jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi All,
> >
> > We are looking into refactoring our article-meta structure with regard
> > to affiliations. We now have two practices:
> >
> > 1. <aff> is a child of <contrib>, no xref linking needed.
> > 2. <aff> is a child of <article-meta> (or <contrib-group>), xref linking
> > needed between <contrib> and <aff>.
> >
> > We are a bit in doubt as to what the preferred format should be.
> >
> > A quick check on jats4r.org (https://jats4r.org/authors-and-affiliations)
> > tells me that there is no preferred format for the choice we are facing:
> > "It is the content-provider's choice which to use".
> >
> > Sometimes it is suggested to follow the strictest variant of JATS where
> > possible so we took a look at pumpkin (article-authoring). It appears that
> > (2) is not possible, as <aff> cannot be a child of <contrib-group> or
> > <article-meta>, even though the notes tell us that
> >
> > "The linkage from a contributor to an affiliation should be made using
> > the ID/IDREF mechanism. The @id attribute of an <aff> element will be
> > pointed to from one or more <contrib> elements."
> > (https://jats.nlm.nih.gov/articleauthoring/tag-library/1.3d1/element/aff.html)
> >
> > This means that moving away from pattern (1) is making the document less
> > compatible with pumpkin. Not that this is a compelling argument I guess.
> > What I am thinking is:
> >
> > a. having <aff> separate means less redundancy in the file (argument for
> > choosing (2)).
> > b. having <aff> inside <contrib> is closer to the semantics as I
> > perceive them: affiliation is primarily a property of the author, not of
> > the article (argument in favor of (1)).
> >
> > The demand for statistics of any kind is growing. The other day we were
> > asked to report numbers of articles with a first author affiliated with
> > some affiliation in a list of German institutes. I could report this from
> > an SQL copy of the data, but would like to see the JATS files with the
> > flexible nature of XML as the place to ask, so we are going to add ROR if
> > we can find it, and maybe other identifiers. This would save me from
> > keeping all these data in sync with SQL. But in such a case it would be
> > nice to have a single structure pattern to query and not multiple.
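
As a rough illustration of that kind of report, the sketch below checks whether the first author of an article has an affiliation whose ROR id appears in a given list, and it has to handle both patterns (1) and (2) to do so. It is written in Python with the standard library. The function names are hypothetical, and the markup it expects (<institution-id institution-id-type="ror"> inside <aff>, and contrib-type="author" on <contrib>) is an assumption about how the identifiers would be tagged.

```python
import xml.etree.ElementTree as ET


def first_author_rors(article_meta):
    """Collect ROR ids for the first author's affiliations, for either pattern."""
    first = article_meta.find("contrib-group/contrib[@contrib-type='author']")
    if first is None:
        return set()
    by_id = {a.get("id"): a for a in article_meta.iter("aff") if a.get("id")}

    affs = list(first.findall("aff"))                    # pattern (1): nested <aff>
    for xref in first.findall("xref[@ref-type='aff']"):  # pattern (2): linked <aff>
        affs += [by_id[r] for r in xref.get("rid", "").split() if r in by_id]

    return {
        (el.text or "").strip()
        for aff in affs
        for el in aff.iter("institution-id")
        if el.get("institution-id-type") == "ror"  # assumed attribute value
    }


def has_listed_first_author(article_meta, ror_list):
    """True if the first author has at least one affiliation in ror_list."""
    return bool(first_author_rors(article_meta) & set(ror_list))
```

With a single structure pattern in the backfile, one of the two branches that gather affiliations could be dropped.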
> >
> > Any thoughts, anyone?
> >
> > Best
> > Pieter
> >
> >
> > --
> > Pieter Lamers
> > John Benjamins Publishing Company
> > Postal Address: P.O. Box 36224, 1020 ME AMSTERDAM, The Netherlands
> > Visiting Address: Klaprozenweg 75G, 1033 NN AMSTERDAM, The Netherlands
> > Warehouse: Kelvinstraat 11-13, 1446 TK PURMEREND, The Netherlands
> > tel: +31 20 630 4747
> > web: www.benjamins.com
> >
>
>
> ================================================================
> Deborah A Lapeyre              mailto:dalapeyre@xxxxxxxxxxxxxxxx
> Mulberry Technologies, Inc.      http://www.mulberrytech.com
> 17 West Jefferson Street         Phone: 301-315-9631 (USA)
> Suite 207                        Fax:   301-315-8385
> Rockville, MD 20850
> ----------------------------------------------------------------
> Mulberry Technologies: Consultancy for XML, XSLT, and Schematron
> ================================================================
>
>
>
>
> elifesciences.org
>
>
>
> eLife Sciences Publications, Ltd is a limited liability non-profit
> non-stock corporation incorporated in the State of Delaware, USA, with
> company number 5030732, and is registered in the UK with company number
> FC030576 and branch number BR015634 at the address Westbrook Centre, Milton
> Road, Cambridge, CB4 1YG.
>


--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
