Re: [jats-list] Using JATS to cite research data.

Subject: Re: [jats-list] Using JATS to cite research data.
From: "Kimberly Tryka ktryka@xxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 8 May 2014 20:39:53 -0000
Ian -

Recently, I was involved in a project that was looking at how (whether?)
researchers with access to data from dbGaP (http://www.ncbi.nlm.nih.gov/gap)
were citing the data that they retrieved from the database.

dbGaP does have guidelines about data citation, and many of the committees
who grant access to the data have boilerplate language they require
researchers to put in their publications.  Yet, there were still a
significant number of published papers (20-25%) using dbGaP data that
either cited their data inadequately (by simply pointing to the general
dbGaP site rather than giving a proper accession number) or simply
neglected to mention dbGaP at all.

Even with those who did cite the data properly (in that they put the
appropriate accession number into their publication), finding that
accession was a bit of a game.  You made the statement:

> Citations to research data are currently coded almost arbitrarily
> across different publishers, making it hard to machine read
> data contributions in the literature.
>

As I worked through the papers citing dbGaP data I found the citations in
many different places including sections called (not a comprehensive list):
* Abstract
* Accession codes/numbers
* Acknowledgments
* Analysis
* Applications
* Author Information
* Data access/accessibility/availability/accession
* Data and Sample Sharing
* Database content
* Discussion
* Footnotes
* Funding/Grant Support
* Genetic Influences
* Introduction
* Main Text
* Materials and Methods
* Neurological disorders
* Online Methods
* Process of Phenotype Harmonization
* References
* Results
* Simulations and Real Data Analysis
* Subjects
* Supplementary Material
* URLs
* Web Resources

Additionally, accession numbers were buried in table footnotes and figure
captions, as well as in, for example, tables in pdf files in supplementary
materials.
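
To make the "game" concrete, here is a minimal Python sketch of the kind of
scan involved. It is not the procedure we actually used in the project. It
assumes a JATS XML file without a default namespace, and it assumes that
accessions look like "phs" plus six digits with optional ".vN" and ".pN"
suffixes. The function name, the container labels, and the sample document
(including its accession values) are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed accession shape: "phs" + six digits, optional .vN and .pN suffixes.
ACCESSION = re.compile(r"\bphs\d{6}(?:\.v\d+)?(?:\.p\d+)?\b")

# JATS containers that get a fixed label; <sec> is labelled by its <title>.
CONTAINERS = {
    "abstract": "Abstract",
    "ack": "Acknowledgments",
    "fn-group": "Footnotes",
    "ref-list": "References",
    "caption": "figure/table caption",
    "table-wrap-foot": "table footnote",
    "supplementary-material": "Supplementary Material",
}


def find_accessions(jats_xml):
    """Return sorted (accession, location) pairs found in a JATS string."""
    hits = set()

    def record(text, location):
        # Collect every accession-like string in one piece of text.
        for match in ACCESSION.findall(text or ""):
            hits.add((match, location))

    def walk(elem, location):
        # Entering a section or known container changes the reported location.
        if elem.tag == "sec":
            title = elem.find("title")
            label = "".join(title.itertext()).strip() if title is not None else ""
            location = label or "(untitled section)"
        elif elem.tag in CONTAINERS:
            location = CONTAINERS[elem.tag]
        record(elem.text, location)
        for value in elem.attrib.values():  # e.g. link targets
            record(value, location)
        for child in elem:
            walk(child, location)
            record(child.tail, location)  # tail text belongs to this element

    walk(ET.fromstring(jats_xml), "(outside any section)")
    return sorted(hits)


# Hypothetical document; the accession values are illustrative only.
SAMPLE = """<article>
  <front><article-meta>
    <abstract><p>We used phs000007.v1.p1 data.</p></abstract>
  </article-meta></front>
  <body>
    <sec><title>Materials and Methods</title>
      <p>Genotypes came from dbGaP (phs000123).</p>
      <table-wrap><table-wrap-foot>
        <p>Source: phs000123.v2.p1</p>
      </table-wrap-foot></table-wrap>
    </sec>
  </body>
  <back><ack><p>Data were obtained from dbGaP.</p></ack></back>
</article>"""

for accession, location in find_accessions(SAMPLE):
    print(accession, "found in:", location)
```

Even on this toy input the same study turns up in three different places, and
the acknowledgment mentions dbGaP with no accession at all. A scan like this
also cannot see anything that lives only in a PDF supplement.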

I think it would be amazing if FORCE11 (or RDA or CODATA) would come up
not only with a recommendation for the format of a data citation, but
also with a recommendation for where that data citation should appear in a
published paper.  Once a 'where' is defined, it might then be easier
to figure out how/whether JATS needs to be altered.

---Kim



On Fri, May 2, 2014 at 9:10 AM, Ian Mulvany i.mulvany@xxxxxxxxxxxxxxxxx <
jats-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> I'd like to tap into the collective wisdom of this group in
> preparation for a workshop that I am co-organising in June on data
> citation.
>
>
> # Introduction and Question
> How can we best use JATS to cite research data? Does this group have
> specific examples of data citation that they could share with me,
> and does anyone have strong opinions about the straw man options that
> I list at the bottom of this message?
>
> # Need
> Citations to research data are currently coded almost arbitrarily
> across different publishers, making it hard to machine read
> data contributions in the literature.
>
> # Background
> We are running a workshop in June at the British Library to propose
> some best practices for citing research data. This is being done
> under the umbrella of the FORCE11 Data Citation Implementation group
> (https://www.force11.org/datacitationimplementation), and it will
> involve a selection of invited participants, mostly
> representing production departments of STM publishers. Ahead of that
> meeting I'd like to start this thread as a background discussion on
> the viability of some of the options the organisers of the meeting are
> thinking about. Below I list three straw man options that we have been
> discussing, along with basic pros and cons.
>
> # Straw man Options
>
> 1 get people to agree on best practice using the existing tag set
>   pros:
>   - nothing new needs to be introduced to JATS
>   cons:
>   - probably makes adoption harder, and the creation of tooling to
> identify data citations harder, as overloading existing tags will
> likely not produce a tag syntax unique to research data
>
>
> 2 extend the JATS tag set to support specific citation of research data
>   pros:
>   - clean start, produces a standard that everyone can move towards,
> good for creation of downstream tools
>   cons:
>   - extending JATS can take some time, owing to the standardisation process
>
>
> 3 produce an extension to JATS for data citation, along the same lines
> as http://www.ncbi.nlm.nih.gov/books/NBK47081/
>   pros:
>   - does not need to wait for extension of JATS to be usable
>   cons:
>   - a highly specific extension may have difficulty gaining
> adoption in publishing workflows
