Re: [jats-list] Re: Does Blue need a Lite version, to counter its creeping aquafication?

Subject: Re: [jats-list] Re: Does Blue need a Lite version, to counter its creeping aquafication?
From: "nina_linn.reinhardt@xxxxxxxxxxxxxxxxxxxx" <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Mar 2021 10:10:29 -0000
Hello Rebecca,

In fact, we chose only a single sample for the analysis because anything
more would be too much data to analyze. We already had difficulties with
this amount. We also assume that the results would not change
significantly if we analyzed more data from PMC.

Best regards,
Nina

On Wed, 10 Mar 2021, 23:53, Orris, Rebecca (NIH/NLM/NCBI) [E] wrote:
> Nina,
>
>
>> From your table it appears you only used PMC's oa_comm-use_A-B set of
>> articles (i.e. articles with open access licenses that allow commercial
>> use and from journals that start with the letters A or B)?  Is there a
>> reason you chose to limit your analysis to just that set of articles?
>> I believe for your analysis you could use the complete PMC OA Subset,
>> so both commercial and non-commercial and all the journals, i.e. not
>> just the A-B file, but all the rest of the alphabet.  Perhaps that is
>> just too much data to analyze and you have decided that the sample you
>> have taken is enough, but I want to make sure it isn't just an
>> oversight!
>
> Kind regards,
>
>
> Rebecca
>
>
>
>
> ------------------
> Rebecca Orris, PhD
> Literature Program Special Projects
> NCBI/National Library of Medicine
> National Institutes of Health
> rebecca.orris@xxxxxxx
>
> Normal working hours: 8:30am - 5pm (Eastern time)
>
>
>
>
> ________________________________
> From: nina_linn.reinhardt@xxxxxxxxxxxxxxxxxxxx
> <jats-list-service@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, March 9, 2021 8:22 AM
> To: jats-list@xxxxxxxxxxxxxxxxxxxxxx <jats-list@xxxxxxxxxxxxxxxxxxxxxx>
> Subject: [jats-list] Re: Does Blue need a Lite version, to counter its
> creeping aquafication?
>
> Dear JATS community,
>
>
> Thanks a lot to all participants who provided me with JATS articles and
> HTML lists for my master's thesis.
>
>
> I was hoping that the union of all tags and attributes used in the
> collections would immediately reveal a "consensus customization" of the
> most popular elements and attributes, with a vocabulary that is
> significantly reduced compared to Blue. Although this "all" de-facto
> customization omits 203 elements and 111 attributes compared to Blue,
> the majority of these omissions are rarely used MathML elements or
> attributes. As discussed in the "request for data donation" paper,
> disallowing parts of MathML is not what I'm after. We are currently
> refining the statistics, e.g. regarding how frequently certain elements
> and attributes are used. As you can see in the result table
> (https://nreinhar.github.io/JATS_Customizing_Analysis_Data/all.xhtml),
> the PMC collection alone comprises more than 400 items (that is,
> distinct element or attribute names). Many of these elements or
> attributes appear in only very few of the approx. 380,000 articles. The
> idea is to ignore items whose frequency (defined as the number of
> articles in which an item occurs divided by the total number of articles
> in the collection) is significantly below average compared to the same
> item's frequency in the other collections. Hopefully that will slim down
> the consensus collection's item count. The table also now has a column
> that shows the deviations from the Blue tag set.
>
>
> I'm looking forward to starting the analysis. And I hope to come up with
> some interesting results in the next few months.
>
> I'll keep you posted!
>
>
> Best regards,
> Nina
>
>
> On Tue, 16 Feb 2021, 18:12, Imsieke, Gerrit, le-tex wrote:
>
>> Dear JATS Community,
>>
>>
>>
>> As announced in a previous message to this list [1], Nina Reinhardt is
>> currently working on her master's thesis in which she tries to find a
>> consensus customization for the (estimated) 90% of JATS users who only
>> need about half of Blue's available elements and attributes.
>>
>> My role in this is that I am co-supervising the thesis and that I came
>> up with the idea after another discussion on this list last year, in
>> which Tommie suggested that "a dozen different people (or small groups)
>> each craft[ed] a 'JATS Lite' and we compare[d] them" [2].
>>
>> This was our first idea: To provide a form with a list of available
>> elements and attributes, and people would be able to put together their
>> favorite Lite customization interactively.
>>
>> But then we thought that we should also offer a way for people to
>> upload representative JATS content from their production or repositories
>> and treat these collections as expressions of tagging preferences, or as
>> "de-facto customizations". Nina then skipped the interactive form part
>> and focused entirely on analyzing these collections and on which metrics
>> are applicable to them in order to identify consensus customizations.
>>
>> Nina has written a paper in which she describes her approach and what
>> is needed to find this lean consensus customization (your data!):
>> https://docs.google.com/document/d/1jYDT0TkYP9Tg31Ldd9gFmdwSiu98Q2mg_qOuhgnxpRc/
>>
>> You may skip most technical discussions for the time being and navigate
>> right to the last section called "Data Collection". It is a call to
>> action that asks you to donate some of your valuable JATS files to
>> research. Or you can use some XSLT [3] in order to extract
>> element/attribute name lists from the JATS files yourselves so you need
>> not send potentially proprietary data to someone else.
>>
>> Please donate generously, and if possible do it by March 1st. Nina's
>> thesis needs to be completed by June.
>>
>> You are welcome to add comments and suggestions to the Google doc, you
>> may of course file issues and pull requests in the GitHub repo, and you
>> can contact Nina and/or me via this list or direct email messages if you
>> have questions or suggestions.
>>
>> On behalf of Nina (and myself),
>>
>>
>>
>> Gerrit
>>
>>
>>
>> [1] https://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/202009/msg00019.html
>> [2] https://www.biglist.com/lists/lists.mulberrytech.com/jats-list/archives/202004/msg00030.html
>> [3] https://github.com/nreinhar/JATS_Customizing_Analysis/
>>
>>
>>
>> --
>> Gerrit Imsieke
>> Geschäftsführer / Managing Director
>> le-tex publishing services GmbH Weissenfelser Str. 84, 04229 Leipzig,
>> Germany
>> Phone +49 341 355356 110, Fax +49 341 355356 510
>> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>>
>> Registergericht / Commercial Register: Amtsgericht Leipzig
>> Registernummer / Registration Number: HRB 24930
>>
>>
>>
>> Geschäftsführer / Managing Directors:
>> Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
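
The frequency measure described in the quoted message of 9 March (the
number of articles in which an item occurs divided by the total number of
articles in the collection) could be sketched roughly as follows. This is
only an illustration in Python using the standard library, not the XSLT
from the thesis repository. The function names and the "@" prefix for
attribute names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def items_in_article(xml_text):
    """Return the distinct element and attribute names used in one article.

    Attribute names get an '@' prefix (a convention assumed here) to keep
    them apart from element names. Namespaced names, such as MathML ones,
    stay in ElementTree's '{uri}local' notation.
    """
    items = set()
    for elem in ET.fromstring(xml_text).iter():
        items.add(elem.tag)
        items.update("@" + name for name in elem.attrib)
    return items


def item_frequencies(articles):
    """Map each item to (articles containing it) / (articles in collection)."""
    counts = Counter()
    total = 0
    for xml_text in articles:
        total += 1
        # Each article contributes at most 1 per item, however often
        # the item occurs inside that article.
        counts.update(items_in_article(xml_text))
    return {item: n / total for item, n in counts.items()}


if __name__ == "__main__":
    # Two tiny stand-in documents, not real JATS articles.
    sample = [
        '<article><body><p id="p1">x</p></body></article>',
        '<article><body><sec><p>y</p></sec></body></article>',
    ]
    freqs = item_frequencies(sample)
    print(freqs["p"], freqs["sec"], freqs["@id"])  # 1.0 0.5 0.5
```

Comparing one item's frequency across collections would then mean
calling item_frequencies once per collection and looking the item up in
each result. The thread does not say how far below the other collections
counts as "significantly" below, so no cutoff is shown here.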
