[xsl] Re: table transformation

Subject: [xsl] Re: table transformation
From: "Dorothy Hoskins dorothy.hoskins@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 21 Jun 2023 03:14:36 -0000
Hi, as a person who has met and worked with people overseas who rekey PDFs
and add metadata and XML tags to text extracted from them (or from Word) to
create XML, I am concerned about the ultimate effects of ChatGPT on these
workers. In Chennai, rekeying and markup work was a lower-middle-class job
with more prestige than working in a call center. The women and men I met
were proud of their technical computer skills and their knowledge of the
XML DTDs. When the bulk of their work shifts to automated markup, most of
them will no longer be needed for it. What happens to this whole class of
workers will then ripple through the economies of the countries that
perform rekeying and tagging.

There's probably no stopping this shift to automation. Many scientific
publishers have "back end" service providers overseas in Southeast Asia or
Eastern Europe, but the publishers constantly struggle to contain costs. I
anticipate that social unrest will spread as unemployment rises once
publishers shift their workflows to cut production expenses.

If anyone knows of a business that is already shifting to LLMs for XML
markup, it would be interesting to learn about the social impacts on its
previous service providers.

I have already experimented with prompts and samples to get ChatGPT to
generate some Schematron rules and XSpec unit tests. As Jon Udell notes in
his post, breaking large transformations into smaller tasks seems to be the
best way to get good results, and QA is critical. But if newbies to LLMs
can produce results, the entire XML transformation space is going to be
revolutionized soon.

By the way, millions of the scientific articles that Jon mentions as sources
for extracting text from PDFs are already tagged in JATS XML with full
metadata by publishing platforms like Atypon and Silverchair, which then
produce web pages from the JATS. Those web pages are generally the
publishers' paid subscriber content. (Many articles are also openly licensed
when funding sources require it.) So it seems like bad practice to engineer
a transform for extracted text when an article might already be fully
tagged in a semantically rich content model like JATS, which includes HTML
table tagging.
https://jats.nlm.nih.gov/publishing/tag-library/1.0/n-pau2.html
Regards, Dorothy

---------- Forwarded message ----------
From: Dave Pawson <dave.pawson@xxxxxxxxx>
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx

Date: Mon, 19 Jun 2023 07:38:31 +0100
Subject: Table transformation
Interesting use of LLM
https://blog.jonudell.net/2023/06/18/why-llm-assisted-table-transformation-is-a-big-deal/

regards

-- 
Dave Pawson
