Re: [xsl] Re: table transformation

Subject: Re: [xsl] Re: table transformation
From: "Sewell, David R (drs2n) dsewell@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 21 Jun 2023 13:24:03 -0000
To some extent, the conversion industry has already been through one similar
transformation, when OCR got good enough that there was no longer any real
point in the old double-keyboarding + proofreading workflow, which was replaced by
separate OCR passes + human reconciliation of discrepancies. We have a good
deal of transformation of letterpress volumes to TEI-XML done in Chennai, and
it probably is the case that an AI engine at the current level of
sophistication could do maybe 80% of the markup correctly (based on a bit of
testing with ChatGPT), to be polished off by humans.

David S.

--
David Sewell
Co-Manager of the Rotunda Imprint, Pro Tem
The University of Virginia Press

From: "Dorothy Hoskins dorothy.hoskins@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Reply-To: "xsl-list@xxxxxxxxxxxxxxxxxxxxxx" <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tuesday, June 20, 2023 at 11:14 PM
To: "xsl-list@xxxxxxxxxxxxxxxxxxxxxx" <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: [xsl] Re: table transformation

Hi, as a person who has met and worked with people overseas who rekey PDFs and
add metadata and XML tags to text extracted from them (or from Word) to create
XML, I am concerned about the ultimate effects of ChatGPT on the workers. In
Chennai, doing the rekeying and markup was a lower middle class job with more
prestige than working in a call center. The women and men I met were proud to
have the technical computer skills and the knowledge of the XML DTDs. When the
bulk of their work shifts to automated markup, the majority of them will no longer
be needed for this work. What then happens to this whole class of workers
will ripple through the economies of the countries that perform rekeying and
tagging.

There's probably no stopping this shift to automation. Many scientific
publishers have "back end" service providers overseas in Southeast Asia or
Eastern Europe, but the publishers constantly struggle to contain costs. I
anticipate social unrest will spread as unemployment rises after publishers
shift their workflows to cut production expenses.

If anyone knows of a business already shifting to LLMs for XML markup,
it would be interesting to know the social impacts on their previous service
providers.

I have already played with using prompts and samples to get ChatGPT to
generate some Schematron and XSpec unit tests. As Jon Udell notes in his post,
the breakdown of large transformations into smaller tasks seems to be the best
way to get good results, and QA is critical. But if newbies to LLMs can
produce results, the entire XML transformation space is going to be
revolutionized shortly.

By the way, millions of the scientific articles that Jon mentions as sources for
extracting text from PDFs are already tagged in JATS XML with full metadata
by publishing platforms like Atypon and Silverchair, which then produce web
pages from the JATS. Those web pages are generally the publishers' paid
subscriber content. (Many articles are also released under open licenses when
funding sources require it.) So it seems like bad practice to engineer a
transform for extracted text when an article may already be fully tagged in a
semantically rich content model like JATS, which includes HTML table tagging.
https://jats.nlm.nih.gov/publishing/tag-library/1.0/n-pau2.html
Regards, Dorothy

---------- Forwarded message ----------
From: Dave Pawson <dave.pawson@xxxxxxxxx>
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date: Mon, 19 Jun 2023 07:38:31 +0100
Subject: Table transformation
Interesting use of LLM
https://blog.jonudell.net/2023/06/18/why-llm-assisted-table-transformation-is-a-big-deal/

regards

--
Dave Pawson