Subject: [xsl] Re: table transformation
From: "Dorothy Hoskins dorothy.hoskins@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 21 Jun 2023 03:14:36 -0000
Hi,

As a person who has met and worked with people overseas who rekey PDFs and add metadata and XML tags to text extracted from them (or from Word) to create XML, I am concerned about the ultimate effects of ChatGPT on the workers. In Chennai, doing the rekeying and markup was a lower-middle-class job with more prestige than working in a call center. The women and men I met were proud to have the technical computer skills and the knowledge of the XML DTDs. When the bulk of their work shifts to automated markup, the majority of them will not be needed for this work. What then happens to this whole class of workers will ripple through the economies of countries that perform rekeying and tagging.

There's probably no stopping this shift to automation. Many scientific publishers have "back end" service providers overseas in Southeast Asia or Eastern Europe, but the publishers constantly struggle to contain costs. I anticipate social unrest will spread as unemployment rises after publishers shift their workflows to cut production expenses. If anyone knows of a business already shifting to using LLMs for XML markup, it would be interesting to know the social impacts on their previous service providers.

I have already played with using prompts and samples to get ChatGPT to generate some Schematron and XSpec unit tests. As Jon Udell notes in his post, breaking large transformations down into smaller tasks seems to be the best way to get good results, and QA is critical. But if newbies to LLMs can produce results, the entire XML transformation space is going to be revolutionized shortly.

By the way, the millions of scientific articles that Jon mentions as sources for extracting text from PDFs are already tagged in JATS XML with full metadata by publishing platforms like Atypon and Silverchair, which then produce web pages from the JATS. Those web pages are generally the paid subscriber content of the publishers. (Many articles are also in the common license if required by funding sources.) So it sounds like a bad practice to engineer a transform for extracted text if an article might already be fully tagged in a semantically rich content model like JATS that includes HTML table tagging.

https://jats.nlm.nih.gov/publishing/tag-library/1.0/n-pau2.html

Regards,
Dorothy

> --------- Forwarded message ----------
> From: Dave Pawson <dave.pawson@xxxxxxxxx>
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Date: Mon, 19 Jun 2023 07:38:31 +0100
> Subject: Table transformation
>
> Interesting use of LLM
> https://blog.jonudell.net/2023/06/18/why-llm-assisted-table-transformation-is-a-big-deal/
>
> regards
> --
> Dave Pawson