And to the extent there are pressures to modernize and upgrade systems,
teams would rather move away from XML/XSLT altogether, if only because it
terrifies them. (Maybe I exaggerate or maybe I don't.)
This leaves us with a paradox: no one is trying XSLT in new environments
and architectures because no one is trying XSLT in new environments and
architectures.
Oh - I should qualify - *in public* - we don't know what people are doing
who are not talking about it.
The biggest benefit of XProc 3.0 in my view is that it promises
sustainability (assuming we do our work) beyond the sustainability of a
particular toolchain.
But even bigger than 'the biggest benefit' at this moment, it is also
possible to build and deploy XDM-based processes (XProc with embedded
XSLT/XQuery as needed) that are deterministic, verifiable, and testable.
Rigorous testability is not a requirement for every system at every level.
But if some kinds of systems require rigorous testability, the technology
as a whole needs to be able to support it.
The accessibility (with respect to both openness and sustainability) and
testability of XProc and XSLT stand in marked contrast to the kinds of
black box processes that are now being entrusted with various kinds of
vital and not-so-vital operations.
Yet at the same time, outdated information and myths persist and even
late-generation XSLT, XQuery and XProc are regarded as not worth the
trouble, while developers think about which hot new technology they should
be looking at.
It seems to me there are opportunities here for those bold enough to bear
down against the grain.
https://github.com/usnistgov/oscal-xproc3.
Regards, Wendell
On Sat, Jan 18, 2025 at 8:26 PM dvint dvint@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> I hadn't but part of my problem is the team is not xml aligned any more. I
> was trying to avoid xslt by using Python when that seemed to fail me.
>
> -------- Original message --------
> From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
> Date: 1/18/25 2:44 PM (GMT-08:00)
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Running XSLT from Python
>
> Dan,
>
> Have you considered XProc 3.0? It is able to read HTML the same as it does
> XML. While bad inputs are bad inputs, it is also good for detecting and/or
> repairing them. It can embed and use Schematron and XSLT; you might also
> find many of the things you need to do are achievable by XProc alone.
>
> Two XProc implementations are now available, Morgana XProc III, and XML
> Calabash 3.0.
>
> More references can be provided --
>
> Regards, Wendell
>
>
>
> On Fri, Jan 17, 2025 at 6:16 PM dvint@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> First off, is anyone aware of a good way to merge a bunch of HTML
>> techdoc pages into a single HTML so a PDF file can be generated with
>> something like Prince or Weasyprint? I didn't find anything so I went
>> down the following path.
>>
>> For this effort I decided to see what coPilot would come up with for
>> this task. It has been an interesting experiment for the proof of
>> concept effort but now I need to get this production ready. I was also
>> initially trying to avoid using XSLT as I'm the only one on the team
>> that likes XSLT and I was processing HTML that isn't well-formed.
>>
>> CoPilot created some Python using BeautifulSoup initially. My first
>> discovery is that BeautifulSoup seems to be good for extracting content
>> from the HTML, but I haven't found a way to process it like XSLT - maybe
>> my mind has been warped by XSL and tools like Omnimark and I just don't
>> see the path. Anyway after trying to do the job with BeautifulSoup, I
>> started looking for a way to integrate XSLT and coPilot took me to
>> lxml/etree.
>>
>> With etree I was able to start developing the core part of the
>> processing. Here is the flow of the general program:
>> 1) Extract the navigation/TOC from one of the HTML files. I did this
>> with BeautifulSoup because the HTML is not well-formed and I just needed
>> to extract a single element.
>> 2) I processed all the HTML and made a new copy in a subfolder. Using
>> BeautifulSoup again, I extracted the body of the HTML pages. The body
>> content is well-formed, the head content isn't.
>> 3) Using the extracted TOC/navigation from step 1 to drive the
>> processing, I created an XSLT that took that information and then
>> started processing the extracted content. I've been able to get a single
>> HTML file with all of the content. I had to create unique IDs for all
>> the sections and modify the cross references to change them from file
>> references to links to anchors in the new file.
>>
>> All of that is working great until there are errors in the HTML. This
>> HTML is generated with asciidoc. Occasionally, a writer will put quotes
>> in an alt text for an image. This results in mangled image references
>> that don't affect the visual rendering of the HTML, but XSLT trips up
>> on this. Other bad asciidoc has created some other mangled HTML
>> which again isn't reported and doesn't affect the visual result. When
>> the XSLT hits this I get reasonable error messages that tell me what the
>> problem is when I run in oXygen. From Python, I only get a message that
>> tells me it failed, with the filename.
>>
>> Can you confirm my understanding and that there isn't a way to get the
>> XSLT error and xsl:message strings I've created? Maybe Saxon in oXygen
>> is providing better information than lxml can?
>>
>> I'm looking into switching to Saxon HE to see if that helps.
>>
>> ..dan
>>
>>
>>
>
> --
> ...Wendell Piez... ...wendell -at- nist -dot- gov...
> ...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
> ...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/1240222> (by
> email)
>
--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
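
On the question in the quoted message about getting XSLT errors and
xsl:message strings back in Python: the following is a minimal sketch,
assuming lxml is the processor in use as described in the thread. It relies
on the error_log that lxml attaches to an XSLT object and to the exceptions
it raises. The stylesheet, the sample inputs, and the run_transform helper
are hypothetical and only illustrate the idea; they are not the code from
the thread.

```python
from lxml import etree

# Hypothetical stylesheet: emits one xsl:message per <img>, then copies the input.
STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="//img">
      <xsl:message>checking image: <xsl:value-of select="@src"/></xsl:message>
    </xsl:for-each>
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
"""


def run_transform(source_bytes):
    """Apply STYLESHEET to source_bytes.

    Returns (result, messages). result is None when parsing or the
    transformation fails; messages is a list of strings taken from the
    parser error or from the XSLT error log.
    """
    transform = etree.XSLT(etree.fromstring(STYLESHEET))
    try:
        doc = etree.fromstring(source_bytes)
    except etree.XMLSyntaxError as exc:
        # Input that is not well-formed fails here, before XSLT runs.
        return None, [str(exc)]
    try:
        result = transform(doc)
    except etree.XSLTApplyError as exc:
        # Runtime failures (including xsl:message terminate="yes").
        return None, [entry.message for entry in exc.error_log]
    # Non-terminating xsl:message output is collected in transform.error_log.
    return result, [entry.message for entry in transform.error_log]


if __name__ == "__main__":
    _, messages = run_transform(b'<div><img src="a.png"/></div>')
    for message in messages:
        print(message)
```

Each error_log entry also carries fields such as line and level_name, so a
wrapper script could report more than the bare text. Whether this matches
the detail Saxon reports in oXygen is not something this sketch tries to
settle.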