Subject: [xsl] Re: cleanup of <div>-elements
From: "Piez, Wendell A. (Fed) wendell.piez@xxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 27 Feb 2023 17:12:57 -0000
Hi Monika,

The reason Chris asks his question is that this will impact how good your solution can be. In XSLT it is often easy to implement if it is easy to define.

The question here is whether you can easily and deterministically distinguish between a div element that should become a p, and one that should stay a div. Answer that question and the code is straightforward.

A rule to do this might be something like "any div that has a child `sub`, `a` or untagged text becomes a p, while any other div (containing only the blocks) stays a div". But how well this works depends on your case. One reason we use schemas to validate!

Regards, Wendell

From: Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Monday, February 27, 2023 11:40 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] Re: cleanup of <div>-elements

Hi Monika,

Will the content between headings always be limited to known "block-level" element types (p, ol, ul, etc.)?

- Chris

From: Madlik, Monika (LNG-VIE) monika.madlik@xxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Monday, February 27, 2023 11:31 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] cleanup of <div>-elements

Hi,

I have a problem with an XML file that has to be converted. The XML files I get are semi-structured: they contain the h1/h2 information and also tables, lists, ... Paragraphs are tagged with <p>, but not always. Sometimes <p> is missing and a weird construct of <div> elements is wrapped around texts and other elements instead.

Is there a way to unravel these div constructs without losing text and structure? I need to have the element <p> around texts and inline markup such as strong text or italic text, ...
My problem is that the div elements can appear in any form and at any depth, and it is also possible that many div elements are wrapped around other div elements.

Example XML:

<root>
  <h1>...</h1>
  <p>...</p>
  <ul>
    <li>...</li>
    <li>...</li>
  </ul>
  <div>
    <h1>...</h1>
    <h2>...</h2>
    <p>...</p>
    <h2>...</h2>
    <p>...</p>
    <h1>...</h1>
    <p>...</p>
    <h2>...</h2>
    <p>...</p>
    <div>
      <h1>...</h1>
      <div>...<sup><a href="#footnote-9" id="9" rel="footnote">[9]</a></sup></div>
    </div>
    <div>
      <br/> ... <strong>...</strong> ...<sup><a href="#footnote-10" id="10" rel="footnote">[10]</a></sup>
      <div>
        <h1>...</h1>
      </div>
    </div>
    <p>...</p>
  </div>
</root>

The yellow marked text should look like this after my transformation:

<h1>...</h1>
<p>...<sup><a href="#footnote-9" id="9" rel="footnote">[9]</a></sup></p>
<p><br/> ... <strong>...</strong> ...<sup><a href="#footnote-10" id="10" rel="footnote">[10]</a></sup></p>
<h1>...</h1>

Thanks a lot,
Monika

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
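To make Wendell's suggested rule concrete, here is a minimal Python sketch of the classification step only, using the standard library's `xml.etree.ElementTree`. It is an illustration under stated assumptions, not code from the thread: the function names are hypothetical, and `sup` is added to the list of inline children because the sample input uses <sup> rather than <sub>. In a stylesheet, the same test would probably be written as a match pattern along the lines of `div[sub or a or text()[normalize-space()]]`.

```python
import xml.etree.ElementTree as ET

# Inline child names from the rule as quoted in the thread. "sup" is an
# added assumption, because the sample input uses <sup>, not <sub>.
INLINE_CHILDREN = {"sub", "sup", "a"}


def has_untagged_text(elem):
    """Return True if elem directly contains non-whitespace text."""
    # Text before the first child lives in elem.text.
    if elem.text and elem.text.strip():
        return True
    # Text after each child lives in that child's .tail.
    return any(child.tail and child.tail.strip() for child in elem)


def should_become_p(div):
    """Apply the rule: an inline child or untagged text means 'paragraph'."""
    if any(child.tag in INLINE_CHILDREN for child in div):
        return True
    return has_untagged_text(div)


def retag_divs(root):
    """Rename qualifying <div> elements to <p> in place; return the count."""
    renamed = 0
    # Materialize the list first so renaming does not disturb iteration.
    for div in list(root.iter("div")):
        if should_become_p(div):
            div.tag = "p"
            renamed += 1
    return renamed


# Hypothetical sample: an outer div holding only blocks, and an inner div
# holding text plus a footnote reference.
sample = (
    "<root>"
    "<div><h1>Title</h1>"
    "<div>Some text<sup><a href='#footnote-9'>[9]</a></sup></div>"
    "</div>"
    "</root>"
)
root = ET.fromstring(sample)
count = retag_divs(root)
result = ET.tostring(root, encoding="unicode")
print(count)
print(result)
```

This sketch only renames elements. It does not unwrap the divs that stay divs, and it does not handle a div that mixes untagged text with nested blocks, as the second inner div in Monika's sample does. That case would need an additional rule, which is the point of asking how deterministic the distinction can be made.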