Subject: RE: [xsl] Transform HTML to XML using XSL
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 19 Jan 2007 11:41:25 -0000

Your first step is to run the input through the tidy utility to turn it into well-formed XML (preferably XHTML). At the moment it won't parse as XML because of all the entities. Or you could use the TagSoup parser to provide the input to the transformation, which will do this conversion for you "inline".

By looking at your input and output I can discern a few rules, which one can write as template rules, for example

    <xsl:template match="FONT[@size='4']">
      <title><xsl:apply-templates/></title>
    </xsl:template>

But in fact, one of your desired titles has <label> and <b> elements within it, and I've no idea what it is in the input that causes these to be generated. So I would suggest you proceed iteratively, adding rules like the above incrementally to get closer to the output that's needed. If you're converting a whole batch of documents, you should check that the rules work on a reasonable sample of them. Every time you don't get quite the output you want, see what clues there are in the input to enable you to refine the rules.

You'll want to start with a stylesheet that copies things unconditionally:

    <xsl:template match="*">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:template>

Then add rules for specific elements, or element patterns, that vary the processing for those elements.

It might be that you hit some structural issues, for example where you want to create an output element that corresponds to a consecutive sequence of input elements. This "positional grouping problem" often arises in up-conversion exercises like this one. There are well-known solutions - and it's much easier in XSLT 2.0. When you get to that point, come back to the list and identify the specific problem that's blocking you, taking care to separate it from all the noise that surrounds it.

I know communication in a foreign language can be difficult, but asking "How start and close in chapter and section tag" isn't going to get an answer.
Starting and closing tags is what you do all the time in XSLT (though that's not actually the correct terminology). We need to know what the particular problem is in this case.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Byomokesh [mailto:bkesh@xxxxxxxxxxxxxxx]
> Sent: 19 January 2007 07:16
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Transform HTML to XML using XSL
>
> Hi All,
>
> HTML File
> =========
> <HTML>
> <BODY>
> <P align="center"><FONT face="Arial" size="2"><FONT
> size="4">Prólogo<BR/></FONT><FONT size="5"
> color="#FF0000"><I>Comiença Cathólica Magestad
> delinvictíssimo</I> semper</FONT> Emperador de
> Roma.</FONT></P> <P align="center"><FONT
> size="4">Argumento<BR/></FONT><FONT size="5"
> color="#FF0000"><I>Síguese el Argumento
> del</I></FONT></P> <P align="center"><FONT
> size="4">Capítulo I<BR/></FONT><FONT size="5"
> color="#FF0000"><I>Marco Aurelio Emperador.</I></FONT></P>
> </BODY> </HTML>
>
> I Want Output
> =============
> <document>
> <chapter id="FM01"><title>Front Matter</title> <level
> id="pref01"><title>Prólogo</title>
> <para><i>Comiença Cathólica Magestad
> delinvictíssimo</i> semper Emperador de Roma.</para>
> <-- Some para continue -->
> </level>
> <level id="pref02"><title>Argumento</title>
> <para>Síguese el Argumento del</para> </level>
> </chapter> <chapter
> id="Ch01"><title><label><b>Capítulo I</b></label>
> <b>Marco Aurelio Emperador.</b></title>
> <!-- then para continue and again chapter start -->
> </chapter> </document>
>
> ---------------------
>
> Same tag but i need different condition to xml output.
>
> 1. How start and close in chapter and section tag.
> 2. <BR/> -- tag some cases inline text and some cases need
> para taging in base of XML output.
>
> Please anyone help....
>
> Thanks and Regards
> Byomokesh
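
To make the incremental approach in the reply concrete, here is a minimal sketch of an XSLT 2.0 stylesheet for the quoted sample, assuming the input has already been made well-formed (for example by tidy). It starts from the copy-everything rule, adds a few element-specific rules, and uses xsl:for-each-group to handle positional grouping. The grouping test (a new chapter begins at each P whose size-4 heading starts with "Capítulo") and the choice to drop the size-4 FONT from the paragraph body are assumptions made for illustration, not rules stated in the thread. Chapter ids, the "Front Matter" title, and the <label>/<b> markup are deliberately left out.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Fallback: copy any element not handled below, with its attributes. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="HTML">
    <document>
      <xsl:apply-templates select="BODY"/>
    </document>
  </xsl:template>

  <!-- Positional grouping. ASSUMPTION: a chapter starts at each P whose
       first size-4 heading begins with "Capítulo"; any P elements before
       the first such heading form an initial group of their own. -->
  <xsl:template match="BODY">
    <xsl:for-each-group select="P"
        group-starting-with="P[starts-with(normalize-space((.//FONT[@size='4'])[1]), 'Capítulo')]">
      <chapter>
        <xsl:apply-templates select="current-group()"/>
      </chapter>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Each P becomes a level: heading text as title, the rest as para. -->
  <xsl:template match="P">
    <level>
      <title>
        <xsl:value-of select="normalize-space((.//FONT[@size='4'])[1])"/>
      </title>
      <para><xsl:apply-templates/></para>
    </level>
  </xsl:template>

  <!-- The heading was already emitted as the title, so skip it here. -->
  <xsl:template match="FONT[@size='4']"/>

  <!-- Other FONT wrappers carry only presentation: keep their content. -->
  <xsl:template match="FONT">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="I">
    <i><xsl:apply-templates/></i>
  </xsl:template>

  <!-- ASSUMPTION: BR is treated as purely presentational and dropped. -->
  <xsl:template match="BR"/>

</xsl:stylesheet>
```

Run with an XSLT 2.0 processor such as Saxon, this should put the Prólogo and Argumento paragraphs into one chapter and start a second chapter at Capítulo I. Each further difference from the desired output then becomes a specific rule to add or refine.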