Subject: Re: [xsl] Dealing mixed content with invalid node-like text
From: Brandon Ibach <brandon.ibach@xxxxxxxxxxxxxxxxxxx>
Date: Tue, 6 Dec 2011 19:22:08 -0500
If the text is "almost" XML, perhaps the easiest thing to do would be to fix it so it really is XML, then use a character map to output it as-is so your second pass can just parse it normally.

If all you need to do is escape the angle brackets in something like "<1a .>", your "tag-text" template could be as simple as:

  <xsl:value-of select="replace($unparsed, '&lt;(\S+\s+\.)&gt;', '&amp;lt;$1&amp;gt;')"/>

And you would have declarations such as these at the top level:

  <xsl:output method="xml" version="1.0" encoding="utf-8" use-character-maps="xmlout"/>

  <xsl:character-map name="xmlout">
    <xsl:output-character character="&lt;" string="&lt;"/>
    <xsl:output-character character="&gt;" string="&gt;"/>
    <xsl:output-character character="&amp;" string="&amp;"/>
  </xsl:character-map>

The map makes the serializer write those three characters literally instead of escaping them, so real tags such as <b> in the text come out as markup, while the "&lt;1a .&gt;" built by replace() comes out as an escaped "<1a .>" that the second pass reads as plain text.

If you have other content being produced in the first pass whose correct output is threatened by this mapping, you may need to do some additional replacements in your "tag-text" template, substituting arbitrary characters (such as characters from the Unicode Private Use Area) for less-than, greater-than and ampersand, then adjusting the character map to map those characters back to their original forms.

This sort of markup hacking is not a road I'd recommend going down, but if you have to do it, I can't really see a reason to do it in some other language if XSLT is what you're comfortable with. Michael made a good point about using a proper parser (which I wouldn't implement in XSLT as a first choice, even though it would be possible) if you can put together a proper grammar for your input, but if a few regex substitutions can get you safely to clean XML, the above approach may suffice.

-Brandon :)

On Tue, Dec 6, 2011 at 5:42 PM, Karlmarx R <karlmarxr@xxxxxxxxx> wrote:
> Hello David,
>
> Yes, I do process the content in two stages: I preprocess it into one form of XML and then further process that into my final XML form. BUT, BOTH are done in XSL with one single file, and the problem that I reported is in the first-stage conversion itself.
> To make things even clearer, here is a rough skeleton and explanation of my process. I get the entire content of the input into a variable $input-text, and then tokenize it to get each line of data into another variable, as below.
>
>   <xsl:variable name="lines" select="tokenize($input-text, '\r?\n')"/>
>
>   <!-- then pass it to another template to process each line of data: -->
>   <xsl:call-template name="process-lines">
>     <xsl:with-param name="lines" select="$lines"/>
>   </xsl:call-template>
>
>   <!-- And here, I further process it to select the REQUIRED value: -->
>   <xsl:template name="process-lines">
>     <xsl:param name="lines" as="xs:string*"/>
>
>     <xsl:for-each select="$lines">
>       <xsl:variable name="line-components" select="tokenize(., '\t')"/>
>
>       <xsl:for-each select="$line-components[position() = last()]">
>         <value>
>           <xsl:call-template name="tag-text">
>             <xsl:with-param name="unparsed" select="."/>
>           </xsl:call-template>
>         </value>
>       </xsl:for-each>
>     </xsl:for-each>
>   </xsl:template>
>
>   <!-- AND IT IS HERE, in this "tag-text" template, that I try to achieve what I explained in my original posting -->
>   <xsl:template name="tag-text">
>     <xsl:param name="unparsed" required="yes"/>
>     <xsl:analyze-string select="$unparsed" regex="^(.*?)&lt;(.+)&gt;(.*)&lt;/(.+)&gt;(.*?)$">
>
> etc. as posted earlier.
>
> The skeleton input will be like this (as I mentioned before):
>
>   Line one text <b>within valid node</b> and like <II .> Title etc
>   Line two with <1a .> Title etc, <i>within</i> <b>something</b> etc
>   another line can be just normal text
>   ....
>
> It is vital that I get the data in the way I want here, so that our final output in stage two is correct. In view of this I cannot use xsl:value-of with disable-output-escaping here. As it seems this cannot be achieved in XSL (which looks likely), I am trying to get my source corrected. But if there are solutions available, in XSL or with a better regex, I would be happy to use them. I hope the above clarifies your question.
>
> Thanks,
> Karl
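To check the escaping regex against Karl's sample lines without an XSLT processor, here is a minimal Python sketch of the same first pass. It assumes tab-separated lines where only the last field matters, that every pseudo-tag looks like "<token .>" (the shape the replace() above targets), and that the text contains no other stray "<" or "&". The names escape_pseudo_tags and first_pass are hypothetical; Python's re stands in for XPath replace(), and xml.etree.ElementTree stands in for the second pass's XML parser, only to confirm the result is well-formed.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern text as the XSLT replace(): "<", non-space characters,
# whitespace, a literal ".", then ">". Assumes pseudo-tags look like "<1a .>".
PSEUDO_TAG = re.compile(r"<(\S+\s+\.)>")


def escape_pseudo_tags(text: str) -> str:
    """Turn "<1a .>" into "&lt;1a .&gt;" while leaving real tags like <b> alone."""
    return PSEUDO_TAG.sub(r"&lt;\1&gt;", text)


def first_pass(input_text: str) -> str:
    """Hypothetical first pass: one <value> per line, built from its last tab field."""
    values = []
    for line in re.split(r"\r?\n", input_text):
        # XPath tokenize() of an empty line yields nothing, so no <value> either.
        if line == "":
            continue
        last_field = line.split("\t")[-1]
        # Write the escaped text unescaped, as the character map does in XSLT.
        values.append("<value>" + escape_pseudo_tags(last_field) + "</value>")
    return "<values>" + "".join(values) + "</values>"


sample = (
    "1\tLine one text <b>within valid node</b> and like <II .> Title etc\n"
    "2\tLine two with <1a .> Title etc, <i>within</i> <b>something</b> etc\n"
    "3\tanother line can be just normal text\n"
)

root = ET.fromstring(first_pass(sample))  # the second pass can parse it normally
for value in root:
    # Text content with the pseudo-tags restored, plus the real child elements.
    print(repr("".join(value.itertext())), [child.tag for child in value])
```

Here <b> and <i> survive as elements, while "<II .>" and "<1a .>" come back as ordinary text once the result is parsed.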