Re: [xsl] Dealing mixed content with invalid node-like text

Subject: Re: [xsl] Dealing mixed content with invalid node-like text
From: Karlmarx R <karlmarxr@xxxxxxxxx>
Date: Wed, 7 Dec 2011 06:42:03 +0800 (SGT)
Hello David,

Yes, I do process the content in two stages: I preprocess it into one
form of XML and then process that further into my final XML form. BUT BOTH
stages are done in XSL, in one single file, and the problem I reported is in
the first-stage conversion itself. To make things clearer, here is a rough
skeleton and explanation of my process. I get the entire content of the input
into a variable $input-text, and then tokenize it to get each line of data
into another variable, as below.

<xsl:variable name="lines" select="tokenize($input-text, '\r?\n')"/>
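
The skeleton does not show how $input-text is populated. A minimal sketch of
one possibility, assuming an XSLT 2.0 processor and the standard
unparsed-text() function, is given here. The parameter name input-uri, the
file name input.txt, the UTF-8 encoding and the wrapper elements are all
hypothetical and not taken from the original stylesheet.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: location of the plain-text input.
       A relative URI is resolved against the stylesheet's base URI. -->
  <xsl:param name="input-uri" as="xs:string" select="'input.txt'"/>

  <!-- Read the file as a string, so it does not have to be well-formed XML. -->
  <xsl:variable name="input-text" as="xs:string"
      select="unparsed-text($input-uri, 'UTF-8')"/>

  <!-- Same tokenization as in the skeleton: one string per line. -->
  <xsl:variable name="lines" as="xs:string*"
      select="tokenize($input-text, '\r?\n')"/>

  <!-- Named entry point, since there is no XML source document. -->
  <xsl:template name="main">
    <lines>
      <!-- Skip empty lines; wrap every other line in an element. -->
      <xsl:for-each select="$lines[normalize-space()]">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </lines>
  </xsl:template>

</xsl:stylesheet>
```

Read this way, fragments such as <II .> arrive as ordinary characters in a
string, so the input never has to pass through an XML parser.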

<!-- then pass it to another template to process each line of data: -->
<xsl:call-template name="process-lines">
    <xsl:with-param name="lines" select="$lines"/>
</xsl:call-template>

<!-- And here, I further process it to select the REQUIRED value, -->
<xsl:template name="process-lines">
    <xsl:param name="lines" as="xs:string*"/>
    <xsl:for-each select="$lines">
        <xsl:variable name="line-components" select="tokenize(., '\t')"/>
        <xsl:for-each select="$line-components[position() = last()]">
            <value>
                <xsl:call-template name="tag-text">
                    <xsl:with-param name="unparsed" select="."/>
                </xsl:call-template>
            </value>
        </xsl:for-each>
    </xsl:for-each>
</xsl:template>


<!-- AND IT IS HERE, in this "tag-text" template, that I try to achieve what I
explained in my original posting -->
<xsl:template name="tag-text">
    <xsl:param name="unparsed" required="yes"/>
    <xsl:analyze-string select="$unparsed"
        regex="^(.*?)&lt;(.+)&gt;(.*)&lt;/(.+)&gt;(.*?)$">
        etc as posted earlier.
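
For comparison, a minimal sketch of a narrower regex is given here. It is not
the template from the original posting. It assumes that the only genuine
elements in the text are b and i (a hypothetical whitelist), and it matches
only an opening tag from that list together with its matching closing tag,
using a back-reference. Everything else, including fragments such as <II .>
or <1a .>, is copied through as plain text. The template name
tag-known-elements and the main entry point are made up for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Entry point for a quick trial on one sample line from the post. -->
  <xsl:template name="main">
    <value>
      <xsl:call-template name="tag-known-elements">
        <xsl:with-param name="unparsed"
            select="'Line two with &lt;1a .&gt; Title etc, &lt;i&gt;within&lt;/i&gt; &lt;b&gt;something&lt;/b&gt; etc'"/>
      </xsl:call-template>
    </value>
  </xsl:template>

  <!-- Hypothetical template: turns <b>...</b> and <i>...</i> into elements. -->
  <xsl:template name="tag-known-elements">
    <xsl:param name="unparsed" as="xs:string" required="yes"/>
    <!-- Group 1: whitelisted element name. Group 2: its content (reluctant).
         \1 requires the closing tag to carry the same name. -->
    <xsl:analyze-string select="$unparsed"
        regex="&lt;(b|i)&gt;(.*?)&lt;/\1&gt;">
      <xsl:matching-substring>
        <xsl:element name="{regex-group(1)}">
          <!-- Recurse so that a different whitelisted element nested
               inside this one is tagged as well. -->
          <xsl:call-template name="tag-known-elements">
            <xsl:with-param name="unparsed" select="regex-group(2)"/>
          </xsl:call-template>
        </xsl:element>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Anything else stays text; the serializer escapes its "<". -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

On the sample line this should produce real i and b elements inside value,
while <1a .> remains escaped text. Because group 2 is reluctant, an element
nested inside another element of the same name (b within b) is not handled
correctly by this sketch.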

The skeleton input will be like (as I mentioned before):

Line one text <b>within valid node</b> and like <II .> Title etc
Line two with <1a .> Title etc, <i>within</i> <b>something</b> etc
another line can be just normal text
....

And it is vital here that I get the data in the way I want, so that our final
output in stage two is correct. In view of this, I cannot use <xsl:value-of>
with disable-output-escaping (d-o-e) here. As it seems this cannot be achieved
in XSL (which looks likely), I am trying to get my source corrected. But if
there is a solution available, in XSL or with a better regex, I would be happy
to use it. I hope the above answers your question.

Thanks,
Karl


----- Original Message -----
From: David Carlisle <davidc@xxxxxxxxx>
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Re: Dealing mixed content with invalid node-like text


> and you can assume it as something like a text file format

but your post said that you were using xsl:analyze-string, which means that
you must somehow be pre-processing your text format into XML before it gets to
XSLT, as otherwise the input would not be well formed and XSLT would not even
start. We can't help with the XSLT question you asked unless we know what the
input looks like _to XSLT_.

David      
