RE: [xsl] Converting a Batch File to XML

Subject: RE: [xsl] Converting a Batch File to XML
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Sun, 1 Aug 2004 19:48:12 +0100

This example works fine for me after changing the regex to
regex="[\-a-zA-Z0-9]+"

It gives output starting:

<?xml version="1.0" encoding="utf-8"?>
<someRoot>
   <record>
      <word>H-A-HEADER</word>
      <other> </other>
      <word>some</word>
      <other> </other>
      <word>content</word>
   </record>
   <record>
      <word>I-AN-ITEM-1</word>
      <other> </other>
      <word>more</word>
      <other> </other>
      <word>content</word>
   </record>

I wonder if there's a Java problem? I ran this using Java version 1.4.1_02.
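
If the backslash escape is what a particular Saxon/Java combination trips over, one thing worth trying is to put the hyphen last in the character class, where the XML Schema regex grammar treats it as a literal without needing an escape. The stylesheet below is only a sketch for checking that: it inlines one line of the sample input (the variable name "line" is just for illustration) so that no text file is needed, and like the original it can be run against any dummy source document.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <xsl:template match="/">
    <!-- One line of the sample input, inlined so no text file is needed -->
    <xsl:variable name="line" select="'H-A-HEADER some content'"/>
    <record>
      <!-- Hyphen placed last in the class, so it needs no backslash -->
      <xsl:analyze-string select="$line" regex="[a-zA-Z0-9-]+">
        <xsl:matching-substring>
          <word><xsl:value-of select="."/></word>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <other><xsl:value-of select="."/></other>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </record>
  </xsl:template>
</xsl:stylesheet>
```

If this variant matches H-A-HEADER as a single word while the escaped form does not, that would point at how the escape is being handled rather than at the stylesheet logic.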

Michael Kay

 

> -----Original Message-----
> From: David.Pawson@xxxxxxxxxxx [mailto:David.Pawson@xxxxxxxxxxx] 
> Sent: 28 July 2004 14:41
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] Converting a Batch File to XML 
> 
>  
> 
>     -----Original Message-----
>     From: Michael Kay 
> 
>     This kind of thing is very much easier using XSLT 2.0
>     
>     * use the unparsed-text() function to read the text file
>     
>     * split it into individual lines using the tokenize() function
>     
>     * parse each line using xsl:analyze-string
>     
>     * arrange it into a hierarchical structure using xsl:for-each-group
> 
> The structure is incomplete (there is no grouping step yet), and I
> couldn't get Saxon to accept an escaped hyphen in a character class,
> but it may be of help.
> 
> input file
> H-A-HEADER some content
> I-AN-ITEM-1 more content
> I-AN-ITEM-2  and again
> S-A-SUMMARY-1 for variety
> I-AN-ITEM-3  and change
> S-A-SUMMARY-2 and different again
> 
> Stylesheet
> 
> <?xml version="1.0" encoding="utf-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="2.0">
> 
>   <xsl:output method="xml" indent="yes" encoding="utf-8"/>
> 
>   <xsl:template match="/">
>     <xsl:variable name="f"
>                   select="unparsed-text('unparsedEntity.txt','utf-8')"/>
> 
>     <someRoot>
>       <xsl:for-each select='tokenize($f, "\n")'>
>         <record>
>           <xsl:analyze-string regex="[a-zA-Z0-9]+" select=".">
>             <xsl:matching-substring>
>               <word><xsl:value-of select="."/></word>
>             </xsl:matching-substring>
>             <xsl:non-matching-substring>
>               <other>
>                 <xsl:value-of select="."/>
>               </other>
>             </xsl:non-matching-substring>
>           </xsl:analyze-string>
>         </record>
>       </xsl:for-each>
>     </someRoot>
>   </xsl:template>
> </xsl:stylesheet>
> 
> 
> With regex="[\-a-zA-Z0-9]+" the stylesheet failed to select any matches,
> although http://www.w3.org/TR/xmlschema-2/#regexs seems to make it valid.
> 
> 
> HTH DaveP
> 

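The quoted stylesheet stops short of the last step in the list, xsl:for-each-group. A minimal sketch of that step follows. It rests on an assumption the thread does not confirm: that every line beginning H- or I- opens a new group and that the S- lines belong to the group opened before them. The element names group, head and detail are made up for illustration. Because group-starting-with takes a pattern, and patterns match nodes rather than strings, the lines are first wrapped in temporary line elements.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <xsl:template match="/">
    <!-- Same file name and encoding as in the quoted stylesheet -->
    <xsl:variable name="f"
                  select="unparsed-text('unparsedEntity.txt', 'utf-8')"/>

    <!-- Pass 1: wrap each non-blank line in a temporary line element,
         because group-starting-with needs nodes to match against -->
    <xsl:variable name="lines">
      <xsl:for-each select="tokenize($f, '\r?\n')[normalize-space(.)]">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </xsl:variable>

    <someRoot>
      <!-- Pass 2 (assumed rule): an H- or I- line opens a new group -->
      <xsl:for-each-group select="$lines/line"
                          group-starting-with="line[matches(., '^[HI]-')]">
        <group>
          <!-- The context item is the first line of the current group -->
          <head><xsl:value-of select="."/></head>
          <!-- Any remaining lines (the S- lines) become details -->
          <xsl:for-each select="current-group()[position() gt 1]">
            <detail><xsl:value-of select="."/></detail>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </someRoot>
  </xsl:template>
</xsl:stylesheet>
```

Each head or detail could then be passed through xsl:analyze-string as in the quoted stylesheet if the individual words are needed. If the real hierarchy is different (for example, items nested under the header), only the group-starting-with pattern and the wrapper elements should need to change.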