Subject: [xsl] Finding "unknown" character references From: "Huditsch Roman" <Roman.Huditsch@xxxxxxxxxxxxx> Date: Tue, 22 Nov 2005 13:53:21 +0100 |
Hi, I need to check for "unknown" character references within my XML files. All valid character references are stored in another XML file ("invalid_chars.xml") : <?xml version="1.0" encoding="ISO-8859-1"?> <chars> <char>„</char> <char>‚</char> <char>š</char> ... </chars> Everytime a special character, which is unknown to this reference file, is encountered, it should be outputted. What I've come up with is: <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/> <xsl:template name="init"> <xsl:variable name="source" select="'../dirlist.txt'"/> <xsl:variable name="encoding" select="'iso-8859-1'"/> <xsl:variable name="src"> <errors> <xsl:for-each select="tokenize(unparsed-text($source, $encoding), '\r?\n')"> <file> <xsl:value-of select="."/> </file> </xsl:for-each> </errors> </xsl:variable> <xsl:result-document href="invalid_chars.xml"> <errors> <xsl:apply-templates select="$src//file[text()]"/> </errors> </xsl:result-document> </xsl:template> <xsl:template match="file"> <xsl:analyze-string select="." regex="^([0-9]{{2}}.[0-9]{{2}}.[0-9]{{4}})\p{{Zs}}+([0-9]{{2}}:[0-9]{{2} })\p{{Zs}}+([0-9.]+)(.*)$" flags="ix"> <xsl:matching-substring> <xsl:variable name="docpath" select="normalize-space(regex-group(4))"/> <xsl:variable name="errors" as="element()*"> <!-- Here my problems begin --> <xsl:if test="document($docpath)//text()[contains(., '&')]"> <xsl:analyze-string select="." regex="&[#a-zA-Z0-9]+" flags="i"> <xsl:matching-substring> <xsl:if test="document('valid_cahrs.xml')//char=concat(regex-group(0), ';')"> <file> <char> <xsl:value-of select="."/> <xsl:text>;</xsl:text> </char> </file> </xsl:if> </xsl:matching-substring> </xsl:analyze-string> </xsl:if> </xsl:variable> <xsl:for-each-group select="$errors" group-by="@name"> <file name="{current-grouping-key()}"> <xsl:for-each select="current-group()"> <xsl:copy-of select="*"/> </xsl:for-each> </file> </xsl:for-each-group> </xsl:matching-substring> <xsl:non-matching-substring/> </xsl:analyze-string> </xsl:template> </xsl:stylesheet> Example XML: <?xml version="1.0" encoding="ISO-8859-1"?> <root> <absatz> „dyplom licencjata pielęgniarstwa“ </absatz> </root> How can I check, if a text node contains a character unknown to my valid_chars.xml? Do you have any ideas, how I can make my XSLT work? Thank you very much. wbr, Roman
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] FW: Stopping recursion, Mark Seaborne | Thread | Re: [xsl] Finding "unknown" charact, Dimitre Novatchev |
Re: [xsl] FW: Stopping recursion, Mark Seaborne | Date | [xsl] grouping and merging problem, Sylvain Rouillard |
Month |