Subject: RE: [xsl] Generating numeric character references
From: "Andrew Welch" <AWelch@xxxxxxxxxxxxxxx>
Date: Thu, 16 Jan 2003 09:44:37 -0000
I think the original poster had a problem of double escaping, such as
&amp;#173; in their source, and they simply wanted to convert this to the
correct &#173;.

Wouldn't running the source XML through an identity transform give the
desired result, with no need for string processing of any kind?

cheers
andrew

> -----Original Message-----
> From: Wendell Piez [mailto:wapiez@xxxxxxxxxxxxxxxx]
> Sent: 14 January 2003 21:55
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Generating numeric character references
>
> Stuart,
>
> The reason your task is proving difficult is that it's really not what
> it appears to be at first blush. There is a trap here, which you can
> recognize if you can clearly distinguish between XML-as-serialization
> format, and the XML document (a tree of nodes as described in the XPath
> spec) that an XSLT processor operates on.
>
> Numeric character references may appear in XML-as-serialization; in the
> XPath tree (the "document" built by the parser and handed to the XSLT
> engine), however, these references never appear as such; rather, each
> has been converted into the character it represents.
>
> So, for example, if your data has character reference &#x41;, your XSLT
> processor sees this as an "A". (It may come out the back as "&#x41;" if
> your serialization encoding happens not to be able to do a proper "A",
> but internally it's an "A"). Therefore, what's required with
> "&amp;#x41;" isn't to turn it into "&#x41;", but rather into "A". (Or,
> if you get my drift: you need to convert "&amp;#x41;" into "&#x41;"
> *before* your document is parsed, or an "&#x41;" into an "A" *after*
> your document is parsed.)
>
> You are currently trying to do the latter; and it can be done -- as
> you're discovering -- with recursive processing over text nodes,
> heuristics to recognize target substrings, and a table to map them.
> But it's not a job that XSLT lends itself towards, since XSLT is as
> ungainly for processing strings as it is slick for processing nodes.
> Far preferable would be to use Perl or something else with good support
> for string-handling and regular expressions, to do the former task
> (munge the &amp; entities before parsing).
>
> Yet -- and this is where one has to be *very* cautious -- XSLT does, at
> least in certain circumstances (i.e. with certain processors in certain
> operational contexts) give you *some* control over how a document, once
> processed, is serialized -- and *if your data is clean* this optional
> feature can be drafted into service to help with your problem. What I'm
> getting to, of course, is the dreaded disable-output-escaping....
>
> That is, if your data is otherwise unproblematic, you can achieve your
> goal by running your document through a near-identity transform that
> disables output escaping on your text nodes. The document will emerge
> from the transform unchanged (at least as XPath sees it) but with
> "&amp;#x41;" represented as "&#x41;". This, *when parsed again*, will be
> seen as the "A" you really want.
>
> Note that this is not (if we're really strict with our terms) a
> transformation in the XSLT sense. Rather, it's a tricky application of
> the serializer attached to most processors, which will sometimes break
> because it disables escaping on the wrong characters (so if you have
> any data such as "if x < y", you're going to be in trouble unless you
> write string-processing code to catch and work around it), and which
> uses an optional feature of the language that restricts portability.
>
> Please consider this only a golden-hammer solution (i.e. lacking a
> better tool to do the job), and keep in mind it's easy to bang your
> thumb this way (since any anomalies in the input will make your output
> not well-formed).
> It is in view of these limitations that this really should be done in a
> separate pass, if with XSLT at all.
>
> Cheers,
> Wendell
>
> At 03:05 PM 1/14/2003, you wrote:
> >I'd like to transform specific text substrings into numeric character
> >references during an XSLT transformation. For example, I want to
> >transform all occurrences that look like "&amp;#173;" within a string
> >into "&#173;".
> >
> >Here's the back story. I have source XML that is generated
> >automatically from HTML by a third party. The third party incorrectly
> >handles entity references, so that "&#173;" in the original HTML
> >becomes "&amp;#173;" in the XML. I want to repair the damage done. To
> >simplify things, I am only interested in documents with ISO 8859-1
> >encoding.
> >
> >Below is a solution [1] that I am not pleased with. It is a named
> >template that recursively parses a string, trying to replace
> >references. This requires an <xsl:when> element for each value of
> >numeric character reference that might be encountered (see the
> >"additional cases here" comment). Problems with this include linear
> >search of values, omitted values, and opportunity for error in
> >mismatched values.
> >
> >Can anyone suggest a better approach to generating numeric character
> >references? I would be fine restricting the solution to MSXML or
> >.NET's System.Xml.Xsl XSLT processors, if that is an issue.
> >
> >Many thanks!
> >
> >Cheers,
> >Stuart
> >
> >[1] A less than happy solution:
> >
> >  <xsl:template name="restoreNumCharRefs">
> >    <xsl:param name="string"/>
> >
> >    <xsl:choose>
> >      <xsl:when test="contains($string, '&amp;')">
> >        <xsl:variable name="head"
> >                      select="substring-before($string, '&amp;')"/>
> >        <xsl:variable name="remainder"
> >                      select="substring-after($string, '&amp;')"/>
> >        <xsl:variable name="reference"
> >                      select="substring-before($remainder, ';')"/>
> >
> >        <xsl:variable name="entity">
> >          <xsl:choose>
> >            <xsl:when test="$reference='#167'">&#167;</xsl:when>
> >            <xsl:when test="$reference='#173'">&#173;</xsl:when>
> >
> >            <!-- additional cases here -->
> >
> >            <xsl:otherwise>&amp;<xsl:value-of
> >                select="$reference"/>;</xsl:otherwise>
> >          </xsl:choose>
> >        </xsl:variable>
> >
> >        <xsl:variable name="tail">
> >          <xsl:call-template name="restoreNumCharRefs">
> >            <xsl:with-param name="string"
> >                            select="substring-after($remainder, ';')"/>
> >          </xsl:call-template>
> >        </xsl:variable>
> >
> >        <xsl:value-of select="concat($head, $entity, $tail)"/>
> >      </xsl:when>
> >      <xsl:otherwise>
> >        <xsl:value-of select="$string"/>
> >      </xsl:otherwise>
> >    </xsl:choose>
> >
> >  </xsl:template>
> >
> > XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
>
> ======================================================================
> Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
> Mulberry Technologies, Inc.                http://www.mulberrytech.com
> 17 West Jefferson Street                    Direct Phone: 301/315-9635
> Suite 207                                          Phone: 301/315-9631
> Rockville, MD 20850                                 Fax: 301/315-8285
> ----------------------------------------------------------------------
>   Mulberry Technologies: A Consultancy Specializing in SGML and XML
> ======================================================================
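Wendell's advice in the quoted reply is to repair the double-escaped references in the serialized text before the document is parsed, using a language with regular-expression support. A minimal sketch of that pre-parse pass follows, in Python rather than the Perl he mentions. The function name `undo_double_escaped_char_refs` and the decision to touch only numeric references (decimal or hexadecimal) are assumptions made for illustration; nothing in the thread specifies them.

```python
import re

# Matches "&amp;" followed by the rest of a numeric character reference,
# decimal (#173;) or hexadecimal (#xAD;). Named entities such as
# "&amp;lt;" are deliberately left alone (an assumption of this sketch).
_DOUBLE_ESCAPED = re.compile(r"&amp;(#[0-9]+|#x[0-9A-Fa-f]+);")


def undo_double_escaped_char_refs(raw_xml: str) -> str:
    """Turn '&amp;#173;' into '&#173;' in unparsed XML text.

    Operates on the serialized document as a plain string, so it must
    run before the text is handed to an XML parser.
    """
    return _DOUBLE_ESCAPED.sub(r"&\1;", raw_xml)


if __name__ == "__main__":
    sample = "<p>soft&amp;#173;hyphen, section &amp;#xA7; and AT&amp;T</p>"
    print(undo_double_escaped_char_refs(sample))
```

Once the text has been through this pass, an XML parser sees `&#173;` as an ordinary character reference and resolves it to the character itself. The sketch cannot tell a damaged reference from text that was meant to read `&amp;#173;` literally, so it suits only input where every such sequence is known to be damage.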