Subject: Re: Normalizing string containing entities
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 18 Jul 2000 10:29:13 +0100
Pierre-Yves,

I'm actually thinking things are easier if you leave the mixed content alone. That is, normalize whitespace in it, but don't wrap it in anything.

FWIW, I disagree with the notion that mixed content makes life harder. Wrapping up the text nodes doesn't help with this problem -- actually, the fact that appearing in mixed content is what distinguishes the element nodes you need to treat specially is where the solution is...

Why not strip all the whitespace, then put it back where you want it? So....

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="text()">
    <!-- strip extra whitespace from text nodes
         (including leading and trailing whitespace) -->
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <xsl:template match="*">
    <!-- default element rule is identity transform -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[../text()[normalize-space(.) != '']]">
    <!-- but this template matches any element appearing in mixed content -->
    <xsl:variable name="textbefore"
      select="preceding-sibling::node()[1][self::text()]"/>
    <xsl:variable name="textafter"
      select="following-sibling::node()[1][self::text()]"/>
    <!-- Either of the preceding variables will be an empty node set
         if the neighbor node is not text(), right? -->
    <xsl:variable name="prevchar"
      select="substring($textbefore, string-length($textbefore))"/>
    <xsl:variable name="nextchar"
      select="substring($textafter, 1, 1)"/>
    <!-- Now the action: -->
    <xsl:if test="$prevchar != normalize-space($prevchar)">
      <!-- If the original text had a space before, add one back -->
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:copy>
      <!-- Copy the element over -->
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
    <xsl:if test="$nextchar != normalize-space($nextchar)">
      <!-- If the original text had a space after, add one back -->
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

Using David's test:

<x>
  <para>Some text <em>some other text</em> remaining text</para>
  <para>Some text<em> some other text</em> remaining text</para>
  <para>Some text <em> some other text</em> remaining text</para>
  <para>Some text <em>some other text </em> remaining text</para>
  <para>Some text <em>some other text </em>remaining text</para>
  <para> Some text <em>some other text</em> remaining text </para>
</x>

we get this output (using Saxon):

<x>
  <para>Some text <em>some other text</em> remaining text</para>
  <para>Some text<em>some other text</em> remaining text</para>
  <para>Some text <em>some other text</em> remaining text</para>
  <para>Some text <em>some other text</em> remaining text</para>
  <para>Some text <em>some other text</em>remaining text</para>
  <para>Some text <em>some other text</em> remaining text</para>
</x>

If you wanted to get a space before the <em> element in the second case, or after it in the fifth case, the logic could be extended to catch them (left as an exercise :-).

Good luck!

Wendell

At 10:20 AM 7/14/00 -0500, Imran wrote:
>> Consider, for example, the following:
>>
>> <para>Some text <em>some other text</em> remaining text</para>
>
><snip/>
>
>> The answer I found in several books is that we should not have elements
>> mixing CDATA and subelements. If we apply this rule, it is impossible to
>> represent the real structure of text.
>
>not entirely true. There's nothing preventing you from marking up the plain
>text as "plain" the same way you mark up the emphasized text as "em".
>
>e.g., do this instead:
>
><para>
>  <plain>Some text </plain>
>  <em>some other text</em>
>  <plain> remaining text</plain>
></para>
>
>
>the drawbacks to this solution are that the structure can seem more
>complicated, and it will use more memory (I think -- I'm no expert on
>that...). in addition, oftentimes you don't even have control over the
>original structure, so you have to use somebody else's model which mixes
>content.
>  but if you can set up your structure this way, it makes XSLT processing
>much easier, as well as processing for other XML apps.
>
>(this doesn't actually help w/ your initial problem, though, b/c there's
>still the matter of stripping the whitespace in the middle of text nodes...)
>
>Imran
>
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
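The stylesheet's strip-then-restore rule can also be tried outside an XSLT processor. Below is a rough Python port, assuming only the standard library's xml.dom.minidom; it is a sketch rather than anything posted in the thread, and the function names (normalize_space, is_ws_char, in_mixed_content, adjacent_char, render) are hypothetical, made up for illustration. It emits unindented markup, drops comments and processing instructions (as XSLT's built-in rules would here), and does not handle CDATA sections or treat namespaces specially.

```python
import re
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def normalize_space(s):
    """Approximates XPath 1.0 normalize-space(): collapse runs of
    space/tab/CR/LF into one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def is_ws_char(ch):
    """The stylesheet's test $c != normalize-space($c): true only when
    ch is exactly one XML whitespace character."""
    return len(ch) == 1 and ch in " \t\r\n"


def in_mixed_content(el):
    """Mirrors match="*[../text()[normalize-space(.) != '']]": the parent
    has at least one text child that is not whitespace-only."""
    parent = el.parentNode
    return parent is not None and any(
        n.nodeType == n.TEXT_NODE and normalize_space(n.data) != ""
        for n in parent.childNodes
    )


def adjacent_char(sibling, last):
    """Like preceding-/following-sibling::node()[1][self::text()]: use the
    neighbor only if it is a text node, otherwise act like an empty node
    set and return ''."""
    if sibling is None or sibling.nodeType != sibling.TEXT_NODE:
        return ""
    return sibling.data[-1:] if last else sibling.data[:1]


def render(node):
    """Serialize node the way the three templates in the stylesheet would."""
    if node.nodeType == node.TEXT_NODE:
        # match="text()": emit the normalized text only
        return escape(normalize_space(node.data))
    if node.nodeType != node.ELEMENT_NODE:
        # comments/PIs fall to XSLT's built-in rules, which output nothing
        return ""
    # match="*": identity-style copy of the element and its attributes
    attrs = "".join(f" {name}={quoteattr(value)}"
                    for name, value in node.attributes.items())
    inner = "".join(render(child) for child in node.childNodes)
    out = f"<{node.tagName}{attrs}>{inner}</{node.tagName}>"
    if in_mixed_content(node):
        # put back one space wherever the neighboring text originally had one
        if is_ws_char(adjacent_char(node.previousSibling, last=True)):
            out = " " + out
        if is_ws_char(adjacent_char(node.nextSibling, last=False)):
            out += " "
    return out


if __name__ == "__main__":
    samples = [
        "<para>Some text <em>some other text</em> remaining text</para>",
        "<para>Some text<em> some other text</em> remaining text</para>",
        "<para> Some text <em>some other text</em> remaining text </para>",
    ]
    for src in samples:
        print(render(minidom.parseString(src).documentElement))
```

Like the XSLT version, this only looks at the single character just outside each element, so the second and fifth of David's cases still come out without the space; catching those would mean also inspecting the first and last characters of the element's own text.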