Subject: Re: [xsl] finding and removing duplicate string
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Fri, 2 Dec 2011 18:22:02 +0100

Unless your <p> paragraphs are quite short, you should not use pattern
matching like this, because the running time of this pattern grows
quadratically with the string length.

I ran a quick test comparing Java's regex engine to the substring
comparison proposed here earlier on.

The "hit" case (2 x "the quick brown..."):
  pattern: 0.000003061s - substr: 0.000000134s, a factor of about 23

The "fail" case ("the quick brown..." vs "okkokoko...", equal lengths):
  pattern: 0.000004452s - substr: 0.000000026s, a factor of 171

Some XSLT regex engines might be better, but their execution time is
still bound to grow as O(n^2).

-W

On 2 December 2011 17:29, Imsieke, Gerrit, le-tex <gerrit.imsieke@xxxxxxxxx> wrote:
> <xsl:template match="p">
>   <xsl:copy>
>     <xsl:copy-of select="@*" />
>     <!-- use replace() for normalizing the input first, i.e., replace the
>          newline with a space: -->
>     <xsl:analyze-string select="replace(., '\s+', ' ')"
>                         regex="^(.+)\s+\1$">
>       <!-- \1 is a back-reference to the first match, which is allowed according
>            to http://www.w3.org/TR/xpath-functions/#regex-syntax -->
>       <xsl:matching-substring>
>         <xsl:value-of select="regex-group(1)"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         <!-- output the whole string if above regex doesn't match: -->
>         <xsl:value-of select="."/>
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>   </xsl:copy>
> </xsl:template>
>
>
> On 2011-12-02 16:32, Jacob L wrote:
>>
>> All,
>>
>> I am using <xsl:stylesheet version="2.0". If in the input XML file,
>> the text in the <p> tag repeats itself such as
>>
>> <text>
>>   <p>Bradley Cooper named Peoples Sexiest man alive 2011 Bradley
>> Cooper named Peoples Sexiest man alive 2011</p>
>> </text>
>>
>> I want to write code to check it and omit it. The result should be:-
>>
>> After putting check in the xsl and deleting the duplicate string. The
>> output should be:-
>>
>> <text>
>>   <p>Bradley Cooper named Peoples Sexiest man alive 2011</p>
>> </text>
>>
>> Thanks for the help!
>>
>
> --
> Gerrit Imsieke
> Geschäftsführer / Managing Director
> le-tex publishing services GmbH
> Weissenfelser Str. 84, 04229 Leipzig, Germany
> Phone +49 341 355356 110, Fax +49 341 355356 510
> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>
> Registergericht / Commercial Register: Amtsgericht Leipzig
> Registernummer / Registration Number: HRB 24930
>
> Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
> Thomas Schmidt, Dr. Reinhard Vöckler
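
For illustration, here is a minimal Java sketch of what a substring comparison for this problem might look like. It is not the code used for the timings above, and not necessarily the variant proposed earlier in the thread; the class name DuplicateCheck and the method removeDuplicate are made up for this example, and trimming leading/trailing whitespace is an added assumption. The idea: after whitespace normalization, a doubled string "X X" has odd length with a single space exactly in the middle, so one linear comparison of the two halves is enough.

```java
class DuplicateCheck {

    /**
     * Hypothetical helper: returns X if the normalized text is exactly
     * "X X" (X, one space, X); otherwise returns the normalized text.
     */
    public static String removeDuplicate(String text) {
        // Collapse whitespace runs to one space, like replace(., '\s+', ' ')
        // in the quoted template. The trim() is an extra assumption.
        String s = text.trim().replaceAll("\\s+", " ");
        int n = s.length();

        // "X X" has odd length 2k+1 with the separating space at index k.
        if (n >= 3 && n % 2 == 1) {
            int k = n / 2;
            // Compare the first k characters with the k characters after the space.
            if (s.charAt(k) == ' ' && s.regionMatches(0, s, k + 1, k)) {
                return s.substring(0, k);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        String input = "Bradley Cooper named Peoples Sexiest man alive 2011 Bradley\n"
                     + "Cooper named Peoples Sexiest man alive 2011";
        System.out.println(removeDuplicate(input));
    }
}
```

The comparison touches each character at most once, so it stays linear in the string length, whereas the back-reference pattern may have to retry many candidate lengths for the first group before it succeeds or gives up.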