Subject: Re: [xsl] marking up text when term from other file is found
From: Mukul Gandhi <gandhi.mukul@xxxxxxxxx>
Date: Thu, 22 Apr 2010 11:51:10 +0530

I would try to solve this as follows:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="xml" indent="yes" />

  <xsl:variable name="index-terms" select="document('indexTerms.xml')" />

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()" priority="10">
    <xsl:analyze-string select="." regex="{string-join(for $term in $index-terms/terms/term return concat('(', $term, ')'), '|')}">
      <xsl:matching-substring>
        <xsl:variable name="idVal" select="string-join(for $attrVal in $index-terms/terms/term[. = regex-group(0)]/@*[starts-with(name(),'index')] return $attrVal, '_')" />
        <ph id="{$idVal}">
          <xsl:value-of select="." />
        </ph>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

You may adapt this to suit your requirements if needed.

On Thu, Apr 22, 2010 at 8:38 AM, Hoskins & Gretton <hoskgret@xxxxxxxxxxxxxxxx> wrote:
>
> Hi, I need help finding resources (examples and/or XSL) for this situation,
> for which I haven't found quite the right recipe in my searches of the list
> archives.
> Given an XML file containing a list of terms and another file containing a
> mix of elements containing text (narrative content, some inline markup for
> emphasis and footnotes), I was asked if I could find occurrences of each
> term wherever it appeared in the narrative content, and wrap each occurrence
> with a tag. So my first thought is to load up each document into a variable.
> But then I don't know what the most effective method of string comparison
> would be, given that the narrative document might have the term's words with
> different capitalization. If anyone can point me in the right direction, I'd
> appreciate it.
> Also I would like to know if there is a practical limit to
> how large a narrative file I can run with about 150 terms to find in the
> text. And if a different approach would work better, such as writing Java
> to do brute force search and replace, please tell me so. (I work with a
> Java programmer. Everything looks like a Java problem to her and an XSL
> problem to me.)
> -- Dorothy
> Note: Using Saxon B 9.1.0.7. I just made up a set of terms and a bad
> sentence as an example.
> Example of terms (indexTerms.xml):
> <?xml version="1.0" encoding="UTF-8"?>
> <terms>
>   <term index1="anxiety">Anxiety</term>
>   <term index1="children">Children</term>
>   <term index1="children" index2="illness">Children, illness</term>
>   <term index1="children" index2="nightmare">Children, nightmare</term>
>   <term index1="cure" index2="fever">Cure fever</term>
>   <term index1="cure" index2="illness">Cure illness</term>
>   <term index1="anxiety" index2="nightmare">Nightmare</term>
>   <term index1="children" index2="illness">Sick children</term>
>   <term index1="anxiety" index2="phobia">Worries, phobias and anxiety</term>
> </terms>
>
> Example of narrative (sampleTopic.xml):
> <?xml version='1.0' encoding='UTF-8'?>
> <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN"
> "http://docs.oasis-open.org/dita/v1.1/OS/dtd/topic.dtd">
> <topic id="sampleTopic">
>   <title>sampleTopic</title>
>   <body>
>     <p>markup for sample terms testing a set of phrases to match to the
> content of index terms:</p>
>     <p>Texttexttext text some of the terms are already in <ph> i.e. <ph
> id="cure_fever">curing fever</ph>, <ph id="children_illness">sick
> children</ph> and sometime the same terms occur, <i>but different case</i>,
> not in a ph: Curing fever and <b>Sick children</b>.
> I need to get all the
> occurrences of each of the term element strings marked up with <ph>
> </p>
>   </body>
> </topic>
>
> Desired result:
> <?xml version='1.0' encoding='UTF-8'?>
> <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN"
> "http://docs.oasis-open.org/dita/v1.1/OS/dtd/topic.dtd">
> <topic id="sampleTopic">
>   <title>sampleTopic</title>
>   <body>
>     <p>markup for sample terms testing a set of phrases to match to the
> content of index terms:</p>
>     <p>Texttexttext text some of the terms are already in <ph> i.e. <ph
> id="cure_fever">curing fever</ph>, <ph id="children_illness">sick
> children</ph> and sometime the same terms occur, <i>but different case</i>,
> not in a ph: <ph id="cure_fever">Curing fever</ph> and <b><ph
> id="children_illness">Sick children</ph></b>. I need to get all the
> occurrences of each of the term element strings marked up with <ph>
> </p>
>   </body>
> </topic>
>
> XSL:
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> version="2.0">
> <xsl:param name="indexFile">indexTerms.xml</xsl:param>
> <xsl:param name="textFile">sampleTopic.xml</xsl:param>
> <xsl:variable name="termsDocument"
> select="document($indexFile)"></xsl:variable>
> <xsl:variable name="textDocument"
> select="document($textFile)"></xsl:variable>
> <xsl:template match="*" name="test1"><xsl:result-document
> href="matchText-test.xml" method="xml">
> <!-- proof that I can get the terms -->
> <xsl:text> </xsl:text><xsl:comment><xsl:text>first term is
> </xsl:text><xsl:value-of
> select="$termsDocument/terms/term[1]"/></xsl:comment>
> <xsl:text> </xsl:text><xsl:comment><xsl:text>second term is
> </xsl:text><xsl:value-of
> select="$termsDocument/terms/term[2]"/></xsl:comment>
> <xsl:text> </xsl:text><xsl:comment><xsl:text>third term is
> </xsl:text><xsl:value-of
> select="$termsDocument/terms/term[3]"/></xsl:comment>
> <!-- now how do I find them in the $textDocument elements and mark them up?
> -->
> </xsl:result-document>
> </xsl:template>
> </xsl:stylesheet>

--
Regards,
Mukul Gandhi
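
A note on the capitalization point raised in the quoted message: the stylesheet in the reply compares terms exactly as written, so "sick children" in the narrative would not match the term "Sick children", and any regex metacharacter inside a term would be read as regex syntax. The following is a minimal sketch of one way to adapt it, not a tested drop-in. It assumes the same indexTerms.xml layout, uses flags="i" on xsl:analyze-string, escapes each term, and lists longer terms first. The variable name term-regex and the ph//text() template are illustrative additions that do not appear in the original reply.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <xsl:output method="xml" indent="no" />

  <xsl:variable name="index-terms" select="document('indexTerms.xml')" />

  <!-- Hypothetical helper: one alternation of all terms, regex-escaped,
       longest first so "Children, illness" is tried before "Children". -->
  <xsl:variable name="term-regex" as="xs:string">
    <xsl:variable name="escaped" as="xs:string*">
      <xsl:for-each select="$index-terms/terms/term">
        <xsl:sort select="string-length(.)" data-type="number" order="descending" />
        <xsl:sequence select="replace(., '[\\.\[\]{}()*+?^$|-]', '\\$0')" />
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence select="string-join($escaped, '|')" />
  </xsl:variable>

  <!-- Identity copy for everything that is not handled below. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: text already inside a ph element should be left alone. -->
  <xsl:template match="ph//text()" priority="20">
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()" priority="10">
    <!-- flags="i" makes the match case-insensitive. -->
    <xsl:analyze-string select="." regex="{$term-regex}" flags="i">
      <xsl:matching-substring>
        <!-- Look the term up again by its lower-cased text. -->
        <xsl:variable name="hit" select="lower-case(.)" />
        <xsl:variable name="term"
                      select="($index-terms/terms/term[lower-case(.) = $hit])[1]" />
        <ph id="{string-join($term/@*[starts-with(name(), 'index')], '_')}">
          <xsl:value-of select="." />
        </ph>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Inflected forms such as "Curing fever" would still not match the term "Cure fever"; that would need additional patterns in the term list or a stemming step outside this sketch. An empty term list would produce an empty regex, which xsl:analyze-string rejects, so the sketch assumes at least one term. As in the original reply, the order in which the index attributes are joined depends on the processor's attribute order.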