Subject: RE: [xsl] XSL pattern needed for begin/end elements
From: Pieter Reint Siegers Kort <pieter.siegers@xxxxxxxxxxx>
Date: Wed, 7 Jul 2004 16:42:40 -0500
Hi Tracy,

I haven't tried it with variations in your input XML, but you may want to use
an identity-template approach, like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xlink="http://www.w3.org/1999/xlink"
exclude-result-prefixes="xlink"
version="1.0">

<xsl:output indent="yes"/>

<xsl:template match="/doc">
   <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="@*|node()">
   <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="text_run"/>
<xsl:template match="hyperlink_end"/>

<xsl:template match="hyperlink_begin">
  <hyperlink>
    <xsl:attribute name="xlink:href">
      <xsl:value-of select="concat(locator_url/@protocol,'://',locator_url/@host_name)"/>
    </xsl:attribute>
    <xsl:value-of select="concat(following-sibling::text_run,' ')"/><b><xsl:value-of select="following-sibling::text_run[2]"/></b>
  </hyperlink>
</xsl:template>

</xsl:stylesheet>

When applied to the input XML:

<doc>
  <hyperlink_begin id="111" end="222">
    <locator_url protocol="http" host_name="www.sf.net"/>
  </hyperlink_begin>
  <text_run>Click</text_run>
  <text_run emphasis="bold">here.</text_run>
  <hyperlink_end id="222" begin="111"/>
</doc> 

this produces

<?xml version="1.0" encoding="UTF-16"?>
<doc>
<hyperlink xlink:href="http://www.sf.net"
xmlns:xlink="http://www.w3.org/1999/xlink">Click <b>here.</b>
</hyperlink>
</doc>

The only problem I still see, apart from the input variations, is that the
xlink namespace declaration still appears on the output element <hyperlink>.
I tried to get rid of it using exclude-result-prefixes="xlink", but that
didn't help. Maybe someone else could comment on that one?
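A note on why exclude-result-prefixes does not help here: it only suppresses namespace declarations the result does not need. The <hyperlink> element carries an xlink:href attribute, so the output must declare the xlink namespace somewhere, otherwise the attribute name would use an unbound prefix. What can be controlled is where the declaration lands. A minimal sketch, assuming one declaration on the root element is acceptable: build the root with a literal result element instead of xsl:copy, and leave xlink out of exclude-result-prefixes. The hard-coded text_run handling is kept from the stylesheet above.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- no exclude-result-prefixes: xlink is needed in the output -->

  <xsl:output indent="yes"/>

  <!-- A literal result element (rather than xsl:copy of the source doc)
       picks up the xlink namespace node in scope in the stylesheet,
       so xmlns:xlink is written on the root element. -->
  <xsl:template match="/doc">
    <doc>
      <xsl:apply-templates/>
    </doc>
  </xsl:template>

  <xsl:template match="text_run"/>
  <xsl:template match="hyperlink_end"/>

  <!-- hyperlink inherits xmlns:xlink from doc; serializers normally
       do not repeat an inherited declaration. -->
  <xsl:template match="hyperlink_begin">
    <hyperlink xlink:href="{locator_url/@protocol}://{locator_url/@host_name}">
      <xsl:value-of select="concat(following-sibling::text_run, ' ')"/>
      <b><xsl:value-of select="following-sibling::text_run[2]"/></b>
    </hyperlink>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input, the declaration should move from <hyperlink> up to <doc>. Exactly where a processor chooses to emit declarations is up to its serializer, so check the output of the processor you actually use.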

Anyway, I hope this helps you in some way. If not, I apologize, but it has
been a good exercise for me to try and solve anyway :-)

Cheers,
<prs/>

-----Original Message-----
From: Tracy Atteberry [mailto:Tracy.Atteberry@xxxxxxxxxxxx] 
Sent: Wednesday, July 07, 2004 3:40 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: RE: [xsl] XSL pattern needed for begin/end elements

Mike,

Thanks for your suggestion.  Your template code is much cleaner than what I
had posted (so I used it as an example to clean up my own!) but
unfortunately the behavior remains the same.  That is, the text_run elements
between the hyperlink_(begin/end) elements are processed twice:
once inside the hyperlink and then again afterwards.

So the output looks something like this:

<cod>
 <HyperLink xlink:href="http://www.sf.net">
   Click <b>here.</b>
 </HyperLink>
 Click <b>here.</b>
</cod>

How do we stop the intervening elements from being processed twice?
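One way to stop the double processing, sketched for XSLT 1.0 and assuming the begin/end elements are non-nested siblings: keep the mode-based pull from Mike's template, and add an empty default-mode rule for every element that sits between a hyperlink_begin and its matching hyperlink_end. The INLINK mode name follows Mike's reply; the doc-to-cod and HyperLink names come from the target sample.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- The target format renames doc to cod. -->
  <xsl:template match="doc">
    <cod>
      <xsl:apply-templates/>
    </cod>
  </xsl:template>

  <!-- Build the link and pull in every sibling whose next
       hyperlink_end is the one named by this element's @end. -->
  <xsl:template match="hyperlink_begin">
    <HyperLink xlink:href="{concat(locator_url/@protocol, '://', locator_url/@host_name)}">
      <xsl:apply-templates mode="INLINK"
          select="following-sibling::*[following-sibling::hyperlink_end[1]/@id = current()/@end]"/>
    </HyperLink>
  </xsl:template>

  <!-- Default mode: drop anything already output inside a link. -->
  <xsl:template match="*[preceding-sibling::hyperlink_begin[1]/@end
                         = following-sibling::hyperlink_end[1]/@id]"/>
  <xsl:template match="hyperlink_end"/>

  <!-- A text_run outside any link is formatted the same way. -->
  <xsl:template match="text_run">
    <xsl:apply-templates select="." mode="INLINK"/>
  </xsl:template>

  <xsl:template match="text_run" mode="INLINK">
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="text_run[@emphasis = 'bold']" mode="INLINK">
    <b><xsl:value-of select="."/></b>
  </xsl:template>

</xsl:stylesheet>
```

The empty rule's pattern has a predicate, so its default priority (0.5) beats the plain text_run rule (0), and only runs inside a link are suppressed. No whitespace is added between runs, so insert a space yourself if the target needs "Click here." rather than "Clickhere.".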

Thanks,
-Tracy

-----Original Message-----
From: Mike Trotman [mailto:mike.trotman@xxxxxxxxxxxxx]
Sent: Wednesday, July 07, 2004 3:09 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] XSL pattern needed for begin/end elements


Tracy.

I haven't worked through this too carefully, but here is a pseudo-code
method that might work in the sibling case.
It is based on the idea of selecting all following siblings for processing
whose next <hyperlink_end> element has attributes matching the current
<hyperlink_begin> (which looks like what you were intending).

<xsl:template match='hyperlink_begin'>
 <HyperLink
   xlink:href="{concat(locator_url/@protocol,'://',locator_url/@host_name)}">
  <!-- every following sibling whose next hyperlink_end closes this link -->
  <xsl:apply-templates
    select='following-sibling::*[following-sibling::hyperlink_end[1][@id=current()/@end]]'
    mode='INLINK'/>
 </HyperLink>
</xsl:template>

<xsl:template match='text_run' mode='INLINK'>
<xsl:choose>
<xsl:when test='@emphasis="bold"'>
<b><xsl:value-of select='.'/></b>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select='.'/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

There are other ways of doing the 'INLINK' mode processing, depending
on how complex it gets.
E.g. you could have separate templates matching 'text_run[@emphasis]'
etc.

You may need an additional template, e.g.

<xsl:template match='*' mode='INLINK'>
    <xsl:apply-templates mode='INLINK'/> <!-- or whatever else you want to do -->
</xsl:template>

if you need to process non-sibling intervening elements.

I think something close to the above should work.
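Another approach for the sibling case, swapped in here rather than taken from this thread, is sibling recursion: instead of selecting ranges with predicates, the stylesheet walks the children of doc one element at a time, and each template hands control to the next sibling. A hyperlink_begin switches the walk into INLINK mode until it reaches hyperlink_end, then resumes after that end element, so no element is visited twice. This sketch assumes non-nested begin/end pairs; the FORMAT mode is a hypothetical name used to share the text_run formatting between both walks.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <xsl:output indent="yes"/>

  <!-- Start the walk at the first child element only. -->
  <xsl:template match="doc">
    <cod><xsl:apply-templates select="*[1]"/></cod>
  </xsl:template>

  <!-- Outside a link: output the run, then move to the next sibling. -->
  <xsl:template match="text_run">
    <xsl:apply-templates select="." mode="FORMAT"/>
    <xsl:apply-templates select="following-sibling::*[1]"/>
  </xsl:template>

  <!-- Anything else (e.g. a stray hyperlink_end) is skipped. -->
  <xsl:template match="*">
    <xsl:apply-templates select="following-sibling::*[1]"/>
  </xsl:template>

  <!-- Open the link, walk its content in INLINK mode, then resume
       after the hyperlink_end whose @id matches this @end. -->
  <xsl:template match="hyperlink_begin">
    <HyperLink xlink:href="{locator_url/@protocol}://{locator_url/@host_name}">
      <xsl:apply-templates select="following-sibling::*[1]" mode="INLINK"/>
    </HyperLink>
    <xsl:apply-templates
        select="following-sibling::hyperlink_end[@id = current()/@end][1]/following-sibling::*[1]"/>
  </xsl:template>

  <!-- Inside a link: output the element and continue. -->
  <xsl:template match="*" mode="INLINK">
    <xsl:apply-templates select="." mode="FORMAT"/>
    <xsl:apply-templates select="following-sibling::*[1]" mode="INLINK"/>
  </xsl:template>

  <!-- The end marker stops the INLINK walk. -->
  <xsl:template match="hyperlink_end" mode="INLINK"/>

  <xsl:template match="text_run" mode="FORMAT">
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="text_run[@emphasis = 'bold']" mode="FORMAT">
    <b><xsl:value-of select="."/></b>
  </xsl:template>

</xsl:stylesheet>
```

Each element is touched once, which avoids re-evaluating a following-sibling predicate per node; the trade-off is recursion depth proportional to the number of siblings, which can matter on processors that do not optimize tail calls.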

HTH.


Tracy Atteberry wrote:

>Mike,
>
>The current project is a demo for something that will eventually be
>written in C/C++.  Then, as you say, we can walk the DOM tree and
>maintain a separate context stack to help solve the problem.
>
>For now, we can definitely assume that these elements are siblings.  In
>fact, for most real source documents this will be the case.  Given that
>assumption, I would love to know the not-too-difficult solution, as
>this is my immediate problem.
>
>As for the more general case, a hyperlink may in some cases overlap 
>text runs.  For example:
>
><doc>
>  <p>
>    <text_run emphasis="bold">Click 
>      <hyperlink_begin id="111" end="222">
>        <locator_url protocol="http" host_name="www.sf.net"/>
>      </hyperlink_begin>
>      here
>    </text_run>
>    <text_run> to download.</text_run>
>    <hyperlink_end id="222" begin="111"/>
>  </p>
></doc>
>
>In fact, hyperlinks can overlap paragraphs and other document elements 
>though this is rarely seen in practice.
>
>-Tracy
>
>
>-----Original Message-----
>From: Mike Trotman [mailto:mike.trotman@xxxxxxxxxxxxx]
>Sent: Wednesday, July 07, 2004 1:26 PM
>To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>Subject: Re: [xsl] XSL pattern needed for begin/end elements
>
>
>If the begin and end elements are siblings at the same level then the
>problem is tractable and probably not too difficult to solve.
>
>However, if they can occur at different levels, then one of them is
>enclosed inside an element that excludes the other (I think).
>
>Can you give any example of a case where the begin and end elements are
>not siblings at the same level?
>I ask because:
>a) I can't picture how this would make sense given the information that
>   you require them to contain
>b) If one of them does occur inside an element that excludes the other
>    - what would you want to do with the excluded part of this element's
>      content / tree?
>    - If you start closing all the parent elements etc. (and opening them
>      again to match the orphaned end tags)
>      then you are destroying the structure and meaning of the XML data
>      which XSLT is designed to help preserve.
>
>I.e. if they are not siblings at the same level then the XML data
>'structure' is totally inappropriate for XSLT
>and the 1st thing you should do is process it using something else.
>
>I have documents like this - and I process them by walking the DOM tree
>and maintaining a separate STACK of whatever I consider my current 
>context to be.
>(I am doing this to detect overlap between different document layers 
>marked in exactly the way you describe.)
>
>Tracy Atteberry wrote:
>
>  
>
>>Hi all,
>>
>>I'm looking for an XSL pattern to solve the problem of going from XML
>>that has separate begin and end elements to one that does not.
>>
>>Please, please note that I do not control either the source or target
>>XML formats.  If I did, this would be much easier.
>>
>>Source XML snip:
>>
>><doc>
>> <hyperlink_begin id="111" end="222">
>>   <locator_url protocol="http" host_name="www.sf.net"/>
>> </hyperlink_begin>
>> <text_run>Click</text_run>
>> <text_run emphasis="bold">here.</text_run>
>> <hyperlink_end id="222" begin="111"/>
>></doc>
>>
>>Target XML example:
>>
>><cod>
>> <HyperLink xlink:href="http://www.sf.net">
>>   Click <b>here.</b>
>> </HyperLink>
>></cod>
>>
>>In my case I can assume that associated begin and end hyperlink tags
>>will occur as siblings -- though generally this is not the case and, in
>>fact, this is the reason the begin and end tags are unique elements.
>>
>>I have a template that /almost/ works so feel free to let me know why
>>it fails OR suggest a completely different solution.
>>
>>Current XSL template snip:
>>
>><xsl:template match="//hyperlink_begin">
>>   <xsl:variable name="linkUrl">
>>       <xsl:value-of select="locator_url/@protocol"/>
>>       <xsl:text>://</xsl:text>
>>       <xsl:value-of select="locator_url/@host_name"/>
>>   </xsl:variable>
>>   <xsl:variable name="endID" select="@end"/>
>>   <xsl:element name="HyperLink">
>>       <xsl:attribute name="xlink:href"><xsl:value-of
>>select="$linkUrl"/></xsl:attribute>
>>       <xsl:apply-templates select="(following-sibling::*) except
>>(following-sibling::hyperlink_end[@id=$endID]/following-sibling::*)"/>
>>   </xsl:element>
>></xsl:template>
>>
>>This produces the correct hyperlink but the template for text_run
>>elements gets called twice this way -- once inside the hyperlink, then
>>again as templates continue to be applied.
>>
>>Any help would be greatly appreciated.  Thanks!
>>
>>Tracy Atteberry
>>
>>PS. I'm using Saxon 8
>>
>>--+------------------------------------------------------------------
>>XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>>To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
>>or e-mail: <mailto:xsl-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
>>--+--
>> 
>>
>>    
>>
>
>  
>
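Since Saxon 8 implements the XSLT 2.0 drafts (the `except` operator in the template above is already XSLT 2.0), xsl:for-each-group is another option for the sibling case. A sketch, assuming the hyperlinks are non-nested children of doc and each hyperlink_begin is followed by its own hyperlink_end; note that it pairs them by position and does not check the id/end attributes.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <xsl:output indent="yes"/>

  <xsl:template match="doc">
    <cod>
      <!-- Outer grouping: each group starts at a hyperlink_begin and
           runs up to (not including) the next one. -->
      <xsl:for-each-group select="*" group-starting-with="hyperlink_begin">
        <!-- Inner grouping: split after hyperlink_end, so elements that
             follow the end marker form a separate group. -->
        <xsl:for-each-group select="current-group()" group-ending-with="hyperlink_end">
          <xsl:choose>
            <!-- The context item is the first member of the group. -->
            <xsl:when test="self::hyperlink_begin">
              <HyperLink xlink:href="{locator_url/@protocol}://{locator_url/@host_name}">
                <xsl:apply-templates
                    select="current-group()[not(self::hyperlink_begin or self::hyperlink_end)]"/>
              </HyperLink>
            </xsl:when>
            <xsl:otherwise>
              <xsl:apply-templates select="current-group()"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:for-each-group>
    </cod>
  </xsl:template>

  <!-- A stray end marker outside a link produces nothing. -->
  <xsl:template match="hyperlink_end"/>

  <xsl:template match="text_run">
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="text_run[@emphasis = 'bold']">
    <b><xsl:value-of select="."/></b>
  </xsl:template>

</xsl:stylesheet>
```

Because each child of doc lands in exactly one group, nothing is processed twice and no suppression rules are needed.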

-- 
Datalucid Limited
8 Eileen Road
South Norwood
London SE25 5EJ
United Kingdom

tel :0208-239-6810
mob: 0794-725-9760
email: mike.trotman@xxxxxxxxxxxxx

UK Co. Reg:   4383635
VAT Reg.:   798 7531 60


