Re: [xsl] Need an XPath 2.0 expression that identifies a long block of uninterrupted non-blocking space characters in an XHTML document

Subject: Re: [xsl] Need an XPath 2.0 expression that identifies a long block of uninterrupted non-blocking space characters in an XHTML document
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 14 Oct 2019 17:36:59 -0000
Well, a predicate using name()='p' is bad news because it depends on namespace
prefixes, which are arbitrary. Use self::p, assuming it's a no-namespace
element, or self::xhtml:p if it's in the XHTML namespace.
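
To illustrate why prefixes are arbitrary, here is a minimal sketch in Python (standard library only; the variable names are made up for the example). The same XHTML element is written with three different prefixes. In XPath, name() would return "p", "xhtml:p" and "h:p" for these, yet a namespace-aware parser reports one and the same expanded name for all three.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# The same element, spelled with three different (arbitrary) prefixes.
docs = [
    f'<p xmlns="{XHTML}"/>',
    f'<xhtml:p xmlns:xhtml="{XHTML}"/>',
    f'<h:p xmlns:h="{XHTML}"/>',
]

# ElementTree exposes the expanded name as "{namespace-uri}local-name",
# so the prefix chosen in the source document plays no part in the result.
tags = [ET.fromstring(d).tag for d in docs]

# A no-namespace <p> is a different name altogether.
no_ns_tag = ET.fromstring("<p/>").tag

print(tags)
print(no_ns_tag)
```

A test such as self::xhtml:p compares expanded names in the same way, which is why it keeps working when the source document changes its prefix.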

You could also do something like

    //p[o:p eq '&#160;']
       [every $p in following-sibling::*[position() le 10]
        satisfies $p[self::p/o:p eq '&#160;']]
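
For checking the logic outside an XSLT processor, the same idea can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the helper names is_blank_p and first_blank_run are hypothetical, the <p> elements are assumed to be in no namespace, and o:p is compared by its direct text rather than its full string value. The sketch follows the stated intent (at least 10 blank paragraphs immediately following), whereas an "every ... satisfies" test is also true when fewer than 10 siblings remain.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # the character written as &#160; in the XML source
O_NS = "urn:schemas-microsoft-com:office:office"  # namespace URI from the sample


def is_blank_p(elem):
    """True for a no-namespace <p> whose <o:p> child holds exactly one no-break space."""
    if elem.tag != "p":
        return False
    op = elem.find(f"{{{O_NS}}}p")
    return op is not None and op.text == NBSP


def first_blank_run(parent, following=10):
    """Return the first blank <p> immediately followed by `following` blank <p> siblings."""
    children = list(parent)
    for i, child in enumerate(children):
        window = children[i : i + following + 1]
        # Require the full window, so a short tail of siblings never matches.
        if len(window) == following + 1 and all(is_blank_p(c) for c in window):
            return child
    return None


# The sample from the original message: 11 blank paragraphs between two text ones.
doc = (
    f'<html xmlns:o="{O_NS}">'
    '<p class="MsoNormal">top text<o:p/></p>'
    + '<p class="MsoNormal"><o:p>&#160;</o:p></p>' * 11
    + '<p class="MsoNormal">bottom text<o:p/></p>'
    '</html>'
)
root = ET.fromstring(doc)
hit = first_blank_run(root)
print(None if hit is None else list(root).index(hit))
```

The threshold is a parameter here because, as in the original question, 10 is an arbitrary number.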

Michael Kay

> On 14 Oct 2019, at 18:09, Costello, Roger L. costello@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Folks,
>
> As you may know, when a formatted email message is created in Outlook,
> Outlook generates HTML under the hood.
>
> I am trying to determine if a formatted email message has text at the bottom
> of the email message that is separated from the rest of the email by a lot of
> space. In other words, the text at the bottom of the underlying HTML is
> preceded by a long run of non-breaking space characters (&#160;).
>
> Assume the HTML has been converted to XHTML.
>
> I need an XPath 2.0 expression that identifies a long block of non-breaking
> space characters.
>
> Outlook generates HTML like that shown below. The non-breaking space
> character is nested inside an <o:p> element, which is nested inside a <p>
> element.
>
> I came up with this XPath expression:
>
> //p[o:p eq '&#160;']
>    [count(following-sibling::*[position() le 10][name() eq 'p'][o:p eq '&#160;']) ge 10][1]
>
> It says, "Give me the first <p> element containing a non-breaking space
> character such that there are at least 10 <p> elements that immediately follow
> it, each containing a non-breaking space character." At least, that's what I
> think it says. Note: 10 is an arbitrary number.
>
> Questions:
> 1. Do you see any problems with the XPath expression?
> 2. Is there a better XPath expression?
>
> <html xmlns:o="urn:schemas-microsoft-com:office:office">
>    <p class="MsoNormal">top text<o:p/></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal"><o:p>&#160;</o:p></p>
>    <p class="MsoNormal">bottom text<o:p/></p>
> </html>
