Re: [xsl] XPath related query

Subject: Re: [xsl] XPath related query
From: Brandon Ibach <brandon.ibach@xxxxxxxxxxxxxxxxxxx>
Date: Sun, 23 Jan 2011 18:12:14 -0500
This document is RDF/XML (the W3C standard XML serialization of an RDF
graph).  The standard for querying an RDF graph, regardless of how it
is serialized, is SPARQL [1], which operates at the level of the
graph, whereas XPath operates at the level of the XML model into which
the graph has been encoded.

Based on your description of the query, I'd write SPARQL like this:

ASK {
  :Book1 a :Book ; :chapter ?c1 .
  ?c1 :section ?s1 .
  ?s1 :cites ?a .
  ?a a :Article ; :chapter ?c2 .
  ?c2 :section ?s .
  ?s :figure "Example semi-joins" }

As an aside, this query differs a bit from your description in that it
doesn't include an "f" variable because the "figure" property has a
literal, rather than resource, value.  Also, it includes an extra
step, from chapter "c1" to section "s1", which was missing in your
description but present in your sample data.
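
To make the graph-level reading concrete, here is a rough Python sketch
of how that ASK pattern walks the triples. It is not SPARQL and not how
a real engine works; the TRIPLES set is copied by hand from your sample,
and the names objects() and ask() are made up for illustration. Note
that the figure text in the sample includes the quotation marks and is
spelled "Example Semi-join", so the literal from your description does
not match the posted data.

```python
# Triples copied by hand from the sample; plain strings stand in for IRIs.
# "a" is used as shorthand for rdf:type, as in the SPARQL above.
TRIPLES = {
    ("Book1", "a", "Book"),
    ("Book1", "chapter", "Introduction"),
    ("Introduction", "a", "Chapter"),
    ("Introduction", "section", "Section1"),
    ("Section1", "a", "Section"),
    ("Section1", "cites", "Article2"),
    ("Section1", "figure", '"Example RDF graph"'),
    ("Article2", "a", "Article"),
    ("Article2", "chapter", "Proof"),
    ("Proof", "a", "Chapter"),
    ("Proof", "section", "Semi-joins"),
    ("Semi-joins", "a", "Section"),
    ("Semi-joins", "cites", "Book3"),
    ("Semi-joins", "figure", '"Example Semi-join"'),
}


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ask(triples, book, caption):
    """Follow the six steps of the ASK pattern; True if any binding exists."""
    if "Book" not in objects(triples, book, "a"):
        return False
    for c1 in objects(triples, book, "chapter"):
        for s1 in objects(triples, c1, "section"):
            for a in objects(triples, s1, "cites"):
                if "Article" not in objects(triples, a, "a"):
                    continue
                for c2 in objects(triples, a, "chapter"):
                    for s in objects(triples, c2, "section"):
                        if caption in objects(triples, s, "figure"):
                            return True
    return False


print(ask(TRIPLES, "Book1", '"Example Semi-join"'))  # True
print(ask(TRIPLES, "Book1", "Example semi-joins"))   # False
```

Nothing in this walk depends on how the graph was written down as XML,
which is the point of querying at the graph level.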

The problem with querying RDF/XML with XPath is that the same RDF
graph can be encoded many different ways.  A robust query will need to
account for them all, which is going to make it complicated.  To
illustrate, your sample could also be serialized as follows, which
would produce exactly the same RDF graph:

<Book rdf:about="Book1"><chapter rdf:resource="Introduction"/></Book>
<Chapter rdf:about="Introduction"><section
rdf:resource="Section1"/></Chapter>
<Section rdf:about="Section1">
    <cites rdf:resource="Article2"/>
    <figure>"Example RDF graph"</figure>
</Section>
<Article rdf:about="Article2"><chapter rdf:resource="Proof"/></Article>
<Chapter rdf:about="Proof"><section rdf:resource="Semi-joins"/></Chapter>
<Section rdf:about="Semi-joins">
    <cites rdf:resource="Book3"/>
    <figure>"Example Semi-join"</figure>
</Section>
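
If you want to convince yourself that the two forms carry the same
graph, a small Python sketch along these lines may help. It is a toy
reader, not a conforming RDF/XML parser: it handles only the three
property forms that appear here (nested element, rdf:resource
reference, plain text), it treats the root element as a mere container,
and it assumes the rdf prefix is bound to the usual RDF namespace,
which your sample does not show. Only the Book half of the sample is
included to keep it short; node_triples() and triples() are
illustrative names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # assumed binding for rdf:
ABOUT, RESOURCE = f"{{{RDF}}}about", f"{{{RDF}}}resource"

NESTED = f"""<DBLP xmlns:rdf="{RDF}">
<Book rdf:about="Book1"><chapter>
  <Chapter rdf:about="Introduction"><section>
    <Section rdf:about="Section1">
      <cites rdf:resource="Article2"/>
      <figure>"Example RDF graph"</figure>
    </Section>
  </section></Chapter>
</chapter></Book>
</DBLP>"""

FLAT = f"""<DBLP xmlns:rdf="{RDF}">
<Book rdf:about="Book1"><chapter rdf:resource="Introduction"/></Book>
<Chapter rdf:about="Introduction"><section rdf:resource="Section1"/></Chapter>
<Section rdf:about="Section1">
  <cites rdf:resource="Article2"/>
  <figure>"Example RDF graph"</figure>
</Section>
</DBLP>"""


def node_triples(node, out):
    """Add the triples for one typed node element to out; return its name."""
    subject = node.get(ABOUT)
    out.add((subject, "rdf:type", node.tag))
    for prop in node:
        if prop.get(RESOURCE) is not None:   # reference form
            out.add((subject, prop.tag, prop.get(RESOURCE)))
        elif len(prop):                      # nested node element
            out.add((subject, prop.tag, node_triples(prop[0], out)))
        else:                                # literal value
            out.add((subject, prop.tag, (prop.text or "").strip()))
    return subject


def triples(xml_text):
    """Collect triples from every top-level node element under the root."""
    out = set()
    for node in ET.fromstring(xml_text):
        node_triples(node, out)
    return out


print(triples(NESTED) == triples(FLAT))  # True
```

The element trees differ, but the sets of triples are equal, so any
query written against the triples gives the same answer for both.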

An XPath (covering just the path from Book1 to c1, to start simple)
that would return the same result over either of these two
serializations might be:

//*[@rdf:about=//Book[@rdf:about='Book1']/chapter/@rdf:resource or
parent::chapter/parent::Book[@rdf:about='Book1']]

That covers just 1 of 6 steps in your query.  The subsequent steps get
progressively longer.  The moral of the story is that using XPath
instead of SPARQL for querying RDF is a lot like using a regular
expression instead of XPath to query XML; you end up querying the
syntax, rather than the semantics.

All that said, you could write a more manageable XPath that will work
on your sample, but only if you know for certain which types of
properties will be serialized with element nesting, as most are in
your sample, and which with references, as the "cites" properties
are.  This might look like:

//Article[@rdf:about=//Book[@rdf:about='Book1']/chapter/*/section/*/cites/@rdf:resource]/chapter/*/section/*[figure='Example Semi-join']

This should work, but is not robust, given the potential variability
in the syntax of the RDF/XML serialization.
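
To see that fragility in practice, here is a hedged Python sketch that
mirrors the same nesting-dependent traversal with the standard
library's ElementTree (which supports only a small XPath subset, so the
attribute tests are done in Python). The function name
matching_sections() is made up, the rdf prefix is assumed to be bound
to the usual RDF namespace, and because the figure text in the sample
includes the quotation marks, the sketch strips them before comparing,
which is an assumption about the intended data.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # assumed binding for rdf:
ABOUT, RESOURCE = f"{{{RDF}}}about", f"{{{RDF}}}resource"

NESTED = f"""<DBLP xmlns:rdf="{RDF}">
<Book rdf:about="Book1"><chapter><Chapter rdf:about="Introduction"><section>
  <Section rdf:about="Section1">
    <cites rdf:resource="Article2"/><figure>"Example RDF graph"</figure>
  </Section>
</section></Chapter></chapter></Book>
<Article rdf:about="Article2"><chapter><Chapter rdf:about="Proof"><section>
  <Section rdf:about="Semi-joins">
    <cites rdf:resource="Book3"/><figure>"Example Semi-join"</figure>
  </Section>
</section></Chapter></chapter></Article>
</DBLP>"""

FLAT = f"""<DBLP xmlns:rdf="{RDF}">
<Book rdf:about="Book1"><chapter rdf:resource="Introduction"/></Book>
<Chapter rdf:about="Introduction"><section rdf:resource="Section1"/></Chapter>
<Section rdf:about="Section1">
  <cites rdf:resource="Article2"/><figure>"Example RDF graph"</figure>
</Section>
<Article rdf:about="Article2"><chapter rdf:resource="Proof"/></Article>
<Chapter rdf:about="Proof"><section rdf:resource="Semi-joins"/></Chapter>
<Section rdf:about="Semi-joins">
  <cites rdf:resource="Book3"/><figure>"Example Semi-join"</figure>
</Section>
</DBLP>"""


def matching_sections(xml_text, book, caption):
    """Mirror the simplified XPath: it relies on chapter/section being nested."""
    root = ET.fromstring(xml_text)
    # Resources cited from sections nested under the given book.
    cited = {
        cites.get(RESOURCE)
        for b in root.iter("Book") if b.get(ABOUT) == book
        for cites in b.findall("chapter/*/section/*/cites")
    }
    # Sections nested under a cited article whose figure text matches.
    return [
        sec.get(ABOUT)
        for art in root.iter("Article") if art.get(ABOUT) in cited
        for sec in art.findall("chapter/*/section/*")
        if (sec.findtext("figure") or "").strip().strip('"') == caption
    ]


print(matching_sections(NESTED, "Book1", "Example Semi-join"))  # ['Semi-joins']
print(matching_sections(FLAT, "Book1", "Example Semi-join"))    # []
```

The nested form yields the expected section, while the flat form of
the same graph yields nothing, because the traversal never follows the
rdf:resource references used for chapter and section there.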

-Brandon :)

[1] http://www.w3.org/TR/rdf-sparql-query

On Sun, Jan 23, 2011 at 5:54 AM, Vineet Chaoji <vineetc@xxxxxxxxx> wrote:
> Hello All,
>
> The following sample XML document is generated from an RDF graph:
>
> <DBLP>
> <Book rdf:about="Book1">
>  <chapter>
>    <Chapter rdf:about="Introduction">
>       <section>
>          <Section rdf:about="Section1">
>             <cites rdf:resource="Article2"/>
>             <figure>"Example RDF graph"</figure>
>          </Section>
>       </section>
>    </Chapter>
>  </chapter>
> </Book>
>
> <Article rdf:about="Article2">
>  <chapter>
>    <Chapter rdf:about="Proof">
>       <section>
>          <Section rdf:about="Semi-joins">
>             <cites rdf:resource="Book3"/>
>             <figure>"Example Semi-join"</figure>
>          </Section>
>       </section>
>    </Chapter>
>  </chapter>
> </Article>
> </DBLP>
>
> I am trying to understand if a path query on an RDF graph can be
> represented as an XPath query over the corresponding XML structure.
>
> So the query that I am trying to represent is: Is there a Book "Book1"
> that has a chapter 'c1' that cites some article 'a' and does this
> article 'a' have a chapter 'c2' which has a section 's' which has a
> figure 'f' which is called "Example semi-joins". This is a binary
> yes/no type of query.
> 'c1', 'c2', 'a', 's' and 'f' are introduced for the sake of explaining
> the query. In the XML above, c1=Introduction, a="Article 2",
> c2="Proof", s="Semi-Joins".
>
> Since I am new to XPath, I am finding it hard to express this query. I
> am not even sure if this can be expressed in XPath.
>
> I have just broken down a single query to explain my line of thinking.
> This is how far I could come:
> If X =  //*[@rdf:about='Book1']//Chapter//cites@ref:resource  ---- I
> expect this to give me all the ref:resource attribute nodes?
> //Article[@ref:about==X]//Chapter//Section//figure[text() == "Example
> semi-joins"]  ---- Can I match the attribute nodes above with the
> ref:about attribute in Article?
>
> Any ideas how I could do this?
>
> Thanks in anticipation,
> Vineet
