Extending SPARQL with CONSTRUCT Sub-queries

Recently some of my colleagues and I were discussing what we consider a key limitation of SPARQL right now: it provides no direct mechanism to create a temporary graph from the existing data to use in your query. Yes, you can use SPARQL Update to INSERT a new graph based on existing data, but we shouldn’t need to create a new graph just to answer the odd query. And if we do this and forget to remove it when we’re done, we can cause other users’ queries to return unexpected results because we’ve modified the database.

To clarify what I mean, consider the following use case. We have an RDF database consisting of multiple social graphs, where each has been imported from a different source and so each uses a different predicate to represent the friend relation. Most of the time we are just asking questions of a specific social graph, but sometimes we want to ask a question over some subset of those graphs. This leads to unwieldy queries like so:

PREFIX foaf:
PREFIX fb:
PREFIX twitter:

SELECT *
WHERE
{
{ ?x foaf:knows ?y }
UNION
{ ?x fb:friend ?y }
UNION
{ ?x twitter:follows ?y }
}

If we want to get more information out, like the names, email addresses etc. of each of those people, and each is in a different vocabulary in its own graph, then the query quickly becomes unmanageable to write, especially if you start adding more complex query elements like FILTER and OPTIONAL into the mix.

Now some of this can be solved by normalizing your data into a single vocabulary when you ingest it, but this isn’t always ideal depending on your use case. To address these kinds of problems, where you want that normalized temporary graph for some of your queries, we propose an extension to SPARQL that allows the use of CONSTRUCT as a sub-query.

Syntax-wise this would look something like the following:

PREFIX foaf:
PREFIX fb:
PREFIX twitter:

SELECT *
FROM
CONSTRUCT { ?x foaf:knows ?y }
WHERE
{
{ ?x foaf:knows ?y }
UNION
{ ?x fb:friend ?y }
UNION
{ ?x twitter:follows ?y }
}
WHERE
{
?x foaf:knows ?y
}

For a simple query like this it actually makes the query more verbose, but with more complex queries this syntax could significantly reduce the verbosity and make the WHERE clause of the query (i.e. the question being asked) much more readable.
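To make the intent of the example concrete, here is a rough Python sketch that emulates it in two steps: build the normalized temporary graph, then ask the outer question against that graph only. This is not an implementation of the proposal. Triples are modelled as plain tuples, the sample data is invented, and the function names (construct_knows, select_knows) are hypothetical.

```python
# Predicate names standing in for the three vocabularies in the example.
FOAF_KNOWS = "foaf:knows"
FB_FRIEND = "fb:friend"
TWITTER_FOLLOWS = "twitter:follows"


def construct_knows(triples):
    """Emulate the sub-construct: rewrite every friend-like predicate to foaf:knows."""
    friend_predicates = {FOAF_KNOWS, FB_FRIEND, TWITTER_FOLLOWS}
    return {(s, FOAF_KNOWS, o) for (s, p, o) in triples if p in friend_predicates}


def select_knows(graph):
    """Emulate the outer WHERE { ?x foaf:knows ?y } over the temporary graph."""
    return sorted((s, o) for (s, p, o) in graph if p == FOAF_KNOWS)


# Invented sample data: one triple per vocabulary plus an unrelated triple.
store = {
    ("alice", FOAF_KNOWS, "bob"),
    ("alice", FB_FRIEND, "carol"),
    ("bob", TWITTER_FOLLOWS, "alice"),
    ("alice", "foaf:name", "Alice"),
}

# The temporary graph lives only for this query; the store is never modified.
temporary_graph = construct_knows(store)
rows = select_knows(temporary_graph)
```

The point of the sketch is that the outer question only ever sees one predicate, and nothing is written back to the store, which is exactly what the INSERT workaround cannot guarantee.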

Essentially our proposal is to allow the use of a CONSTRUCT as an additional form of dataset description, which gives us two new syntactic forms:

• FROM CONSTRUCT
• FROM NAMED CONSTRUCT AS

As with a sub-select, our proposal is to define the grammar for a sub-construct so that it cannot specify any FROM clauses; i.e. we prevent nesting of sub-constructs.

The proposed interpretation of this is fairly simple. Any number of FROM CONSTRUCT clauses may be used to construct temporary graphs, which are merged together to form the default graph of the query and may be combined with normal FROM clauses. FROM NAMED CONSTRUCT is slightly more complex. Firstly, the URI used to name the temporary graph may not be referred to by any other FROM clause, which stops people forcing indirect nesting of these. Secondly, if the URI given for the graph name is the same as the name of any graph in the query service’s default dataset, the temporary graph hides that graph for the course of the query. Once these and any other FROM/FROM NAMED clauses have been interpreted to create the dataset, query processing proceeds as normal.
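The dataset rules above can be sketched in a few lines of Python. Again this is only an illustration under assumptions: graphs are sets of tuples, the sub-constructs are assumed to have been evaluated already, and the function names and the urn:g1 graph name are made up for the example. How an engine decides which of its own named graphs are visible in the first place is left outside the sketch.

```python
def merge_default_graph(plain_from_graphs, construct_results):
    """Merge normal FROM graphs and FROM CONSTRUCT results into one default graph."""
    default = set()
    for graph in list(plain_from_graphs) + list(construct_results):
        default |= graph
    return default


def overlay_named_constructs(visible_named, other_from_names, named_constructs):
    """Apply the two FROM NAMED CONSTRUCT rules.

    visible_named:    dict of graph name -> triples the query could already see
    other_from_names: graph names used by the other FROM / FROM NAMED clauses
    named_constructs: dict of graph name -> triples built by a sub-construct
    """
    # Rule 1: a temporary graph's name may not be used by any other FROM clause.
    for name in named_constructs:
        if name in other_from_names:
            raise ValueError(f"{name} names a temporary graph and is used by another FROM clause")

    # Rule 2: a temporary graph hides any existing graph with the same name.
    named = {name: set(triples) for name, triples in visible_named.items()}
    for name, triples in named_constructs.items():
        named[name] = set(triples)
    return named


# Two FROM CONSTRUCT results merged with one normal FROM graph.
default_graph = merge_default_graph(
    [{("a", "p", "b")}],
    [{("a", "p", "c")}, {("a", "p", "b")}],
)

# A temporary graph named urn:g1 hides the existing urn:g1 for this query only.
existing = {"urn:g1": {("x", "p", "y")}}
named_graphs = overlay_named_constructs(existing, set(), {"urn:g1": {("x", "q", "z")}})
```

Note that the overlay works on a copy, so the existing graph is untouched once the query finishes, which is the "for the course of the query" part of the rule.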

What do people think? Would this be a useful extension to SPARQL?

Rob Vesse
