tests to document current definition of EXISTS in SPARQL #42

Open
pfps opened this issue Jun 18, 2016 · 31 comments

@pfps
Contributor

pfps commented Jun 18, 2016

SPARQL EXISTS has lots of problems. It produces invalid algebraic structures, hits explicitly undefined situations in the algebra, and produces counterintuitive results. I don't know of any SPARQL implementation that implements it correctly, and different implementations of SPARQL implement it differently.

I have added a bunch of tests to document the correct behaviour of EXISTS according to the SPARQL specification. These tests are available at
https://github.com/pfps/rdf-tests/tree/gh-pages/sparql11/data-sparql11/exists

I have manually run all these tests on Virtuoso Open Source 7, with the following results:

Status of tests for Virtuoso

Test                  Syntax  Works  Result

existsScope01.rq      Y       Y      correct - no semantic issue reported
existsScope02.rq      Y       Y      admissible - no semantic issue reported
existsValues01.rq     Y       Y      wrong - no semantic issue reported
existsValues02.rq     Y       Y      correct - no semantic issue reported

existsBlank01.rq      Y       Y      wrong
existsBound.rq        Y       Y      wrong
existsMinus01.rq      Y       Y      wrong

existsSubquery01.rq   Y       Y      correct - no semantic issue reported
existsSubquery02.rq   Y       Y      correct - no semantic issue reported
existsSubquery03.rq   Y       Y      correct - no semantic issue reported
existsSubquery04.rq   Y       Y      correct
existsSubquery05.rq   Y       Y      correct
existsSubquery06.rq   Y       Y      correct - no semantic issue reported
existsSubquery07.rq   Y       Y      wrong - no semantic issue reported
existsSubquery08.rq   Y       Y      wrong - no semantic issue reported
existsSubquery09.rq   Y       Y      correct - no semantic issue reported

existsHernandez01.rq  Y       Y      correct
existsHernandez02.rq  Y       Y      correct

Note that one test hits an explicitly undefined part of the SPARQL algebra. I don't know how to indicate that. Quite a few other tests produce invalid algebraic structures which don't show up in the output. I don't know how to indicate that. I have added comments to indicate when this happens.

I have also added my suggestions on what changes should be made to fix EXISTS.

@gkellogg
Member

Thanks Peter, could you do a PR to pull them into this repo? I'll promote these on public-sparql-dev and rdf-tests for consensus. We should get at least two other implementations to pass them.

@pfps
Contributor Author

pfps commented Jun 18, 2016

OK, #43

I think that virtuoso is going to have the best coverage and it really only
passes 4 out of 18 tests. Of course, this just shows the extent of the
problems with the definition of EXISTS.

peter

@lisp

lisp commented Jun 19, 2016

it may be that the file 'existsBlank01.srq' is intended to be 'existsBlank01.srx'.

@pfps
Contributor Author

pfps commented Jun 19, 2016

Certainly. I've made the change in my fork. I think that the pull request
will track that change.

I don't have a test harness, so I haven't verified that everything conforms to
a good test description but I should have caught an incorrect file extension.

peter

@lisp

lisp commented Jun 19, 2016

it may be, that existsMinus01.srx is not a properly encoded result set.

@gkellogg
Member

Peter's changes were also added to the pfps-sparql-exists branch in this repo for convenience. We may end up replacing PR #43 with a new PR based on that branch from this repository.

@pfps
Contributor Author

pfps commented Jun 20, 2016

Six results files were missing result tags. I've fixed them, I think, and
pushed the commit back to github.

peter

@lisp

lisp commented Jun 20, 2016

now transcribed to the branch which gregg created and pushed.

@ericprud
Member

I know that Andy prototyped EXISTS in Jena. I would expect it to
conform with what he intended to document in SPARQL 1.1.

-ericP


@kasei
Contributor

kasei commented Jun 21, 2016

@pfps could you provide a bit more explanation of your table of virtuoso results above? I'm confused by "Syntax" vs. "Works", and what it means when Works="Y" but Result="wrong".

@pfps
Contributor Author

pfps commented Jun 21, 2016

Yeah, I should have put more explanation there. I just put my internal
recording of what virtuoso did and then the formatting was fiddled with by
whatever github uses.

The four columns are:

  • the test file name
  • Syntax: whether virtuoso accepts the query as valid SPARQL syntax
  • Works: whether virtuoso evaluates the query without throwing any
    run-time error
  • Result:
    correct = virtuoso produces the result I expected
    wrong = virtuoso produces some other result
    admissible = virtuoso produces a result when the result is not dictated
    no semantic issue reported = virtuoso does not complain when some
    internal semantic condition is violated (like a mapping for a
    non-variable)

I've repeated the results here so that you don't have to look them up.

Status of tests for Virtuoso

Test                  Syntax  Works  Result

existsScope01.rq      Y       Y      correct - no semantic issue reported
existsScope02.rq      Y       Y      admissible - no semantic issue reported
existsValues01.rq     Y       Y      wrong - no semantic issue reported
existsValues02.rq     Y       Y      correct - no semantic issue reported

existsBlank01.rq      Y       Y      wrong
existsBound.rq        Y       Y      wrong
existsMinus01.rq      Y       Y      wrong

existsSubquery01.rq   Y       Y      correct - no semantic issue reported
existsSubquery02.rq   Y       Y      correct - no semantic issue reported
existsSubquery03.rq   Y       Y      correct - no semantic issue reported
existsSubquery04.rq   Y       Y      correct
existsSubquery05.rq   Y       Y      correct
existsSubquery06.rq   Y       Y      correct - no semantic issue reported
existsSubquery07.rq   Y       Y      wrong - no semantic issue reported
existsSubquery08.rq   Y       Y      wrong - no semantic issue reported
existsSubquery09.rq   Y       Y      correct - no semantic issue reported

existsHernandez01.rq  Y       Y      correct
existsHernandez02.rq  Y       Y      correct

As far as I can tell, virtuoso is acting as if EXISTS means putting the
current solution sequence at the front of the EXISTS argument and also at the
front of the WHERE clause in every subquery.

peter

@kasei
Contributor

kasei commented Jun 21, 2016

Thanks. So does "correct" correlate with the tests included in #43 (That is, a "correct" result listed here indicates that the corresponding test passed in the test suite.)?

@pfps
Contributor Author

pfps commented Jun 21, 2016

Yes, these correlate with the tests in #43.

What counts as a pass is a bit hard to determine.

Any test that has a "correct" or an "admissible" by itself is a pass.

A strict interpretation would be that "no semantic issue reported" is a
failure because SPARQL implementations should complain if, for example, a
solution mapping has "a" where there should be a variable. A loose
interpretation would be that these internal semantic issues don't matter as
long as the correct final result comes out.

In the strict interpretation virtuoso gets only 4 out of 18 correct but in the
loose interpretation it gets 12 out of 18 correct.
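
The two counts can be reproduced from the results table with a short tally. This is only an illustrative sketch, not part of the test suite: the `RESULTS` dictionary and the `passes` helper are names made up here, and the data is transcribed by hand from the table earlier in this comment thread.

```python
# Virtuoso results transcribed from the table in this thread.
# Each entry maps a test name to (outcome, semantic_issue_unreported).
RESULTS = {
    "existsScope01": ("correct", True),
    "existsScope02": ("admissible", True),
    "existsValues01": ("wrong", True),
    "existsValues02": ("correct", True),
    "existsBlank01": ("wrong", False),
    "existsBound": ("wrong", False),
    "existsMinus01": ("wrong", False),
    "existsSubquery01": ("correct", True),
    "existsSubquery02": ("correct", True),
    "existsSubquery03": ("correct", True),
    "existsSubquery04": ("correct", False),
    "existsSubquery05": ("correct", False),
    "existsSubquery06": ("correct", True),
    "existsSubquery07": ("wrong", True),
    "existsSubquery08": ("wrong", True),
    "existsSubquery09": ("correct", True),
    "existsHernandez01": ("correct", False),
    "existsHernandez02": ("correct", False),
}


def passes(outcome, unreported, strict):
    """Strict: an unreported semantic issue counts as a failure."""
    ok = outcome in ("correct", "admissible")
    return ok and not (strict and unreported)


strict_passes = sum(passes(o, u, True) for o, u in RESULTS.values())
loose_passes = sum(passes(o, u, False) for o, u in RESULTS.values())
print(f"strict: {strict_passes}/{len(RESULTS)}, loose: {loose_passes}/{len(RESULTS)}")
```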

peter

@kasei
Contributor

kasei commented Jun 21, 2016

OK. With a few bug fixes for applying substitute to filter expressions, my implementation passes all but these 4:

  • existsScope01 – Not enough manifest data to evaluate this test
  • existsBlank01 – I don't intend to support this as I believe the spec text doesn't align with the intention
  • existsSubquery06 – I believe this is a result of not conforming to the recently agreed upon errata
  • existsSubquery09 – I believe this is a result of not conforming to the recently agreed upon errata

@pfps
Contributor Author

pfps commented Jun 21, 2016

On 06/20/2016 07:31 PM, Gregory Todd Williams wrote:

OK. With a few bug fixes for applying |substitute| to filter expressions, my
implementation passes all but these 4:

  • existsScope01 – Not enough manifest data to evaluate this test

Not existsScope02? That's the one where SPARQL says that the result is
explicitly undefined so I don't have a results bit.

  • existsBlank01 – I don't intend to support this as I believe the spec text
    doesn't align with the intention

Sounds like a good candidate for a community-supported erratum.

  • existsSubquery06 – I believe this is a result of not conforming to the
    recently agreed upon errata

Hmm. I may have messed this one up when I switched the predicate I was using
from :a to :p and didn't carry that through to this test. I've fixed it.

  • existsSubquery09 – I believe this is a result of not conforming to the
    recently agreed upon errata

Which erratum? This test requires substitution of disconnected variables to
get the correct answer.

peter

@kasei
Contributor

kasei commented Jun 21, 2016

Not existsScope02? That's the one where SPARQL says that the result is
explicitly undefined so I don't have a results bit.

Ah. My mistake. I fail both existsScope01 and existsScope02. I mistook the latter for the former, and also had trouble in modifying my test harness to overlook tests in the manifest that don't have an mf:result. I'm not sure I understand your reasoning for having both of these tests (though I haven't dug too deeply into them).

Hmm. I may have messed this one up when I switched the predicate I was using from :a to :p and didn't carry that through to this test. I've fixed it.

I now pass this test.

Which erratum? This test requires substitution of disconnected variables to get the correct answer.

It was an educated guess as to why I was failing the test. That may not be the reason. I'll try to have a deeper look and see what's going on.

@pfps
Contributor Author

pfps commented Jun 21, 2016

One of the problems with EXISTS is that it substitutes everywhere, including
in solution mappings, variables being bound, and variables being reported out
of subqueries.

So the exists in
SELECT ?x WHERE { ?x :p :b . FILTER EXISTS { BIND ( :j AS ?x ) } }
ends up looking something like (although the substitution is in the SPARQL
algebra, not the surface syntax)
BIND ( :j AS :b )
for the graph
:b :p :b .

This is semantically suspect as it creates a solution mapping that maps :b to
:j and :b is not a variable. This doesn't change the number of bindings so it
sort of doesn't affect the final result.

In
SELECT ?x WHERE { ?x :p ?y . FILTER EXISTS { BIND ( :j AS ?x )
BIND ( :k AS ?y ) } }
the substitution ends up looking something like
{ BIND ( :j AS :b ) BIND ( :k AS :b ) }
This is explicitly undefined in the SPARQL algebra because Extend (BIND) is
not supposed to be used to change mappings, only extend them.
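
The effect of substituting everywhere can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the algebra from the SPARQL specification: patterns are encoded as nested tuples, `("extend", pattern, var, expr)` stands in for Extend, and the names `substitute` and `extend_targets` are made up for this example.

```python
def is_var(term):
    """Variables are modelled as strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def substitute(node, mu):
    """Replace every variable bound in mu, regardless of its position."""
    if isinstance(node, tuple):
        return tuple(substitute(child, mu) for child in node)
    return mu.get(node, node)


def extend_targets(node):
    """Collect the term in the 'AS' position of every extend node."""
    if not isinstance(node, tuple):
        return []
    found = []
    if node[0] == "extend":
        found.append(node[2])
    for child in node[1:]:
        found.extend(extend_targets(child))
    return found


# Toy encoding of  { BIND ( :j AS ?x ) BIND ( :k AS ?y ) }
exists_pattern = ("extend", ("extend", ("unit",), "?x", ":j"), "?y", ":k")

# Solution of the outer pattern  ?x :p ?y  for the graph  :b :p :b .
mu = {"?x": ":b", "?y": ":b"}

substituted = substitute(exists_pattern, mu)
print(substituted)

# Both BIND targets are now the IRI :b rather than a variable.
bad_targets = [t for t in extend_targets(substituted) if not is_var(t)]
print(bad_targets)
```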

peter

@kasei
Contributor

kasei commented Jun 21, 2016

Ah. Understood. Those are good tests to have to flesh out the weirdness in the spec as written, but I think that's another case where I don't believe the spec text accurately represents the intention, so wouldn't be making any changes to my systems to deal with the difference between existsScope01 and existsScope02.

@pfps
Contributor Author

pfps commented Jun 21, 2016

Another case where a community-backed erratum should be created.

You may believe that the spec is wrong, but others may not. That's a bad
situation to be in for a W3C recommendation.

peter

@lisp

lisp commented Jun 21, 2016

in order that these tests be useful, it would help if they were more explicit.

the existing test suite suffers severely from test definitions which do not explain their behaviour, but instead presume that the results are self-evident.
the only recourse an implementer has is to follow a link to a discussion thread which may or may not provide an explanation.
in this particular situation, where the interpretation of the recommendation itself is at issue, it is not optimal to continue that practice.

the test declarations would be much improved were they not just to claim how the purported interpretation of the recommendation is to be changed, but also to record the intended interpretation and detail the behaviour by which the test produces the indicated result.

@lisp

lisp commented Jun 21, 2016

On 2016-06-21, at 05:24, Peter F. Patel-Schneider notifications@github.com wrote:
[…]

One of the problems with EXISTS is that it substitutes everywhere, including
in solution mappings, variables being bound, and variables being reported out
of subqueries.

it is also possible, that this is not a “problem with exists”.
it may be that the anomaly follows from a misinterpretation of the recommendation.

the word “substitute” appears in another place in the recommendation text, in addition to the passage which concerns exists.
by one reading, the interpretation apparent at that location is not to be reconciled with one which has been suggested for exists, which involves conflating lexical form and abstract syntax with the data model.
this suggests that, it may be the case that, the authors would not have intended a naive interpretation to apply to exists.
in particular since, if the less naive interpretation is permitted to apply to exists as well, (some of) the purported anomalies do not arise.

nothing in these tests has yet succeeded in changing my view, that the recommendation is incomplete, that some of the behaviour wrt exists is underspecified and, that careful specification suffices.
it may be, that i have not yet understood an issue which some one of them is intended to reveal.
as they stand, when run, they all produce results.
that is, none fails.
of the results, which i have examined, all agree with my interpretation of the recommendation.
some diverge from that specified by the test, but leave me at a loss as to how the specified result follows from the recommendation.
as noted previously, a more careful exposition in the test declarations could demonstrate something which is not yet evident.

@lisp

lisp commented Jun 21, 2016

andy seaborne responded to the related thread on public-rdf-tests@w3.org with a pointer to issue 68 in the shapes working group.
that thread extends over fifty messages.
would it be possible for someone to compose one test which constitutes the one concrete case which led the shapes wg to believe they cannot proceed given their conception of the sparql recommendation?

@pfps
Contributor Author

pfps commented Jun 21, 2016

I added an analysis.text file that analyzes many of the tests to show just
what is going on. Most of this was already in my other posts, but this file
has walk-throughs of how SPARQL transforms, substitutes, and evaluates the
tests that are analyzed.

peter

@pfps
Contributor Author

pfps commented Jun 21, 2016

The W3C Data Shapes Working Group can proceed, even with the current SPARQL
recommendation. However, the product of the working group, the SHACL shapes
constraint language, heavily depends on SPARQL and in particular EXISTS.

To be viable with the current SPARQL recommendation, the SHACL specification
is going to have to have disclaimers like:

The normative definition of SHACL depends on a particular behaviour of SPARQL
queries that differs from the behaviour required for compliance with the W3C
SPARQL specification at
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/ with errata applied as
of ??/??/????. Some SPARQL implementations, including ..., differ from the
SPARQL specification in the way required by SHACL but it is not known whether
all SPARQL implementations do.

That's not a happy place to be in.

The reason that this would be required is that SHACL heavily uses EXISTS.

As far as issue 68 goes, that hits another problem with SPARQL. Many SPARQL
implementations have a notion of pre-binding, which is executing a SPARQL
query with an initial solution mapping. Unfortunately, there is no decent
definition of pre-binding available anywhere. The working group has thus had
to come up with their own definition.

The first definition was like the definition of EXISTS, and had at least all
the problems that this definition has. The current definition doesn't use
substitution, but has its own, even more severe, problems.

So the working group could proceed by coming up with a definition of
pre-binding even without any change to the SPARQL specification. However,
that definition is going to have to say something like:

This definition of pre-binding does not necessarily correspond to the
implementation of pre-binding in SPARQL implementations. This definition of
pre-binding does not correspond with the definitions of related parts of
SPARQL, such as EXISTS.

Also not a happy place to be in.

Pre-binding is used for just about every construct in SHACL.

The extension mechanism of SHACL is even more intertwined with SPARQL and
EXISTS will be used heavily. There will have to be a caution for users of
the extension mechanism to be wary of problematic uses of EXISTS (which is
almost all of them) and to avoid uses of EXISTS that are known to be
implemented differently in different SPARQL implementations.

peter

@kasei
Contributor

kasei commented Jun 21, 2016

Pre-binding is used for just about every construct in SHACL.

Does that mean SHACL won't work with SPARQL systems that don't also support pre-binding? I don't know anything about SHACL, but that doesn't sound like a good requirement.

@pfps
Contributor Author

pfps commented Jun 21, 2016

SHACL is (more or less) defined as an extension to SPARQL. To implement SHACL
you take a SPARQL implementation and add a few functions and also add
pre-binding and then add some control stuff. You almost certainly need
access to the innards of a SPARQL implementation to implement SHACL.

So it's not the case that the lack of pre-binding is a special problem. In
fact, the presence of pre-binding in a SPARQL implementation could easily be
a bigger problem if this pre-binding was different from the pre-binding needed
for SHACL.

Whether it is a good idea to define SHACL as an extension to SPARQL is a
separate matter.

peter

@lisp

lisp commented Jun 21, 2016

On 2016-06-21, at 19:11, Peter F. Patel-Schneider notifications@github.com wrote:

[…]

the test declarations would be much improved were they to not just a claim how
the purported interpretation of the recommendation is to be changed, but also
to record for the intended interpretation and detail the behaviour for which
the test produces the indicated result.

I added an analysis.text file that analyzes many of the tests to show just
what is going on. Most of this was already in my other posts, but this file
has walk-throughs of how SPARQL transforms, substitutes, and evaluates the
tests that are analyzed.

thank you.

@lisp

lisp commented Jun 21, 2016

On 2016-06-21, at 19:50, Peter F. Patel-Schneider notifications@github.com wrote:

On 06/21/2016 06:00 AM, james anderson wrote:

andy seaborne responded to the related thread on public-rdf-tests@w3.org
with a pointer to issue 68 in the shapes
working group.
that thread extends over fifty messages.
would it be possible for someone to compose one test which constitutes the one
concrete case which led the shapes wg to believe they cannot proceed given
their conception of the sparql recommendation?

The W3C Data Shapes Working Group can proceed, even with the current SPARQL
recommendation. However, the product of the working group, the SHACL shapes
constraint language, heavily depends on SPARQL and in particular EXISTS.

To be viable with the current SPARQL recommendation, the SHACL specification
is going to have to have disclaimers like:

The normative definition of SHACL depends on a particular behaviour of SPARQL
queries that differs from the behaviour required for compliance with the W3C
SPARQL specification at
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/ with errata applied as
of ??/??/????. Some SPARQL implementations, including ..., differ from the
SPARQL specification in the way required by SHACL but it is not known whether
all SPARQL implementations do.

i read these paragraphs, above, as contradicting each other.

That's not a happy place to be in.

The reason that this would be required is that SHACL heavily uses EXISTS.

please give a concise concrete example of a query which demonstrates this issue and describe the use case which requires it.

As far as issue 68 goes, that hits another problem with SPARQL. Many SPARQL
implementations have a notion of pre-binding, which is executing a SPARQL
query with an initial solution mapping. Unfortunately, there is no decent
definition of pre-binding available anywhere. The working group has thus had
to come up with their own definition.

The first definition was like the definition of EXISTS, and had at least all
the problems that this definition has. The current definition doesn't use
substitution, but has its own, even more severe, problems.

So the working group could proceed by coming up with a definition of
pre-binding even without any change to the SPARQL specification. However,
that definition is going to have to say something like:

This definition of pre-binding does not necessarily correspond to the
implementation of pre-binding in SPARQL implementations. This definition of
pre-binding does not correspond with the definitions of related parts of
SPARQL, such as EXISTS.

Also not a happy place to be in.

Pre-binding is used for just about every construct in SHACL.

please give a concise concrete example of a query which demonstrates this issue and describe the use case which requires it.

@pfps
Contributor Author

pfps commented Jun 21, 2016

On 06/21/2016 02:29 PM, james anderson wrote:

On 2016-06-21, at 19:50, Peter F. Patel-Schneider notifications@github.com
wrote:

On 06/21/2016 06:00 AM, james anderson wrote:

andy seaborne responded to the related thread on public-rdf-tests@w3.org
with a pointer to issue 68 in the shapes
working group.
that thread extends over fifty messages.
would it be possible for someone to compose one test which constitutes the one
concrete case which led the shapes wg to believe they cannot proceed given
their conception of the sparql recommendation?

The W3C Data Shapes Working Group can proceed, even with the current SPARQL
recommendation. However, the product of the working group, the SHACL shapes
constraint language, heavily depends on SPARQL and in particular EXISTS.

To be viable with the current SPARQL recommendation, the SHACL specification
is going to have to have disclaimers like:

The normative definition of SHACL depends on a particular behaviour of SPARQL
queries that differs from the behaviour required for compliance with the W3C
SPARQL specification at
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/ with errata applied as
of ??/??/????. Some SPARQL implementations, including ..., differ from the
SPARQL specification in the way required by SHACL but it is not known whether
all SPARQL implementations do.

i read these paragraphs, above, as contradicting each other.

How so? It's not going to be a happy solution of course. SHACL will be only
usable with certain SPARQL implementations, which goes against the W3C goal of
interoperability and adherence to recommendations. However, it does mean
that SHACL could maybe get two interoperating implementations that adhere to
the SHACL recommendation.

I would personally vote against advancing something that has this to
recommendation status, but I'm not going to get a vote.

That's not a happy place to be in.

The reason that this would be required is that SHACL heavily uses EXISTS.

please give a concise concrete example of a query which demonstrates this
issue and describe the use case which requires it.

Take a look at the current SHACL spec, at
http://w3c.github.io/data-shapes/shacl/. You will see a lot of SPARQL queries
that provide normative definitions of large chunks of SHACL. The large
majority of these queries use EXISTS, including

SELECT $this ($this AS ?subject) $predicate (?value AS ?object)
WHERE {
$this $predicate ?value .
FILTER NOT EXISTS { ?value rdf:type/rdfs:subClassOf* $class }
}

This query has problems when ?value is a blank node: instead of checking
that the node itself is related to $class via rdf:type/rdfs:subClassOf*, the
substituted pattern checks whether any node is related to $class via that path. So the
definition of sh:class, an important piece of SHACL, depends on an
interpretation of EXISTS that diverges from the definition of EXISTS in the
SPARQL specification.
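
A small sketch may make the blank node case concrete. It rests on stated simplifications: the graph, the terms `:s`, `:other`, `:C` and `_:g1`, and the helper names are all invented for illustration, and the path rdf:type/rdfs:subClassOf* is reduced to a single rdf:type step. The only SPARQL behaviour relied on is that a blank node appearing in a basic graph pattern matches any term, much like a variable.

```python
GRAPH = {
    (":s", ":p", "_:g1"),          # the value is a blank node with no rdf:type
    (":other", "rdf:type", ":C"),  # an unrelated node happens to be a :C
}


def is_wildcard(term):
    """In a basic graph pattern, variables and blank nodes match any term."""
    return term.startswith("?") or term.startswith("_:")


def pattern_matches(pattern, graph):
    """True if some triple in the graph matches the triple pattern."""
    return any(
        all(is_wildcard(p) or p == t for p, t in zip(pattern, triple))
        for triple in graph
    )


def node_has_type(node, cls, graph):
    """The intended check: this very node has the given type."""
    return (node, "rdf:type", cls) in graph


value = "_:g1"

# Substitution turns  ?value rdf:type $class  into  _:g1 rdf:type :C
substituted_pattern = (value, "rdf:type", ":C")

exists_by_substitution = pattern_matches(substituted_pattern, GRAPH)
exists_intended = node_has_type(value, ":C", GRAPH)

print(exists_by_substitution, exists_intended)
```

Under substitution the EXISTS succeeds because of the unrelated node, so FILTER NOT EXISTS drops the solution and the untyped blank node would not be reported.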

As far as issue 68 goes, that hits another problem with SPARQL. Many SPARQL
implementations have a notion of pre-binding, which is executing a SPARQL
query with an initial solution mapping. Unfortunately, there is no decent
definition of pre-binding available anywhere. The working group has thus had
to come up with their own definition.

The first definition was like the definition of EXISTS, and had at least all
the problems that this definition has. The current definition doesn't use
substitution, but has its own, even more severe, problems.

So the working group could proceed by coming up with a definition of
pre-binding even without any change to the SPARQL specification. However,
that definition is going to have to say something like:

This definition of pre-binding does not necessarily correspond to the
implementation of pre-binding in SPARQL implementations. This definition of
pre-binding does not correspond with the definitions of related parts of
SPARQL, such as EXISTS.

Also not a happy place to be in.

Pre-binding is used for just about every construct in SHACL.

please give a concise concrete example of a query which demonstrates this
issue and describe the use case which requires it.

The above query is kicked off in SHACL with initial mappings for $this,
$predicate, and $class. SHACL was supposed to use the definition of
pre-binding in major SPARQL implementations, but no such usable definition
was found for any SPARQL implementation. So SHACL had to have its own
definition of pre-binding. The initial definition was via substitution, so
any blank node mapping for any of these does the wrong thing if interpreted
according to the SPARQL specification. The current definition of pre-binding,
in http://w3c.github.io/data-shapes/shacl/#pre-binding, is to use the mapping
of a variable whenever that variable is evaluated. This is broken even worse
because variables in BGPs are not evaluated at all.
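
A rough Python sketch of that objection, under the assumption that pre-bound values are consulted only when a variable is evaluated in an expression. The graph, match_bgp, and evaluate_variable are all hypothetical names for illustration and do not come from the SHACL draft or any SPARQL engine.

```python
# Toy graph (assumption): two subjects, each with one ex:p value.
GRAPH = {
    ("ex:s1", "ex:p", "ex:v1"),
    ("ex:s2", "ex:p", "ex:v2"),
}

def match_bgp(pattern, graph):
    """Match one triple pattern. Variables are bound by matching, never evaluated."""
    solutions = []
    for triple in graph:
        solution = {}
        for p, t in zip(pattern, triple):
            if p.startswith(("?", "$")):
                # A repeated variable must match the same term each time.
                if solution.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            solutions.append(solution)
    return solutions

def evaluate_variable(name, solution, prebound):
    """Expression-level evaluation: the only place the pre-binding is consulted."""
    return prebound.get(name, solution.get(name))

prebound = {"$this": "ex:s1"}
solutions = match_bgp(("$this", "ex:p", "?value"), GRAPH)

matched_subjects = sorted(s["$this"] for s in solutions)
evaluated_subjects = {evaluate_variable("$this", s, prebound) for s in solutions}
```

Here the BGP still matches both ex:s1 and ex:s2, because nothing in the matching step looks at the pre-binding, yet evaluating $this in an expression yields ex:s1 for every solution.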

peter

@lisp

lisp commented Jun 22, 2016

On 2016-06-22, at 01:20, Peter F. Patel-Schneider notifications@github.com wrote:
[…]

Take a look at the current SHACL spec, at
http://w3c.github.io/data-shapes/shacl/. You will see a lot of SPARQL queries
that provide normative definitions of large chunks of SHACL. The large
majority of these queries use EXISTS, including

SELECT $this ($this AS ?subject) $predicate (?value AS ?object)
WHERE {
$this $predicate ?value .
FILTER NOT EXISTS { ?value rdf:type/rdfs:subClassOf* $class }
}

This query has problems when ?value is a blank node: instead of checking
that the node itself is related to $class via rdf:type/rdfs:subClassOf*,
it checks that any node is related to $class via that path. So the
definition of sh:class, an important piece of SHACL, depends on an
interpretation of EXISTS that diverges from the definition of EXISTS in the
SPARQL specification.

this could be said to be true, but only if one misunderstands “substitute” in a manner which diverges from that in section two.

@pchampin
Contributor

This was discussed during the rdf-star meeting on 26 September 2024.

View the transcript

Addressing SPARQL EXISTS errata 4

ora: Are there people fine with the current syntax?

ora: In any case, chairs will discuss this, let's move on

AndyS: [about SPARQL EXISTS] There are two proposals

AndyS: 1. substitution based on various existing errata

AndyS: 2. another one based on ANTIJOIN. We already have MINUS; except for its behavior with disjoint domains, it is ANTIJOIN

AndyS: On another note, there are other things that might go into SPARQL, like LATERAL, that can be based on substitution, as well as pure forms of anti-join and semi-join

AndyS: It's a possibility to move these additions (LATERAL, anti-join...) to sparql-dev

pchampin: we would add more subtle differences between operators like FILTER NOT EXISTS vs MINUS

pchampin: Your point of having multiple ways might create problems

ora: SPARQL spec spends a bit of time presenting this difference

AndyS: It was quite contentious in SPARQL 1.1

<pchampin> I'm more than happy to let the editors decide on that

AndyS: I am not aware of any outgoing opinion; I think it comes down to a choice of which way to go

tl: is it related to triple terms in any way, or is it a SPARQL erratum?

AndyS: it has nothing to do with triple terms

tl: what are the criteria for which SPARQL errata to discuss now?

tl: it's a central issue, is that the argument?

pfps: There are a bunch of problems with SPARQL, the ones with EXISTS are the biggies

pfps: They end up splitting the SPARQL implementation space

pfps: The decision that has to be made is whether to move SPARQL EXISTS toward a more database-like implementation or keep it more consistent with the existing one

AndyS: The current implementation is present in SQL with correlated subqueries

pfps: if you use the semi/anti join interpretation of EXISTS you change SPARQL more than the other option

pfps: In the end people who will see and understand the differences are very few

ora: I would like to know preferences

AndyS: My preference is for substitution and applying errata (option 1)

pfps: I don't have much of a horse in this race

pfps: Ideally I would love to get more SPARQL developers on board

ora: we could talk outside of the group

ktk: I reached out to Stardog but have not gotten any input

gtw: I am not sure there is much value in reaching out to more developers. sparql-dev has been open for a long time

<pchampin> Tpt: I have a significant preference for option 1; option 2 is basically equivalent to MINUS

pfps: One way to check the issue would be to pull some tests

<pfps> which PR?

<gkellogg> w3c/rdf-tests#42

<gb> Issue 42 tests to document current definition of EXISTS in SPARQL (by pfps) [SPARQL]

<gkellogg> w3c/rdf-tests#43

<gb> CLOSED Pull Request 43 Add tests to document current definition of EXISTS (by pfps)

ora: Whatever solutions we pick, someone will ask why we pick it

AndyS: picking substitution breaks the fewest queries

ora: That seems to me as good a reason as any, let's make a decision

tl: I would like to ask James about it

ora: Let's vote on it next Thursday

ora: Let's do it
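
The point raised in the discussion above, that MINUS behaves like an antijoin except when the two sides share no variables, can be sketched in Python. This is only an illustration over solution mappings represented as dicts; the function names compatible, minus, and antijoin are chosen here and are not an API of any SPARQL engine.

```python
def compatible(mu1, mu2):
    # Two solution mappings are compatible if they agree on every shared variable.
    return all(mu2[var] == val for var, val in mu1.items() if var in mu2)

def minus(left, right):
    # SPARQL MINUS: drop mu only if some right-hand mapping is compatible
    # AND shares at least one variable with it.
    return [
        mu for mu in left
        if not any(compatible(mu, nu) and (mu.keys() & nu.keys()) for nu in right)
    ]

def antijoin(left, right):
    # Plain antijoin: drop mu if any right-hand mapping is compatible with it.
    return [mu for mu in left if not any(compatible(mu, nu) for nu in right)]

# Disjoint domains: { ?s ?p ?o } on the left, { ?x ?y ?z } on the right.
left = [{"?s": "ex:a", "?p": "ex:b", "?o": "ex:c"}]
right = [{"?x": "ex:a", "?y": "ex:b", "?z": "ex:c"}]
minus_disjoint = minus(left, right)
antijoin_disjoint = antijoin(left, right)

# Shared variable ?s: both operators agree.
left_shared = [{"?s": "ex:a"}, {"?s": "ex:b"}]
right_shared = [{"?s": "ex:a"}]
minus_shared = minus(left_shared, right_shared)
antijoin_shared = antijoin(left_shared, right_shared)
```

With disjoint domains, minus keeps the left-hand solution while antijoin removes it; with the shared variable ?s, both leave only the ex:b solution.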



6 participants