Draft RPQ CIP
thobe committed Feb 6, 2017
1 parent 3ea7838 commit 76d49d2
Showing 1 changed file with 160 additions and 0 deletions.
160 changes: 160 additions & 0 deletions cip/1.accepted/CIP2017-02-06-Regular-Path-Patterns.adoc
@@ -0,0 +1,160 @@
= CIP2017-02-06 Regular Path Patterns
:numbered:
:toc:
:toc-placement: macro
:source-highlighter: codemirror

*Authors:* Tobias Lindaaker <tobias.lindaaker@neotechnology.com>

toc::[]

== Regular Path Patterns

Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths.
This functionality is called Regular Path Patterns.

A Regular Path Pattern is defined as:

* A simple relationship type, or
* A Regular Path Pattern followed by another Regular Path Pattern, or
* An alternative between two Regular Path Patterns, or
* A repetition of a Regular Path Pattern, or
* A reference to a Defined Path Predicate.
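
The recursive definition above maps naturally onto a small abstract syntax tree. The following is a minimal sketch in Python and not part of the proposal: the class names (`RelType`, `Seq`, `Alt`, `Repeat`, `Ref`) and the `min_count` field are hypothetical, chosen only to mirror the five cases of the definition.

[source, python]
----
from dataclasses import dataclass
from typing import Union

# Hypothetical AST for Regular Path Patterns; all names are illustrative only.

@dataclass(frozen=True)
class RelType:
    """A simple relationship type, e.g. :KNOWS."""
    name: str

@dataclass(frozen=True)
class Seq:
    """A Regular Path Pattern followed by another one."""
    first: "Pattern"
    second: "Pattern"

@dataclass(frozen=True)
class Alt:
    """An alternative between two Regular Path Patterns."""
    left: "Pattern"
    right: "Pattern"

@dataclass(frozen=True)
class Repeat:
    """A repetition of a Regular Path Pattern (min_count is an assumed field)."""
    body: "Pattern"
    min_count: int = 0

@dataclass(frozen=True)
class Ref:
    """A reference to a Defined Path Predicate, by name."""
    name: str

Pattern = Union[RelType, Seq, Alt, Repeat, Ref]

# One or more repetitions of KNOWS followed by HATES, then a single LOVES.
example = Seq(
    Repeat(Seq(RelType("KNOWS"), RelType("HATES")), min_count=1),
    RelType("LOVES"),
)
----

Because every case is itself a `Pattern`, arbitrarily nested expressions can be built from these five constructors.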

Regular Path Patterns are written similarly to how relationship patterns are written, but enclosed within two slash (`/`) characters instead of brackets (`[]`).

Contrary to Relationship Patterns, Regular Path Patterns do _not_ allow binding a relationship to a variable.
In order to bind the matching path to a variable, a Path Assignment should be used, by preceding the path with an identifier and an equals sign (`=`).
This avoids a problem that existed in the past with repetition of relationships (a syntax that was deprecated with the introduction of Regular Path Patterns), where a relationship variable would bind to a list, making it hard to express predicates over the actual relationships.
Predicates on parts of a Regular Path Pattern are instead expressed through the use of explicitly defined path predicates.

=== Syntax

The syntax of Regular Path Patterns fits into the greater Cypher syntax through `PatternElementChain`.

----
PatternElementChain = (RelationshipPattern | RegularPathPattern), NodePattern ;
RegularPathPattern = (LeftArrowHead, Dash, '/', [RegularPathExpression], '/', Dash, RightArrowHead)
                   | (LeftArrowHead, Dash, '/', [RegularPathExpression], '/', Dash)
                   | (Dash, '/', [RegularPathExpression], '/', Dash, RightArrowHead)
                   | (Dash, '/', [RegularPathExpression], '/', Dash)
                   ;
RegularPathExpression = {RegPathOr}- ;
RegPathOr = RegPathSeq, {'|', RegPathSeq} ;
RegPathSeq = {RegPathStar}- ;
RegPathStar = RegPathDirected, [('*', [RangeLiteral]) | '+'] ;
RegPathDirected = ['<'], RegPathBase, ['>'] ;
RegPathBase = RegPathRelationship
            | RegPathReference
            | '(', RegularPathExpression, ')'
            ;
RegPathRelationship = RelType ;
RegPathReference = SymbolicName ;
----
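
To make the precedence implied by the grammar concrete, the following is a rough recursive-descent parser sketch in Python for the expression between the slashes. It is an illustration under several assumptions, not a reference implementation: the tuple node shapes (`"rel"`, `"ref"`, `"seq"`, `"or"`, `"star"`, `"plus"`, `"dir"`) are invented, the optional `RangeLiteral` after `*` is not handled, and the outer repetition in `RegularPathExpression` is folded into sequencing.

[source, python]
----
import re

# A token is a relationship type (:NAME), a name, or one punctuation character.
TOKEN = re.compile(r"\s*(:\w+|\w+|[()|*+<>])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(text):
    tree, rest = parse_or(tokenize(text))
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree

def parse_or(tokens):                      # RegPathOr
    left, tokens = parse_seq(tokens)
    while tokens and tokens[0] == "|":
        right, tokens = parse_seq(tokens[1:])
        left = ("or", left, right)
    return left, tokens

def parse_seq(tokens):                     # RegPathSeq
    items = []
    while tokens and tokens[0] not in ("|", ")"):
        item, tokens = parse_star(tokens)
        items.append(item)
    if not items:
        raise SyntaxError("expected a path expression")
    tree = items[0]
    for item in items[1:]:
        tree = ("seq", tree, item)
    return tree, tokens

def parse_star(tokens):                    # RegPathStar, without RangeLiteral
    tree, tokens = parse_directed(tokens)
    if tokens and tokens[0] in ("*", "+"):
        tree = ("star" if tokens[0] == "*" else "plus", tree)
        tokens = tokens[1:]
    return tree, tokens

def parse_directed(tokens):                # RegPathDirected
    left = bool(tokens) and tokens[0] == "<"
    if left:
        tokens = tokens[1:]
    tree, tokens = parse_base(tokens)
    right = bool(tokens) and tokens[0] == ">"
    if right:
        tokens = tokens[1:]
    if left or right:
        direction = "both" if left and right else ("left" if left else "right")
        tree = ("dir", direction, tree)
    return tree, tokens

def parse_base(tokens):                    # RegPathBase
    if not tokens:
        raise SyntaxError("unexpected end of expression")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        tree, rest = parse_or(rest)
        if not rest or rest[0] != ")":
            raise SyntaxError("expected ')'")
        return tree, rest[1:]
    if head.startswith(":"):
        return ("rel", head[1:]), rest
    if head[0].isalpha() or head[0] == "_":
        return ("ref", head), rest
    raise SyntaxError(f"unexpected token {head!r}")
----

For example, `parse("a | :B*")` yields `("or", ("ref", "a"), ("star", ("rel", "B")))`, showing that repetition binds tighter than sequencing, which in turn binds tighter than alternation.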

The `RegPathReference` is a reference to a Defined Path Predicate.
These are defined using the following syntax:

----
DefinedPathPredicate = PathPredicatePrototype, 'IS', Pattern, [Where] ;
PathPredicatePrototype = '(', Variable, ')', RegPathPrototype, '(', Variable, ')' ;
RegPathPrototype = (LeftArrowHead, Dash, '/', DefinedPathName, '/', Dash)
                 | (Dash, '/', DefinedPathName, '/', Dash, RightArrowHead)
                 | (Dash, '/', DefinedPathName, '/', Dash)
                 ;
DefinedPathName = SymbolicName ;
----

=== Examples

The astute reader of the syntax will have noticed that it is possible to express a Regular Path Pattern with an empty path expression:

[source, cypher]
----
MATCH (a)-//-(b)
----

This pattern simply states that `a` and `b` must be the same node, and is thus the same as:

[source, cypher]
----
MATCH (a), (b) WHERE a = b
----
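
One way to read this equivalence is to treat a path expression as denoting a set of (start, end) node pairs. Under that assumed model, the empty expression denotes the identity relation, which is the neutral element of sequencing. A minimal Python sketch (the toy data and helper names are invented for illustration):

[source, python]
----
# Toy data, invented for this sketch.
NODES = {"a", "b", "c"}
KNOWS = {("a", "b"), ("b", "c")}

def seq(r, s):
    """Relational composition: a path matching r followed by a path matching s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

# The empty expression `//` relates every node to itself and to nothing else.
EMPTY = {(n, n) for n in NODES}

# (a)-//-(b) therefore forces a = b ...
same_node_only = all(a == b for (a, b) in EMPTY)

# ... and putting `//` before or after another expression changes nothing.
neutral = seq(EMPTY, KNOWS) == KNOWS == seq(KNOWS, EMPTY)
----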

The same reader will also have noticed that it is possible to define a pattern containing just a relationship type:

[source, cypher]
----
MATCH (a)-/:KNOWS/->(b)
----

That pattern is indeed equivalent to the very similar relationship pattern:

[source, cypher]
----
MATCH (a)-[:KNOWS]->(b)
----

The main difference is that the variant with a relationship pattern can bind the relationship to a variable and express further predicates over it.

Regular Path Patterns become more interesting when larger expressions are put together:

[source, cypher]
.Finding someone loved by someone hated by someone you know, transitively
----
MATCH (you)-/(:KNOWS :HATES)+ :LOVES/->(someone)
----

Note the `+` expressing one or more occurrences of the sequence `KNOWS` followed by `HATES`.
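
The meaning of sequencing and `+` can be sketched by evaluating the expression over a toy graph. The Python below is an illustration only: it assumes a simplified model in which an expression denotes a set of (start, end) node pairs, it does not model any relationship-uniqueness rule that Cypher may apply to matched paths, and the graph data and helper names (`rel`, `seq`, `plus`) are invented.

[source, python]
----
# Edges are (start, type, end) triples; toy data invented for this sketch.
EDGES = {
    ("you", "KNOWS", "ann"),
    ("ann", "HATES", "bob"),
    ("bob", "KNOWS", "cat"),
    ("cat", "HATES", "dan"),
    ("dan", "LOVES", "eve"),
    ("bob", "LOVES", "fay"),
}

def rel(rel_type):
    """Pairs (a, b) joined by one left-to-right relationship of the given type."""
    return {(a, b) for (a, t, b) in EDGES if t == rel_type}

def seq(r, s):
    """Relational composition: r followed by s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def plus(r):
    """One or more repetitions of r (transitive closure)."""
    result = set(r)
    while True:
        extended = result | seq(result, r)
        if extended == result:
            return result
        result = extended

# (:KNOWS :HATES)+ :LOVES
pattern = seq(plus(seq(rel("KNOWS"), rel("HATES"))), rel("LOVES"))
matches = sorted(b for (a, b) in pattern if a == "you")
----

Here `matches` is `['eve', 'fay']`: `fay` is reached after one `KNOWS`/`HATES` round and `eve` after two.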

The direction of each relationship is governed by the overall direction of the Regular Path Pattern.
It is however possible to explicitly define the direction for a particular part of the pattern.
This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction.
It is also possible to both prefix the part with `<` and suffix it with `>`, which makes that part undirected.

[source, cypher]
.Specifying the direction for different parts of the pattern
----
MATCH (you)-/(:KNOWS <:HATES)+ :LOVES/->(someone)
----

In the example above we say that the `HATES` relationships should have the opposite direction to the other relationships in the path.
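
The effect of the `<` prefix can be sketched with the same kind of set-of-pairs model: reversing a part of the expression simply swaps the start and end of every pair it denotes. This Python sketch is illustrative only; the toy graph and helper names are assumptions, and path uniqueness rules are not modelled.

[source, python]
----
# Edges are (start, type, end) triples; toy data invented for this sketch.
EDGES = {
    ("you", "KNOWS", "ann"),
    ("bob", "HATES", "ann"),   # points against the overall left-to-right direction
    ("bob", "LOVES", "eve"),
}

def rel(rel_type):
    """Pairs (a, b) joined by one left-to-right relationship of the given type."""
    return {(a, b) for (a, t, b) in EDGES if t == rel_type}

def reverse(r):
    """The `<` prefix: traverse the part right to left."""
    return {(b, a) for (a, b) in r}

def seq(r, s):
    """Relational composition: r followed by s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def plus(r):
    """One or more repetitions of r (transitive closure)."""
    result = set(r)
    while True:
        extended = result | seq(result, r)
        if extended == result:
            return result
        result = extended

# (:KNOWS <:HATES)+ :LOVES
with_override = seq(plus(seq(rel("KNOWS"), reverse(rel("HATES")))), rel("LOVES"))

# (:KNOWS :HATES)+ :LOVES on the same graph, for comparison
without_override = seq(plus(seq(rel("KNOWS"), rel("HATES"))), rel("LOVES"))
----

On this graph only the variant with `<:HATES` finds a match, `("you", "eve")`. An undirected part (`<` and `>` together) would correspond to `r | reverse(r)` in this model.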

Through the use of Defined Path Predicates we can express even more predicates over a path:

[source, cypher]
.Find a chain of unreciprocated lovers
----
MATCH (you)-/unreciprocated_love*/->(someone)
PATH (a)-/unreciprocated_love/->(b) IS
(a)-[:LOVES]->(b)
WHERE NOT EXISTS { (b)-[:LOVES]->(a) }
----

Note that no colon is used when referencing the Defined Path Predicate; in Regular Path Patterns the colon is used only for referencing actual relationship types.
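
A Defined Path Predicate can be thought of as a named, filtered set of node pairs that is then used like any other part of the expression. The Python sketch below illustrates this for `unreciprocated_love*`. It is an assumption-laden illustration: the data is invented, `*` is taken to mean zero or more repetitions as in ordinary regular expressions (the text does not spell this out), and path uniqueness rules are not modelled.

[source, python]
----
# (lover, loved) pairs; toy data invented for this sketch.
LOVES = {("you", "ann"), ("ann", "bob"), ("bob", "ann"), ("ann", "cat")}
NODES = {n for pair in LOVES for n in pair}

# PATH (a)-/unreciprocated_love/->(b) IS (a)-[:LOVES]->(b)
#   WHERE NOT EXISTS { (b)-[:LOVES]->(a) }
unreciprocated_love = {(a, b) for (a, b) in LOVES if (b, a) not in LOVES}

def seq(r, s):
    """Relational composition: r followed by s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def star(r):
    """Zero or more repetitions of r: identity plus transitive closure."""
    result = {(n, n) for n in NODES}
    while True:
        extended = result | seq(result, r)
        if extended == result:
            return result
        result = extended

# MATCH (you)-/unreciprocated_love*/->(someone)
matches = sorted(b for (a, b) in star(unreciprocated_love) if a == "you")
----

Here `matches` is `['ann', 'cat', 'you']`: `you` appears because of the assumed zero-repetition case, and `bob` is excluded because the love between `ann` and `bob` is mutual.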

Sometimes it will be interesting to express a predicate on a node in a Regular Path Pattern.
This can be achieved by using a Defined Path Predicate where the nodes on both ends are the same:

[source, cypher]
.Find friends of friends that are not haters
----
MATCH (you)-/:KNOWS not_a_hater :KNOWS/-(friendly_friend_of_friend)
PATH (x)-/not_a_hater/-(x) IS (x)
WHERE NOT EXISTS { (x)-[:HATES]->() }
----

In the case of a Defined Path Predicate where both nodes are the same, the direction of the predicate is irrelevant.
In general, however, the direction of a Defined Path Predicate matters: it determines how the pattern in the predicate is mapped onto the Regular Path Patterns that reference it.
The direction may be omitted only when the defined predicate is symmetric, that is, when it holds from `a` to `b` exactly when it holds from `b` to `a`.
This is trivially the case when both nodes are the same, but it also holds when the internal pattern is itself symmetrical, as in the following example:

[source, cypher]
.Find chains of co-authorship
----
MATCH (you)-/co_author*/-(someone)
PATH (a)-/co_author/-(b) IS
(a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b)
WHERE a <> b
----
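
Whether a defined predicate holds equally in both directions can be checked mechanically in a set-of-pairs model: the predicate must be equal to its own reverse. The Python sketch below does this for `co_author` on invented toy data. The helper names are hypothetical, the `:Book` label test is assumed to hold for every authored item, and path uniqueness rules are not modelled.

[source, python]
----
# (author, book) pairs; toy data invented for this sketch.
AUTHORED = {
    ("you", "book1"), ("ann", "book1"),
    ("ann", "book2"), ("bob", "book2"),
    ("cat", "book3"),
}

# PATH (a)-/co_author/-(b) IS (a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b)
#   WHERE a <> b
co_author = {
    (a, b)
    for (a, book_a) in AUTHORED
    for (b, book_b) in AUTHORED
    if book_a == book_b and a != b
}

def reverse(r):
    """Swap the start and end of every pair."""
    return {(b, a) for (a, b) in r}

# The predicate reads the same in both directions, so no arrow is needed.
is_symmetric = co_author == reverse(co_author)

def reachable(start, r):
    """Nodes reachable from start via zero or more steps of r."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for (a, b) in r:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

# MATCH (you)-/co_author*/-(someone)
chain = sorted(reachable("you", co_author))
----

On this data `is_symmetric` is `True` and `chain` is `['ann', 'bob', 'you']`; `cat` has no co-authors and is never reached.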
