From 76d49d2fd47e50bb37150700b55ce40a6819e64c Mon Sep 17 00:00:00 2001
From: Tobias Lindaaker
Date: Mon, 6 Feb 2017 16:50:14 +0100
Subject: [PATCH] Draft RPQ CIP

---
 .../CIP2017-02-06-Regular-Path-Patterns.adoc | 160 ++++++++++++++++++
 1 file changed, 160 insertions(+)
 create mode 100644 cip/1.accepted/CIP2017-02-06-Regular-Path-Patterns.adoc

diff --git a/cip/1.accepted/CIP2017-02-06-Regular-Path-Patterns.adoc b/cip/1.accepted/CIP2017-02-06-Regular-Path-Patterns.adoc
new file mode 100644
index 0000000000..9952594a17
--- /dev/null
+++ b/cip/1.accepted/CIP2017-02-06-Regular-Path-Patterns.adoc
@@ -0,0 +1,160 @@
+= CIP2017-02-06 Regular Path Patterns
+:numbered:
+:toc:
+:toc-placement: macro
+:source-highlighter: codemirror
+
+*Authors:* Tobias Lindaaker
+
+toc::[]
+
+== Regular Path Patterns
+
+Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths.
+This functionality is called Regular Path Patterns.
+
+A Regular Path Pattern is defined as:
+
+• A simple relationship type, or
+• A Regular Path Pattern followed by another Regular Path Pattern, or
+• An alternative between two Regular Path Patterns, or
+• A repetition of a Regular Path Pattern, or
+• A reference to a Defined Path Predicate.
+
+Regular Path Patterns are written similarly to how relationship patterns are written, but enclosed within two slash (`/`) characters instead of brackets (`[]`).
+
+Contrary to Relationship Patterns, Regular Path Patterns do _not_ allow binding a relationship to a variable.
+In order to bind the matching path to a variable, a Path Assignment should be used, by preceding the path with an identifier and an equals sign (`=`).
+This avoids a problem that existed in the past with repetition of relationships (a syntax that was deprecated with the introduction of Regular Path Patterns), where a relationship variable would bind to a list, making it hard to express predicates over the actual relationships.
+Predicates on parts of a Regular Path Pattern are instead expressed through the use of explicitly defined path predicates.
+
+=== Syntax
+
+The syntax of Regular Path Patterns fits into the greater Cypher syntax through `PatternElementChain`.
+
+----
+PatternElementChain = (RelationshipPattern | RegularPathPattern), NodePattern ;
+
+RegularPathPattern = (LeftArrowHead, Dash, '/', [RegularPathExpression], '/', Dash, RightArrowHead)
+                   | (LeftArrowHead, Dash, '/', [RegularPathExpression], '/', Dash)
+                   | (Dash, '/', [RegularPathExpression], '/', Dash, RightArrowHead)
+                   | (Dash, '/', [RegularPathExpression], '/', Dash)
+                   ;
+RegularPathExpression = {RegPathOr}- ;
+RegPathOr = RegPathSeq, {'|', RegPathSeq} ;
+RegPathSeq = {RegPathStar}- ;
+RegPathStar = RegPathDirected, [('*', [RangeLiteral]) | '+'] ;
+RegPathDirected = ['<'], RegPathBase, ['>'] ;
+RegPathBase = RegPathRelationship
+            | RegPathReference
+            | '(', RegularPathExpression, ')'
+            ;
+RegPathRelationship = RelType ;
+RegPathReference = SymbolicName ;
+----
+
+The `RegPathReference` is a reference to a Defined Path Predicate.
+These are defined using the following syntax:
+
+----
+DefinedPathPredicate = PathPredicatePrototype, 'IS', Pattern, [Where] ;
+PathPredicatePrototype = '(', Variable, ')', RegPathPrototype, '(', Variable, ')' ;
+RegPathPrototype = (LeftArrowHead, Dash, '/', DefinedPathName, '/', Dash)
+                 | (Dash, '/', DefinedPathName, '/', Dash, RightArrowHead)
+                 | (Dash, '/', DefinedPathName, '/', Dash)
+                 ;
+DefinedPathName = SymbolicName ;
+----
+
+=== Examples
+
+The astute reader of the syntax will have noticed that it is possible to express a Regular Path Pattern with an empty path expression:
+
+[source, cypher]
+----
+MATCH (a)-//-(b)
+----
+
+This pattern simply states that `a` and `b` must be the same node, and is thus the same as:
+
+[source, cypher]
+----
+MATCH (a), (b) WHERE a = b
+----
+
+The same reader will also have noticed that it is possible to define a pattern containing just a relationship type:
+
+[source, cypher]
+----
+MATCH (a)-/:KNOWS/->(b)
+----
+
+That pattern is indeed equivalent to the very similar relationship pattern:
+
+[source, cypher]
+----
+MATCH (a)-[:KNOWS]->(b)
+----
+
+The main difference is that the variant with a relationship pattern can bind that relationship and express further predicates over it.
+
+Regular Path Patterns become more interesting when larger expressions are put together:
+
+[source, cypher]
+.Finding someone loved by someone hated by someone you know, transitively
+----
+MATCH (you)-/(:KNOWS :HATES)+ :LOVES/->(someone)
+----
+
+Note the `+` expressing one or more occurrences of the sequence `KNOWS` followed by `HATES`.
+
+The direction of each relationship is governed by the overall direction of the Regular Path Pattern.
+It is, however, possible to explicitly define the direction for a particular part of the pattern.
+This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction.
+It is also possible to both prefix a part with `<` and suffix it with `>`, which gives that part the interpretation of being undirected.
+
+[source, cypher]
+.Specifying the direction for different parts of the pattern
+----
+MATCH (you)-/(:KNOWS <:HATES)+ :LOVES/->(someone)
+----
+
+In the example above, the `HATES` relationships must have the opposite direction to the other relationships in the path.
+
+Through the use of Defined Path Predicates we can express even more predicates over a path:
+
+[source, cypher]
+.Find a chain of unreciprocated lovers
+----
+MATCH (you)-/unreciprocated_love*/->(someone)
+PATH (a)-/unreciprocated_love/->(b) IS
+  (a)-[:LOVES]->(b)
+  WHERE NOT EXISTS { (b)-[:LOVES]->(a) }
+----
+
+Note that no colon is used when referencing the Defined Path Predicate; in Regular Path Patterns the colon is used only for referencing actual relationship types.
+
+Sometimes it is useful to express a predicate on a node in a Regular Path Pattern.
+This can be achieved by using a Defined Path Predicate where the nodes on both ends are the same:
+
+[source, cypher]
+.Find friends of friends that are not haters
+----
+MATCH (you)-/:KNOWS not_a_hater :KNOWS/-(friendly_friend_of_friend)
+PATH (x)-/not_a_hater/-(x) IS (x)
+  WHERE NOT EXISTS { (x)-[:HATES]->() }
+----
+
+In the case of a Defined Path Predicate where both nodes are the same, the direction of the predicate is irrelevant.
+In general, the direction of a Defined Path Predicate is important, and is used for mapping the pattern in the predicate onto the Regular Path Patterns that reference it.
+The only case in which the direction of a Defined Path Predicate may be omitted is when the defined predicate is symmetric, that is, when it holds between the same pairs of nodes in either direction.
+This is obviously the case when both nodes are the same, but it would also be the case when the internal pattern is symmetrical, such as in the following example:
+
+[source, cypher]
+.Find chains of co-authorship
+----
+MATCH (you)-/co_author*/-(someone)
+PATH (a)-/co_author/-(b) IS
+  (a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b)
+  WHERE a <> b
+----
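
The way sequence and repetition combine in the `(:KNOWS :HATES)+ :LOVES` example earlier in this section can be illustrated with a small Python sketch. It is not part of this proposal and is not how any Cypher implementation evaluates patterns. The helper names `rel`, `seq` and `plus` and the toy graph are assumptions made purely for illustration. The sketch computes only the pairs of end nodes connected by a matching path; it does not bind paths and ignores any relationship-uniqueness rules that Cypher may apply.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]   # (start node, relationship type, end node)
Pairs = Set[Tuple[str, str]]  # (first node, last node) of each matching path


def rel(edges: List[Edge], rel_type: str, reverse: bool = False) -> Pairs:
    """A single relationship type; reverse=True mimics a `<` prefix."""
    return {(dst, src) if reverse else (src, dst)
            for src, typ, dst in edges if typ == rel_type}


def seq(left: Pairs, right: Pairs) -> Pairs:
    """One pattern followed by another: join on the shared middle node."""
    return {(a, d) for a, b in left for c, d in right if b == c}


def plus(step: Pairs) -> Pairs:
    """One or more repetitions: extend until no new pairs appear."""
    result = set(step)
    while True:
        extended = result | seq(result, step)
        if extended == result:
            return result
        result = extended


# Hypothetical toy graph, invented for this illustration.
edges = [
    ("you", "KNOWS", "ann"),
    ("ann", "HATES", "bob"),
    ("bob", "KNOWS", "cid"),
    ("cid", "HATES", "dee"),
    ("dee", "LOVES", "eve"),
    ("bob", "LOVES", "fay"),
]

# Corresponds to (you)-/(:KNOWS :HATES)+ :LOVES/->(someone)
knows_hates = seq(rel(edges, "KNOWS"), rel(edges, "HATES"))
matches = seq(plus(knows_hates), rel(edges, "LOVES"))
someone = sorted(end for start, end in matches if start == "you")
print(someone)  # prints ['eve', 'fay']
```

In this sketch an alternative (`|`) would simply be the set union of two pair sets, and a part prefixed with `<` corresponds to calling `rel` with `reverse=True`.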