Fix join being denied after being invited over federation #18075
Reproduction test for element-hq/synapse#18075
This reverts commit a32c1ba.
This is just the normal way someone finds out about an invite over federation. See #18075 (comment)
But seems plausible
We're now better at rejecting this
Great work on all the testing here, and for discovering the root cause in the first place. I filed a spec issue off the back of it.
Some small comments below, but otherwise this looks good.
synapse/handlers/federation_event.py
# After persistence, we always need to notify replication there may be new
# data (backfilled or not) because TODO.
self._notifier.notify_replication()
"...they may need to act on that event type"?
One example is facilitating the Synapse Module API, where a module could be loaded onto any worker. A module may want to act on certain types of backfilled events arriving.
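To illustrate the reasoning in this comment, here is a minimal sketch of why replication is notified whether or not the data was backfilled. Everything in it is a toy stand-in and an assumption for illustration: `ToyNotifier`, `ToyModule`, and `persist_events` are hypothetical names, not Synapse's real classes or the Module API.

```python
from typing import Callable, Dict, List


class ToyNotifier:
    """Hypothetical stand-in for a replication notifier (not Synapse's class)."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []

    def add_replication_callback(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def notify_replication(self) -> None:
        # Wake every listener so it can check for new data.
        for callback in self._callbacks:
            callback()


class ToyModule:
    """Hypothetical module on another worker that reacts to membership events."""

    def __init__(self, store: List[Dict[str, str]]) -> None:
        self._store = store
        self._position = 0
        self.handled: List[Dict[str, str]] = []

    def on_new_data(self) -> None:
        # Read everything persisted since the last wake-up.
        new_events = self._store[self._position:]
        self._position = len(self._store)
        self.handled.extend(e for e in new_events if e["type"] == "m.room.member")


def persist_events(
    notifier: ToyNotifier,
    store: List[Dict[str, str]],
    events: List[Dict[str, str]],
    backfilled: bool,
) -> None:
    store.extend(events)
    # Notify regardless of `backfilled`: a listener may care about this event type.
    notifier.notify_replication()


store: List[Dict[str, str]] = []
notifier = ToyNotifier()
module = ToyModule(store)
notifier.add_replication_callback(module.on_new_data)

persist_events(
    notifier,
    store,
    [{"type": "m.room.member"}, {"type": "m.room.message"}],
    backfilled=True,
)
print(len(module.handled))
```

If the notification were skipped for backfilled events, the toy module would never see the backfilled membership event until some unrelated event woke it up.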
Other than updating the comment regarding https://github.com/element-hq/synapse/pull/18075/files/872f717a6ab0a8e9ab598e1c1c853af88c39a11c#r1920895848, this PR now LGTM.
Thanks for all the hard work!
They now match the style of the other utilities under `client`: `mustKnockOnRoomSynced`/`syncKnockedOn`. This spawned from trying to write some knock tests that also stressed element-hq/synapse#18075, but they didn't pan out. I thought this refactor and update was still useful to upstream in any case.
Thanks for the review @anoadragon453 🦢
… already participating in (#757) Regression tests for element-hq/synapse#18075
…nvite scenario Regression tests for element-hq/synapse#18075
Fix join being denied after being invited over federation. This also happens for rejecting an invite. Basically, any out-of-band membership transition where we first get the membership as an `outlier` and then rely on federation filling us in to de-outlier it.
This PR mainly addresses automated test flakiness, bots/scripts, and options within Synapse like `auto_accept_invites` that are able to react quickly (before federation is able to push us events), but also helps in generic scenarios where federation is lagging.
I initially thought this might be a Synapse consistency issue (see issues labeled with `Z-Read-After-Write`) but it seems to be an event auth logic problem. Workers probably do increase the number of possible race condition scenarios that make this visible though (replication and cache invalidation lag).
Fix #15012 (probably fixes matrix-org/synapse#15012 (#15012))
Related to matrix-org/matrix-spec#2062
Problems:
`event_auth` logic even though we expose them in `/sync`.
What happened before?
I wrote a Complement test that stresses this exact scenario and reproduces the problem: matrix-org/complement#757
We have `hs1` and `hs2` running in monolith mode (no workers):
`@charlie1:hs2` is invited and joins the room:
`hs1` invites `@charlie1:hs2` to a room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point.
`@charlie1:hs2` decides to join because it saw the invite down `/sync`. Because `hs2` is not yet in the room, this happens as a remote join `make_join`/`send_join` which comes back with all of the auth events needed to auth successfully and now `@charlie1:hs2` is successfully joined to the room.
`@charlie2:hs2` is invited and tries to join the room:
`hs1` invites `@charlie2:hs2` to the room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point.
`hs2` is already participating in the room, we also see the invite come over federation in a transaction and we start processing it (not done yet, see below)
`@charlie2:hs2` decides to join because it saw the invite down `/sync`. Because `hs2` is already in the room, this happens as a local join but we deny the event because our `event_auth` logic thinks that we have no membership in the room ❌ (expected to be able to join because we saw the invite down `/sync`)
`@charlie2:hs2` invite event from and de-outlier it.
Logs for `hs2`:
Dev notes
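The `@charlie2:hs2` race described above can be modelled with a small toy sketch. It is an illustration under stated assumptions, not Synapse's actual data model or the fix in this PR: `ToyHomeserver`, its fields, and the `consider_outliers` flag are all hypothetical, with the flag standing in for one conceivable way of letting auth see out-of-band membership.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyHomeserver:
    """Hypothetical stand-in for hs2 (not Synapse's real data model)."""

    # Membership as exposed down /sync (cf. `local_current_membership`).
    local_current_membership: Dict[str, str] = field(default_factory=dict)
    # Membership as seen by event auth (the room state we have de-outliered).
    auth_state: Dict[str, str] = field(default_factory=dict)
    # Invites received out-of-band that federation has not yet de-outliered.
    outlier_invites: Set[str] = field(default_factory=set)

    def on_invite_request(self, user_id: str) -> None:
        """The invite arrives via PUT .../invite and is persisted as an outlier."""
        self.outlier_invites.add(user_id)
        self.local_current_membership[user_id] = "invite"

    def process_federation_transaction(self, user_id: str) -> None:
        """Federation catches up and the invite is de-outliered."""
        self.outlier_invites.discard(user_id)
        self.auth_state[user_id] = "invite"

    def local_join(self, user_id: str, consider_outliers: bool) -> bool:
        """Auth a local join to an invite-only room; True means accepted."""
        membership = self.auth_state.get(user_id)
        if membership is None and consider_outliers and user_id in self.outlier_invites:
            # Hypothetical remedy: let auth see the out-of-band invite.
            membership = "invite"
        if membership != "invite":
            return False
        self.outlier_invites.discard(user_id)
        self.auth_state[user_id] = "join"
        self.local_current_membership[user_id] = "join"
        return True


hs2 = ToyHomeserver()
hs2.on_invite_request("@charlie2:hs2")

# The invite is already visible down /sync ...
print(hs2.local_current_membership["@charlie2:hs2"])
# ... but a join that races ahead of federation is denied if outliers are ignored.
print(hs2.local_join("@charlie2:hs2", consider_outliers=False))
# Once auth can see the out-of-band invite, the same join is accepted.
print(hs2.local_join("@charlie2:hs2", consider_outliers=True))
```

In this model the join also succeeds if `process_federation_transaction` runs first, which matches the observation that the problem only shows up when the client reacts faster than federation.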
Other unrelated but semi-related races:
`send_join` races with local users sending messages #17720
Running tests