Fix race condition for new GC #20820
base: master
threads. The `thread_preSuspend` hook should return true when druntime has knowledge of a thread, but it is based on whether `sm_this` (the storage for `Thread.getThis`) is set. Because the thread lock is not taken when setting `sm_this`, a race exists if a thread is suspended after `sm_this` is set but before the thread is added to the thread list for scanning. In that window, `thread_preSuspend` can return true while `thread_scanAll` does not include the thread in its list of scannable threads.
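The window described above can be modelled in a few lines. This is only a toy sketch, not druntime code: `ToyThread`, `smThis`, `scanList`, `toyPreSuspend`, and `toyWouldScan` are hypothetical stand-ins for `sm_this`, the scannable thread list, and the two checks a collector makes. It records what a collector would observe if it suspended the thread between the two registration steps.

```d
// Toy model of the two views a collector has of a thread.
// Every name in this block is hypothetical and only mimics the real fields.
final class ToyThread {}

ToyThread smThis;                // thread-local, stands in for sm_this
__gshared ToyThread[] scanList;  // stands in for the list that gets scanned

// "Known" here means only that the thread-local slot has been set.
bool toyPreSuspend()
{
    return smThis !is null;
}

// Only threads present in the list would be visited during scanning.
bool toyWouldScan(ToyThread t)
{
    foreach (x; scanList)
        if (x is t)
            return true;
    return false;
}

// Performs the two registration steps in order, without a lock, and reports
// whether a suspension between them would see the inconsistent state.
bool windowIsInconsistent()
{
    auto t = new ToyThread;
    smThis = t;                                   // step 1: slot is set
    immutable inconsistent =
        toyPreSuspend() && !toyWouldScan(t);      // a suspension landing here
    scanList ~= t;                                // step 2: added to the list
    return inconsistent;
}
```

In this model `windowIsInconsistent()` reports the mismatch every time, because the observation is placed between the two steps on purpose. In a real program the window is only hit when a suspension happens to land there, which is why holding the lock across both steps closes it.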
Thanks for your pull request, @schveiguy!
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment set up, you can use Digger to test this PR:
dub run digger -- build "master + dmd#20820"
// Ensure setting `sm_this` and adding the thread to the list of
// known threads is protected by the global thread lock. Otherwise,
// GCs that use `thread_preSuspend` to determine if a thread is
// registered might be told it is registered for scanning, but find
// out it is not.
Thread.slock.lock_nothrow();
Thread.setThis(obj);
Thread.add(obj);
Thread.slock.unlock_nothrow();
If the same pattern appears twice and requires a big comment, I'd extract it into its own function.
I'm not 100% sure it's actually needed for Windows in our case, because Windows does not pause threads via signals. I just did it here for consistency.
I contemplated just combining `setThis` and `add` into one function that locks for everything. Maybe `setThisAndAdd`?
In reality, I don't know how much of this is based on a faulty assumption of what it means to have `sm_this` set. I know we added the boolean return, but that may be thwarted too. What if code calls `setThis` itself, but doesn't register the thread?
Druntime tests whether a thread is in the list of managed threads to decide when it might suspend it. The new GC uses a different list of its own. If we migrated it to use the druntime list instead, we probably would not even need this change.
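For discussion, here is a rough sketch of what a combined helper along the lines of the `setThisAndAdd` idea could look like. It runs against a toy registry rather than druntime internals: `ToyThread`, `registryLock`, `registered`, `current`, and `isFullyRegistered` are hypothetical names standing in for the real thread class, `Thread.slock`, the thread list, and `sm_this`. Only `core.sync.mutex.Mutex` and its `lock_nothrow` / `unlock_nothrow` methods are real.

```d
import core.sync.mutex : Mutex;

// Toy stand-in for druntime's thread bookkeeping; all names are hypothetical.
final class ToyThread
{
    string name;
    this(string name) { this.name = name; }
}

__gshared Mutex registryLock;      // plays the role of the global thread lock
__gshared ToyThread[] registered;  // plays the role of the scannable thread list
ToyThread current;                 // thread-local, plays the role of sm_this

shared static this()
{
    registryLock = new Mutex;
}

// Sets the current-thread slot and registers the thread under one lock, so an
// observer that holds the lock never sees one step without the other.
void setThisAndAdd(ToyThread t)
{
    registryLock.lock_nothrow();
    scope (exit) registryLock.unlock_nothrow();
    current = t;
    registered ~= t;
}

// Reports whether the calling thread has its slot set and is in the list.
bool isFullyRegistered()
{
    registryLock.lock_nothrow();
    scope (exit) registryLock.unlock_nothrow();
    if (current is null)
        return false;
    foreach (t; registered)
        if (t is current)
            return true;
    return false;
}
```

The sketch uses `scope (exit)` so the unlock cannot be skipped on an early return. Whether the real helper should do the same, or keep the explicit lock/unlock pair from the diff, is a style choice this sketch does not settle.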
The new GC depends on `thread_preSuspend` to tell it if druntime will provide scanning details (stack and TLS) for the current thread. If it returns true, then the new GC assumes it can rely on druntime to scan the details. If it returns false, then the new GC uses its own scanning mechanism.
The cost of this is taking the `slock` a bit earlier than before, so it shouldn't affect anything.
There is a note about lazy TLS allocation, but whatever that is must be long gone, because the `setThis` call is a simple assignment for all platforms. This may have been related to how OSX TLS was managed years ago.
Note that this race caused a failure in a real-world application. It's pretty specific to the new GC, and an internal detail of druntime, so there isn't really a test I can add, nor do I think we need a changelog entry for this.