fix(profiling) switch to pthread_atfork()
for fork barrier handling
#3058
base: master
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #3058 +/- ##
============================================
+ Coverage 73.02% 74.79% +1.77%
Complexity 2787 2787
============================================
Files 139 112 -27
Lines 15272 11033 -4239
Branches 1043 0 -1043
============================================
- Hits 11152 8252 -2900
+ Misses 3569 2781 -788
+ Partials 551 0 -551
Flags with carried forward coverage won't be shown. 27 files have indirect coverage changes.
Benchmarks [ profiler ]
Benchmark execution time: 2025-01-29 13:14:15
Comparing candidate commit 77aedd4 in PR branch.
Found 0 performance improvements and 1 performance regression! Performance is the same for 28 metrics, 7 unstable metrics.
scenario:php-profiler-exceptions-with-profiler
IIRC this was done in the handlers as other fork() calls may not be in a valid state for the engine to continue / profiling is not even desired in the other process?
AFAIK the main reason was that Apache is going through MINIT/STARTUP and then forks. That's why we moved most profiler init things to RINIT / first-RINIT and we should be good, but I'll evaluate this.
1616f46 to 77aedd4
I was initially very resistant to this change. Florian pointed out this line in the manual:
After a fork(2) in a multithreaded process returns in the child,
the child should call only async-signal-safe functions (see
signal-safety(7)) until such time as it calls execve(2) to
execute a new program.
This means that from the point of fork, the child must only call async-signal-safe functions until it calls execve. The restriction is not limited to the child handler portion of the callback; it applies to any code that runs between the fork and the exec. So it's not really any different from our pcntl_fork handler, at least at a distance.
So as long as this isn't messing up known forks in other locations, such as the Apache process manager doing a fork after the minit/startup process, then it's okay with me.
There are other ways to handle this:
- Shut down all threads pre-fork, and re-create them after forking. This would allow us to profile children as well. It would add latency because right now threads only have to reach a synchronization point, not complete their current work and then join.
- Avoid threads completely by sending everything out-of-process with the sidecar. I am not personally confident enough yet in the sidecar to go all-in here (but my confidence in it is increasing, fortunately).
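The first alternative above (stop all threads before forking, re-create them afterwards) could look roughly like the following. This is a minimal sketch, not the profiler's code: `worker_main`, `start_worker`, `stop_worker` and `demo_fork_with_restart` are hypothetical names, and it assumes the worker is the only extra thread in the process, so the process is single-threaded at the moment of `fork()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a profiler background thread. */
static pthread_t worker;
static atomic_bool worker_should_run;
static bool worker_running;      /* only touched by the forking thread */
static bool handlers_installed;

static void *worker_main(void *arg) {
    (void)arg;
    struct timespec tick = {0, 1000000}; /* 1 ms of pretend work */
    while (atomic_load(&worker_should_run)) {
        nanosleep(&tick, NULL);
    }
    return NULL;
}

static void start_worker(void) {
    atomic_store(&worker_should_run, true);
    int rc = pthread_create(&worker, NULL, worker_main, NULL);
    assert(rc == 0);
    worker_running = true;
}

/* Unlike a barrier, this waits for the thread to finish and join,
 * which is where the extra latency mentioned above comes from. */
static void stop_worker(void) {
    if (!worker_running) return;
    atomic_store(&worker_should_run, false);
    pthread_join(worker, NULL);
    worker_running = false;
}

static void on_prepare(void) { stop_worker(); }  /* before fork() */
static void on_parent(void)  { start_worker(); } /* parent, after fork() */
static void on_child(void)   { start_worker(); } /* child, after fork() */

/* Returns 0 if the child ended up with its own worker thread. */
int demo_fork_with_restart(void) {
    if (!handlers_installed) {
        if (pthread_atfork(on_prepare, on_parent, on_child) != 0) return -1;
        handlers_installed = true;
    }
    start_worker();
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(worker_running ? 0 : 1);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    stop_worker();
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `demo_fork_with_restart()` from a `main` function forks once and reports whether the child had a running worker. Creating a thread in the child handler is only reasonable here because no other thread exists at fork time; in a process with threads the sketch does not control, the async-signal-safety restriction quoted above still applies.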
While running a
TLDR
MacOS is deadlocking on a race condition when one thread is in a
The following is going on:
void
_pthread_atfork_prepare_handlers(void)
{
    pthread_globals_t globals = _pthread_globals();

    _pthread_lock_lock(&globals->pthread_atfork_lock);
    size_t idx;
    for (idx = globals->atfork_count; idx > 0; --idx) {
        struct pthread_atfork_entry *e = &globals->atfork[idx-1];
        if (e->prepare != NULL) {
            e->prepare();
        }
    }
}
Notice: it is holding a lock on
Same time in another thread, let's call it
Remember that the
Why is this not a problem on Linux?
bool multiple_threads = !SINGLE_THREAD_P;
uint64_t lastrun;
lastrun = __run_prefork_handlers (multiple_threads);

if (runp->prepare_handler != NULL)
  {
    if (do_locking)
      lll_unlock (atfork_lock, LLL_PRIVATE);
    runp->prepare_handler ();
    if (do_locking)
      lll_lock (atfork_lock, LLL_PRIVATE);
  }
This allows other threads to call
Is this also true for musl?
From the musl implementation it looks like this could be a problem in Linux with musl libc (looking at you Alpine) as well:
void __fork_handler(int who)
{
    struct atfork_funcs *p;
    if (!funcs) return;
    if (who < 0) {
        LOCK(lock);
        for (p=funcs; p; p = p->next) {
            if (p->prepare) p->prepare();
            funcs = p;
        }
    } else {
        for (p=funcs; p; p = p->prev) {
            if (!who && p->parent) p->parent();
            else if (who && p->child) p->child();
            funcs = p;
        }
        UNLOCK(lock);
    }
}

int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void))
{
    struct atfork_funcs *new = malloc(sizeof *new);
    if (!new) return ENOMEM;
    LOCK(lock);
    new->next = funcs;
    new->prev = 0;
    new->prepare = prepare;
    new->parent = parent;
    new->child = child;
    if (funcs) funcs->prev = new;
    funcs = new;
    UNLOCK(lock);
    return 0;
}
It is holding the lock while executing
How do we proceed with this PR?
As our main development environment is MacOS, it would be pretty bad to have this merged: in the future we might deadlock in situations totally unrelated to whatever we are developing, and stumble over this problem over and over again thinking we did something wrong, when it is just™ this. We could do it differently for MacOS (like we did before) and conditionally compile, which might help. Anyway, this PR needs more thought and work 😉
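The difference between the two locking strategies in the excerpts above can be modelled without touching the real libc internals. In this sketch `registry_lock` is a hypothetical stand-in for the libc's atfork lock, and a `pthread_mutex_trylock` from a second thread stands in for a concurrent `pthread_atfork()` call, so the model reports "would block" instead of actually deadlocking. It illustrates the locking pattern only and makes no claim about the exact thread interleaving in the profiler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a libc's internal atfork registry lock. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs on a second thread: would a pthread_atfork() call block right now?
 * trylock replaces the blocking lock so the model cannot hang. */
static void *try_register(void *out) {
    int rc = pthread_mutex_trylock(&registry_lock);
    if (rc == 0) pthread_mutex_unlock(&registry_lock);
    *(bool *)out = (rc == EBUSY);
    return NULL;
}

/* Stand-in for a prepare handler that waits on another thread,
 * while that thread attempts to register a fork handler. */
static bool prepare_handler_sees_blocked_peer(void) {
    pthread_t peer;
    bool blocked = false;
    int rc = pthread_create(&peer, NULL, try_register, &blocked);
    assert(rc == 0);
    pthread_join(peer, NULL);
    return blocked;
}

/* Pattern from the macOS and musl excerpts: lock held across the callback. */
bool run_prepare_holding_lock(void) {
    pthread_mutex_lock(&registry_lock);
    bool blocked = prepare_handler_sees_blocked_peer();
    pthread_mutex_unlock(&registry_lock);
    return blocked;
}

/* Pattern from the glibc excerpt: lock dropped around the callback. */
bool run_prepare_dropping_lock(void) {
    pthread_mutex_lock(&registry_lock);
    pthread_mutex_unlock(&registry_lock);   /* unlock before the handler */
    bool blocked = prepare_handler_sees_blocked_peer();
    pthread_mutex_lock(&registry_lock);     /* re-lock afterwards */
    pthread_mutex_unlock(&registry_lock);
    return blocked;
}
```

With the lock held, the peer thread cannot register; if the prepare handler were really waiting for that peer to make progress, neither side could continue. With the lock dropped around the callback, the peer gets through.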
Description
A customer asked whether we support @swoole in the profiler. Technically we do, but we do not follow forks, which @swoole performs in \Swoole\Http\Server (also when used as a backend to @laravel Octane).
Because we do not observe the fork() that happens when a Swoole HTTP server forks new workers, the profiler is in an undefined state in the forked children, which could ultimately lead to a worker that is deadlocked and not serving requests.
This PR aims to observe any fork() call and shut down the profiler in the child, something we have so far only done for userland calls to pcntl_fork(), pcntl_rfork() and pcntl_forkx(). One thing that changes is that we now also "observe" Apache forks: Apache with mpm_prefork goes through MINIT before forking its worker processes. This is not a problem, as in this case there is no profiler at all, because we init in first-RINIT.
Reviewer checklist
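As a rough illustration of the approach described above, the following C sketch registers a child handler with `pthread_atfork()` that switches profiling off in every forked child, regardless of who called `fork()`. The flag `profiling_enabled` and the function names are hypothetical; the real profiler's shutdown logic is not shown in this PR page. The handler only writes a `volatile sig_atomic_t`, which keeps it within what is safe to do in a child forked from a multithreaded process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical switch standing in for "the profiler is active". */
static volatile sig_atomic_t profiling_enabled = 1;

/* Runs in the child right after fork(): only flips a flag. */
static void child_after_fork(void) {
    profiling_enabled = 0;
}

/* Register the handler once, e.g. when the profiler starts. */
int install_fork_handler(void) {
    return pthread_atfork(NULL, NULL, child_after_fork);
}

/* Forks and returns 1 if the child still saw profiling enabled,
 * 0 if it was disabled, -1 on error. */
int child_sees_profiling_enabled(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(profiling_enabled ? 1 : 0);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

After `install_fork_handler()` succeeds, `child_sees_profiling_enabled()` reports that the child saw the flag cleared, while the flag stays set in the parent. The handler fires for any `fork()` in the process, which is the difference from hooking only pcntl_fork() and friends.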