fix(profiling) switch to `pthread_atfork()` for fork barrier handling #3058

realFlowControl · 2025-01-28T18:12:34Z

Description

A customer reached out if we do support @swoole in the profiler. Technically we do, but we do not follow forks which @swoole does in \Swoole\Http\Server (also when used as a backend to @laravel Octane).

Because we do not observe the fork() that is happening in a Swoole HTTP server when it is forking new workers, the profiler is in an undefined state in the forked children which could ultimately lead to a worker that is dead locked and not serving requests.

This PR aims to observe and shutdown the profiler on any fork() call for the child, something we have so far only done for userland calls to pcntl_fork(), pcntl_rfork() and pcntl_forkx(). One thing that changes is that we now also "observe" Apache forks: Apache with mpm_prefork would go through MINIT before forking it's worker processes. This is not a problem, as in this case, there is no profiler at all, as we init in first-RINIT.

Reviewer checklist

Test coverage seems ok.
Appropriate labels assigned.

codecov-commenter · 2025-01-28T18:18:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.79%. Comparing base (09f23bd) to head (77aedd4).
Report is 1 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #3058      +/-   ##
============================================
+ Coverage     73.02%   74.79%   +1.77%     
  Complexity     2787     2787              
============================================
  Files           139      112      -27     
  Lines         15272    11033    -4239     
  Branches       1043        0    -1043     
============================================
- Hits          11152     8252    -2900     
+ Misses         3569     2781     -788     
+ Partials        551        0     -551

Flag	Coverage Δ
appsec-extension	`?`
tracer-php	`74.79% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

see 27 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09f23bd...77aedd4. Read the comment docs.

pr-commenter · 2025-01-28T18:24:18Z

Benchmarks [ profiler ]

Benchmark execution time: 2025-01-29 13:14:15

Comparing candidate commit 77aedd4 in PR branch florian/fork-barrier with baseline commit 09f23bd in branch master.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 28 metrics, 7 unstable metrics.

scenario:php-profiler-exceptions-with-profiler

🟥 cpu_user_time [+2.294ms; +7.917ms] or [+3.231%; +11.148%]

bwoebi · 2025-01-28T19:15:54Z

IIRC this was done in the handlers as other fork() calls may not be in a valid state for the engine to continue / profiling is not even desired in the other process?

realFlowControl · 2025-01-29T08:37:49Z

IIRC this was done in the handlers as other fork() calls may not be in a valid state for the engine to continue / profiling is not even desired in the other process?

AFAIK the main reason was that Apache is going through MINIT/STARTUP and then forks. That's why we moved most profiler init things to RINIT / first-RINIT and we should be good, but I'll evaluate this.
The reason for this draft (and I'll update the PR description on that) is that Swoole\Http\Server (used in Laravel Octane) forks and we currently do not react to this fork in any way, as it is a native fork() call. Leaving the child in an undefined state, sometimes blocked and not serving requests anymore. I figured it might be better to handle all forks if we reliably can ;-)

morrisonlevi

I was initially very resistant to this change. Florian pointed out this line in the manual:

       After a fork(2) in a multithreaded process returns in the child,
       the child should call only async-signal-safe functions (see
       signal-safety(7)) until such time as it calls execve(2) to
       execute a new program.

And that it means from the point of fork, the child must only call async-signal-safe functions until it calls execve. The restriction is not limited to the child handler portion of the callback, but any code from the fork until exec. This means it's not really any different from our pcntl_fork handler, at least at a distance.

So as long as this isn't messing up known forks in other locations, such Apache process manager doing a fork after the minit/startup process, then it's okay with me.

There other ways to handle this:

Shut down all threads pre-fork, and re-create them after forking. This would allow us to profile children as well. It would add latency because right now threads only have to reach a synchronization point, not complete their current work and then join.
Avoid threads completely by sending everything out-of-process with the sidecar. I am not personally confident enough yet in the sidecar to go all-in here (but my confidence in it is increasing, fortunately).

realFlowControl · 2025-01-30T09:57:11Z

While running a fork() loop overnight I found a deadlock on MacOS. It occurs when the main thread is in the fork-prepare handler, while at the same time we are sending profiles upstream in the ddprof_upload thread.

TLDR

MacOS is deadlocking on a race condition when one thread is in a pthread_atfork() handler, while another thread is doing name resolution. Linux with glibc is not and double safe from that deadlock, Linux with musl libc is single safe from the deadlock.

The following is going on:

the main thread is in the prepare handler passed to the pthread_atfork() call
to get there, MacOS libc calls into _pthread_atfork_prepare_handlers() which looks like this

void
_pthread_atfork_prepare_handlers(void)
{
	pthread_globals_t globals = _pthread_globals();

	_pthread_lock_lock(&globals->pthread_atfork_lock);
	size_t idx;
	for (idx = globals->atfork_count; idx > 0; --idx) {
		struct pthread_atfork_entry *e = &globals->atfork[idx-1];
		if (e->prepare != NULL) {
			e->prepare();
		}
	}
}

Notice: it is holding a lock on globals->pthread_atfork_lock while calling into the prepare handlers

Same time in another thread, let's call it ddprof_upload, we are uploading a profile to the backend. More specific, we are sending the serialised version already. Even more precise: we are doing a name server resolution before all the TCP/HTTP shenanigans. While doing the host name lookup, MacOS registers new pthread_atfork() handlers, look at the stack I found:

* thread #6, name = 'tokio-runtime-worker'
  * frame #0: 0x000000019ae78d9c libsystem_kernel.dylib`__ulock_wait2 + 8
    frame #1: 0x000000019aee0aac libsystem_platform.dylib`_os_unfair_lock_lock_slow + 180
    frame #2: 0x000000019aea77b0 libsystem_pthread.dylib`pthread_atfork + 88
    frame #3: 0x000000019af08898 libsystem_info.dylib`si_destination_compare_init_once + 140
    frame #4: 0x000000019aea80c0 libsystem_pthread.dylib`__pthread_once_handler + 76
    frame #5: 0x000000019aee0240 libsystem_platform.dylib`_os_once_callout + 32
    frame #6: 0x000000019aea8058 libsystem_pthread.dylib`pthread_once + 100
    frame #7: 0x000000019af075f4 libsystem_info.dylib`si_destination_compare + 116
    frame #8: 0x000000019ad4c708 libsystem_c.dylib`_isort + 136
    frame #9: 0x000000019aeedfc8 libsystem_info.dylib`_gai_sort_list + 216
    frame #10: 0x000000019aeed090 libsystem_info.dylib`si_addrinfo + 1280
    frame #11: 0x000000019aeecae8 libsystem_info.dylib`getaddrinfo + 168
    frame #12: 0x0000000106b36368 libdatadog_php_profiling.dylib`_$LT$std..sys_common..net..LookupHost$u20$as$u20$core..convert..TryFrom$LT$$LP$$RF$str$C$u16$RP$$GT$$GT$::try_from::_$u7b$$u7b$closure$u7d$$u7d$::h17928286472c438b at net.rs:207:25 [opt]
    frame #13: 0x0000000106b2ebd0 libdatadog_php_profiling.dylib`_$LT$$LP$$RF$str$C$u16$RP$$u20$as$u20$std..net..socket_addr..ToSocketAddrs$GT$::to_socket_addrs::h0a62b5997ae3d025 [inlined] std::sys::pal::common::small_c_string::run_with_cstr_stack::hf59ace9ebe7f5078 at small_c_string.rs:49:18 [opt]
...

Remember that the main thread is holding the lock because it is currently executing the the prepare handler? So here we are:

the ddprof_upload thread is waiting for the main thread to release the pthread_atfork_lock so it can add it's atfork handlers
while the main thread is waiting for the ddprof_upload thread to arrive at the barrier while it is holding on to the pthread_atfork_lock 😉

Why is this not a problem on Linux?

the glibc fork() will determine if the env is multithreaded and call a helper to iterate over prepare handlers

  bool multiple_threads = !SINGLE_THREAD_P;
  uint64_t lastrun;

  lastrun = __run_prefork_handlers (multiple_threads);

the __run_prefork_handlers() get's called with a truthy value
it will iterate over the prepare handlers and release the lock for the prepare handler list before calling the prepare handler and then accquire the lock again after its return:

      if (runp->prepare_handler != NULL)
        {
          if (do_locking)
            lll_unlock (atfork_lock, LLL_PRIVATE);

          runp->prepare_handler ();

          if (do_locking)
            lll_lock (atfork_lock, LLL_PRIVATE);
        }

This allows other threads to call pthread_atfork() to add handlers while another thread is already in a prepare handler (it may get wild if you actually do this, but this is living at the edge anyway already, so what gives).

Is this also true for musl?

From the musl implementation it looks like this could be a problem in Linux with musl libc (looking at you Alpine) as well:

void __fork_handler(int who)
{
	struct atfork_funcs *p;
	if (!funcs) return;
	if (who < 0) {
		LOCK(lock);
		for (p=funcs; p; p = p->next) {
			if (p->prepare) p->prepare();
			funcs = p;
		}
	} else {
		for (p=funcs; p; p = p->prev) {
			if (!who && p->parent) p->parent();
			else if (who && p->child) p->child();
			funcs = p;
		}
		UNLOCK(lock);
	}
}

int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void))
{
	struct atfork_funcs *new = malloc(sizeof *new);
	if (!new) return ENOMEM;

	LOCK(lock);
	new->next = funcs;
	new->prev = 0;
	new->prepare = prepare;
	new->parent = parent;
	new->child = child;
	if (funcs) funcs->prev = new;
	funcs = new;
	UNLOCK(lock);
	return 0;
}

It is holding the lock while executing prepare handlers, so it should deadlock the same way as MacOS does, although it doesn't when playing with the same code 🫤
It is just not a problem, as Linux does not register "default" pthread_atfork() handlers when doing a name resolution. So we are safe on Linux, but racy and deadlocky on MacOS.

How do we proceed with this PR?

As our main development environment is MacOS, it would be pretty bad to have this merged, as we might deadlock on totally unrelated situations to whatever we are developing in the future and stumble over this problem over and over again thinking we did something wrong, but it is just™ this. We could do it different for MacOS (like we did before) and conditionally compile, which might help. Anyway, this PR needs more thought and work 😉

github-actions bot added profiling Relates to the Continuous Profiler tracing labels Jan 28, 2025

realFlowControl marked this pull request as ready for review January 29, 2025 13:00

realFlowControl requested a review from a team as a code owner January 29, 2025 13:00

realFlowControl marked this pull request as draft January 29, 2025 13:00

switch to pthread_atfork() for fork barrier handling

77aedd4

realFlowControl force-pushed the florian/fork-barrier branch from 1616f46 to 77aedd4 Compare January 29, 2025 13:01

realFlowControl marked this pull request as ready for review January 29, 2025 13:16

morrisonlevi approved these changes Jan 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(profiling) switch to `pthread_atfork()` for fork barrier handling #3058

fix(profiling) switch to `pthread_atfork()` for fork barrier handling #3058

realFlowControl commented Jan 28, 2025 •

edited

Loading

codecov-commenter commented Jan 28, 2025 •

edited

Loading

pr-commenter bot commented Jan 28, 2025 •

edited

Loading

bwoebi commented Jan 28, 2025

realFlowControl commented Jan 29, 2025

morrisonlevi left a comment •

edited

Loading

realFlowControl commented Jan 30, 2025 •

edited

Loading

fix(profiling) switch to pthread_atfork() for fork barrier handling #3058

Are you sure you want to change the base?

fix(profiling) switch to pthread_atfork() for fork barrier handling #3058

Conversation

realFlowControl commented Jan 28, 2025 • edited Loading

Description

Reviewer checklist

codecov-commenter commented Jan 28, 2025 • edited Loading

Codecov Report

pr-commenter bot commented Jan 28, 2025 • edited Loading

Benchmarks [ profiler ]

scenario:php-profiler-exceptions-with-profiler

bwoebi commented Jan 28, 2025

realFlowControl commented Jan 29, 2025

morrisonlevi left a comment • edited Loading

Choose a reason for hiding this comment

realFlowControl commented Jan 30, 2025 • edited Loading

TLDR

The following is going on:

Why is this not a problem on Linux?

Is this also true for musl?

How do we proceed with this PR?

fix(profiling) switch to `pthread_atfork()` for fork barrier handling #3058

fix(profiling) switch to `pthread_atfork()` for fork barrier handling #3058

realFlowControl commented Jan 28, 2025 •

edited

Loading

codecov-commenter commented Jan 28, 2025 •

edited

Loading

pr-commenter bot commented Jan 28, 2025 •

edited

Loading

morrisonlevi left a comment •

edited

Loading

realFlowControl commented Jan 30, 2025 •

edited

Loading