Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(profiling) switch to pthread_atfork() for fork barrier handling #3058

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

realFlowControl
Copy link
Member

@realFlowControl realFlowControl commented Jan 28, 2025

Description

A customer reached out if we do support @swoole in the profiler. Technically we do, but we do not follow forks which @swoole does in \Swoole\Http\Server (also when used as a backend to @laravel Octane).

Because we do not observe the fork() that is happening in a Swoole HTTP server when it is forking new workers, the profiler is in an undefined state in the forked children which could ultimately lead to a worker that is dead locked and not serving requests.

This PR aims to observe and shutdown the profiler on any fork() call for the child, something we have so far only done for userland calls to pcntl_fork(), pcntl_rfork() and pcntl_forkx(). One thing that changes is that we now also "observe" Apache forks: Apache with mpm_prefork would go through MINIT before forking it's worker processes. This is not a problem, as in this case, there is no profiler at all, as we init in first-RINIT.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@github-actions github-actions bot added profiling Relates to the Continuous Profiler tracing labels Jan 28, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.79%. Comparing base (09f23bd) to head (77aedd4).
Report is 1 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3058      +/-   ##
============================================
+ Coverage     73.02%   74.79%   +1.77%     
  Complexity     2787     2787              
============================================
  Files           139      112      -27     
  Lines         15272    11033    -4239     
  Branches       1043        0    -1043     
============================================
- Hits          11152     8252    -2900     
+ Misses         3569     2781     -788     
+ Partials        551        0     -551     
Flag Coverage Δ
appsec-extension ?
tracer-php 74.79% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

see 27 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09f23bd...77aedd4. Read the comment docs.

@pr-commenter
Copy link

pr-commenter bot commented Jan 28, 2025

Benchmarks [ profiler ]

Benchmark execution time: 2025-01-29 13:14:15

Comparing candidate commit 77aedd4 in PR branch florian/fork-barrier with baseline commit 09f23bd in branch master.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 28 metrics, 7 unstable metrics.

scenario:php-profiler-exceptions-with-profiler

  • 🟥 cpu_user_time [+2.294ms; +7.917ms] or [+3.231%; +11.148%]

@bwoebi
Copy link
Collaborator

bwoebi commented Jan 28, 2025

IIRC this was done in the handlers as other fork() calls may not be in a valid state for the engine to continue / profiling is not even desired in the other process?

@realFlowControl
Copy link
Member Author

IIRC this was done in the handlers as other fork() calls may not be in a valid state for the engine to continue / profiling is not even desired in the other process?

AFAIK the main reason was that Apache is going through MINIT/STARTUP and then forks. That's why we moved most profiler init things to RINIT / first-RINIT and we should be good, but I'll evaluate this.
The reason for this draft (and I'll update the PR description on that) is that Swoole\Http\Server (used in Laravel Octane) forks and we currently do not react to this fork in any way, as it is a native fork() call. Leaving the child in an undefined state, sometimes blocked and not serving requests anymore. I figured it might be better to handle all forks if we reliably can ;-)

@realFlowControl realFlowControl marked this pull request as ready for review January 29, 2025 13:00
@realFlowControl realFlowControl requested a review from a team as a code owner January 29, 2025 13:00
@realFlowControl realFlowControl marked this pull request as draft January 29, 2025 13:00
@realFlowControl realFlowControl marked this pull request as ready for review January 29, 2025 13:16
Copy link
Collaborator

@morrisonlevi morrisonlevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially very resistant to this change. Florian pointed out this line in the manual:

       After a fork(2) in a multithreaded process returns in the child,
       the child should call only async-signal-safe functions (see
       signal-safety(7)) until such time as it calls execve(2) to
       execute a new program.

And that it means from the point of fork, the child must only call async-signal-safe functions until it calls execve. The restriction is not limited to the child handler portion of the callback, but any code from the fork until exec. This means it's not really any different from our pcntl_fork handler, at least at a distance.

So as long as this isn't messing up known forks in other locations, such Apache process manager doing a fork after the minit/startup process, then it's okay with me.


There other ways to handle this:

  1. Shut down all threads pre-fork, and re-create them after forking. This would allow us to profile children as well. It would add latency because right now threads only have to reach a synchronization point, not complete their current work and then join.
  2. Avoid threads completely by sending everything out-of-process with the sidecar. I am not personally confident enough yet in the sidecar to go all-in here (but my confidence in it is increasing, fortunately).

@realFlowControl
Copy link
Member Author

realFlowControl commented Jan 30, 2025

While running a fork() loop overnight I found a deadlock on MacOS. It occurs when the main thread is in the fork-prepare handler, while at the same time we are sending profiles upstream in the ddprof_upload thread.

TLDR

MacOS is deadlocking on a race condition when one thread is in a pthread_atfork() handler, while another thread is doing name resolution. Linux with glibc is not and double safe from that deadlock, Linux with musl libc is single safe from the deadlock.

The following is going on:

void
_pthread_atfork_prepare_handlers(void)
{
	pthread_globals_t globals = _pthread_globals();

	_pthread_lock_lock(&globals->pthread_atfork_lock);
	size_t idx;
	for (idx = globals->atfork_count; idx > 0; --idx) {
		struct pthread_atfork_entry *e = &globals->atfork[idx-1];
		if (e->prepare != NULL) {
			e->prepare();
		}
	}
}

Notice: it is holding a lock on globals->pthread_atfork_lock while calling into the prepare handlers

Same time in another thread, let's call it ddprof_upload, we are uploading a profile to the backend. More specific, we are sending the serialised version already. Even more precise: we are doing a name server resolution before all the TCP/HTTP shenanigans. While doing the host name lookup, MacOS registers new pthread_atfork() handlers, look at the stack I found:

* thread #6, name = 'tokio-runtime-worker'
  * frame #0: 0x000000019ae78d9c libsystem_kernel.dylib`__ulock_wait2 + 8
    frame #1: 0x000000019aee0aac libsystem_platform.dylib`_os_unfair_lock_lock_slow + 180
    frame #2: 0x000000019aea77b0 libsystem_pthread.dylib`pthread_atfork + 88
    frame #3: 0x000000019af08898 libsystem_info.dylib`si_destination_compare_init_once + 140
    frame #4: 0x000000019aea80c0 libsystem_pthread.dylib`__pthread_once_handler + 76
    frame #5: 0x000000019aee0240 libsystem_platform.dylib`_os_once_callout + 32
    frame #6: 0x000000019aea8058 libsystem_pthread.dylib`pthread_once + 100
    frame #7: 0x000000019af075f4 libsystem_info.dylib`si_destination_compare + 116
    frame #8: 0x000000019ad4c708 libsystem_c.dylib`_isort + 136
    frame #9: 0x000000019aeedfc8 libsystem_info.dylib`_gai_sort_list + 216
    frame #10: 0x000000019aeed090 libsystem_info.dylib`si_addrinfo + 1280
    frame #11: 0x000000019aeecae8 libsystem_info.dylib`getaddrinfo + 168
    frame #12: 0x0000000106b36368 libdatadog_php_profiling.dylib`_$LT$std..sys_common..net..LookupHost$u20$as$u20$core..convert..TryFrom$LT$$LP$$RF$str$C$u16$RP$$GT$$GT$::try_from::_$u7b$$u7b$closure$u7d$$u7d$::h17928286472c438b at net.rs:207:25 [opt]
    frame #13: 0x0000000106b2ebd0 libdatadog_php_profiling.dylib`_$LT$$LP$$RF$str$C$u16$RP$$u20$as$u20$std..net..socket_addr..ToSocketAddrs$GT$::to_socket_addrs::h0a62b5997ae3d025 [inlined] std::sys::pal::common::small_c_string::run_with_cstr_stack::hf59ace9ebe7f5078 at small_c_string.rs:49:18 [opt]
...

Remember that the main thread is holding the lock because it is currently executing the the prepare handler? So here we are:

  • the ddprof_upload thread is waiting for the main thread to release the pthread_atfork_lock so it can add it's atfork handlers
  • while the main thread is waiting for the ddprof_upload thread to arrive at the barrier while it is holding on to the pthread_atfork_lock 😉

Why is this not a problem on Linux?

  bool multiple_threads = !SINGLE_THREAD_P;
  uint64_t lastrun;

  lastrun = __run_prefork_handlers (multiple_threads);
      if (runp->prepare_handler != NULL)
        {
          if (do_locking)
            lll_unlock (atfork_lock, LLL_PRIVATE);

          runp->prepare_handler ();

          if (do_locking)
            lll_lock (atfork_lock, LLL_PRIVATE);
        }

This allows other threads to call pthread_atfork() to add handlers while another thread is already in a prepare handler (it may get wild if you actually do this, but this is living at the edge anyway already, so what gives).

Is this also true for musl?

From the musl implementation it looks like this could be a problem in Linux with musl libc (looking at you Alpine) as well:

void __fork_handler(int who)
{
	struct atfork_funcs *p;
	if (!funcs) return;
	if (who < 0) {
		LOCK(lock);
		for (p=funcs; p; p = p->next) {
			if (p->prepare) p->prepare();
			funcs = p;
		}
	} else {
		for (p=funcs; p; p = p->prev) {
			if (!who && p->parent) p->parent();
			else if (who && p->child) p->child();
			funcs = p;
		}
		UNLOCK(lock);
	}
}

int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void))
{
	struct atfork_funcs *new = malloc(sizeof *new);
	if (!new) return ENOMEM;

	LOCK(lock);
	new->next = funcs;
	new->prev = 0;
	new->prepare = prepare;
	new->parent = parent;
	new->child = child;
	if (funcs) funcs->prev = new;
	funcs = new;
	UNLOCK(lock);
	return 0;
}

It is holding the lock while executing prepare handlers, so it should deadlock the same way as MacOS does, although it doesn't when playing with the same code 🫤
It is just not a problem, as Linux does not register "default" pthread_atfork() handlers when doing a name resolution. So we are safe on Linux, but racy and deadlocky on MacOS.

How do we proceed with this PR?

As our main development environment is MacOS, it would be pretty bad to have this merged, as we might deadlock on totally unrelated situations to whatever we are developing in the future and stumble over this problem over and over again thinking we did something wrong, but it is just™ this. We could do it different for MacOS (like we did before) and conditionally compile, which might help. Anyway, this PR needs more thought and work 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Relates to the Continuous Profiler tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants