Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove parallel call Fixes #7946 #8635

Open
wants to merge 3 commits into
base: release/9.0.2xx
Choose a base branch
from

Conversation

Forgind
Copy link
Member

@Forgind Forgind commented Jan 10, 2025

Fixes #7946

dsplaisted found a stack here that was very helpful in helping me find the source of the parallel call to the workload resolver. As seen in the top stack, this call fairly quickly reaches the WorkloadResolver, where it attempts to initialize everything concurrently, causing the issue. It's called from TemplatePackageCoordinator in the SDK repo, which in turn is called by various things.

I do not think the Parallel.For call in TemplatePackageManager (see #7946 (comment)) is an issue in this case. As I understand it, ScanAsync just looks at a precomputed set of ITemplatePackages, which means it doesn't have to hit the resolver to find them.

@dsplaisted @nagilson

This fairly directly calls into the WorkloadResolver in multiple threads. That isn't thread safe.
@Forgind Forgind requested a review from a team as a code owner January 10, 2025 00:49
@@ -24,7 +24,7 @@ public TemplateConstraintManager(IEngineEnvironmentSettings engineEnvironmentSet
_logger.LogDebug($"Found {constraintFactories.Count()} constraints factories, initializing.");
foreach (var constraintFactory in constraintFactories)
{
_templateConstrains[constraintFactory.Type] = Task.Run(() => constraintFactory.CreateTemplateConstraintAsync(engineEnvironmentSettings, _cancellationTokenSource.Token));
_templateConstrains[constraintFactory.Type] = constraintFactory.CreateTemplateConstraintAsync(engineEnvironmentSettings, _cancellationTokenSource.Token);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been staring at this for a few minutes. My thoughts are... it is still running this asynchronously, since there is no await. The difference is that without Task.Run, it isn't part of a thread pool; it runs each asynchronously on the same thread.

I don't know the exact issue here, but I did notice one thing. The original Task.Run call forgot to pass the cancellation toaken. Meaning, it could have been:

Task.Run(() => constraintFactory.CreateTemplateConstraintAsync(engineEnvironmentSettings, _cancellationTokenSource.Token), _cancellationTokenSource.Token);

Copy link
Member

@nagilson nagilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for investigating this change. Sorry for my delay, this was fairly complicated. I think I should have communicated the issue to you better -- you're right and it's a good observation that, that call stack doesn't call the workload constraints. ( I had to try to learn what a 'mount point' was...)

I think the simplest way to make this change (can be done with a few lines) would be to use SemaphoreSlim.WaitAsync. See how to use it here, and put it around this call to the workload resolver:

IEnumerable<WorkloadInfo> currentProviderWorkloads = await providers[0].GetInstalledWorkloadsAsync(token).ConfigureAwait(false);

The semaphore slim is a way to have a lock object that you can run await code inside of the lock. The standard static lock in C# is not friendly with await code, which is why this semaphore slim was designed. You grab the exclusive mutex for this exact purpose. (You may already be familiar with this, but I thought it may be helpful to provide context regardless)

https://stackoverflow.com/questions/7612602/why-cant-i-use-the-await-operator-within-the-body-of-a-lock-statement This article may also be helpful.

I think this approach you're taking may be valid, but the above approach should take about 5 lines or so (Create Mutex, Acquire, Release, {, }), so it would be an easier to read solution with less churn risk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants