Fix the allocator for Flow Graph critical tasks creating #1596
TEST_CASE("test critical tasks memory pool correctness") {
    using node_type = tbb::flow::function_node<int, tbb::flow::continue_msg>;
    constexpr int num_iterations = 10000;
    int num_calls = 0;
    auto node_body = [&](int) { ++num_calls; };

    tbb::flow::graph g;
    node_type node(g, tbb::flow::serial, node_body, tbb::flow::node_priority_t{1});

    for (int i = 0; i < num_iterations; ++i) {
        node.try_put(i);
    }

    g.wait_for_all();
    REQUIRE_MESSAGE(num_calls == num_iterations, "Incorrect number of body executions");
}
I wonder if we can make the test more lightweight.
How reproducible is this? From what I understand, the task has to go into the node's queue in order to reproduce the behavior. Maybe we can decrease num_iterations at the expense of increasing the amount of compute done in the node's body, using some useless work pattern such as putting the thread to sleep or introducing an empty loop that is not elided by the compiler. IMHO, that would put less pressure on the system. But will it still be reproducible enough?
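For illustration only, the "empty loop which is not elided by the compiler" idea could be sketched as below. The helper name busy_work is hypothetical and not part of this PR; the sketch assumes a volatile accumulator is enough to keep the loop from being optimized away.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the PR): burns CPU time in a loop that
// the compiler must keep, because every iteration reads and writes a
// volatile variable.
inline std::uint64_t busy_work(std::uint64_t iterations) {
    volatile std::uint64_t sink = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = sink + i;  // volatile access: cannot be elided
    }
    return sink;
}
```

A node body could call busy_work(n) before incrementing its counter; whether that keeps the issue reproducible is exactly the open question raised here.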
I will investigate whether the test can be made more lightweight.
As far as I understand, the root cause of the issue is the aggregator of function_node, and we need to maximize the pressure on the aggregator to increase the chance that one thread requests creation of the task while another thread creates it under the aggregator and returns control to the first thread. So I am not sure that increasing the work in the node body, or blocking it, would help.
As far as I understand, increasing the amount of work inside the body will not help to reproduce the issue. It only occurs when one thread enters the aggregator to add new tasks to the queue (as part of try_put) and another thread enters the aggregator to take a task from the queue (as part of post-processing the body) and wrap it into a critical task. We also need the first thread to "win" on the aggregator and process the work of the second. I don't see how we can increase the probability of such behavior using the Flow Graph public API.
I think the current test is OK, since it keeps adding items to the queue and taking items from it simultaneously (because of the lightweight body). From what I see, the test reproduces the issue in ~80% of runs.
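The "one thread processes the work of another" effect described above can be illustrated with a toy aggregator. This is a simplified sketch of the general aggregation pattern, not the actual oneTBB implementation; ToyOperation, ToyAggregator and run_aggregator_demo are hypothetical names introduced only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical operation record: remembers who asked and who executed.
struct ToyOperation {
    std::thread::id requester;
    std::thread::id executor;
    std::atomic<bool> done{false};
    ToyOperation* next = nullptr;
};

// Toy aggregator: the thread that pushes onto an empty pending list becomes
// the handler and executes every pending operation, including those
// requested by other threads.
class ToyAggregator {
    std::atomic<ToyOperation*> pending_{nullptr};
    std::atomic<bool> handler_busy_{false};
public:
    void execute(ToyOperation& op) {
        op.requester = std::this_thread::get_id();
        ToyOperation* head = pending_.load();
        do {
            op.next = head;
        } while (!pending_.compare_exchange_weak(head, &op));
        if (head != nullptr) {
            // Another thread is (or will be) the handler: wait for it.
            while (!op.done.load()) std::this_thread::yield();
            return;
        }
        // This thread is the handler; wait for the previous handler to finish.
        while (handler_busy_.exchange(true)) std::this_thread::yield();
        ToyOperation* current = pending_.exchange(nullptr);
        while (current != nullptr) {
            ToyOperation* next = current->next;  // read before publishing done
            current->executor = std::this_thread::get_id();
            current->done.store(true);
            current = next;
        }
        handler_busy_.store(false);
    }
};

struct AggregatorStats {
    std::size_t total = 0;
    std::size_t executed_by_other_thread = 0;
};

inline AggregatorStats run_aggregator_demo(unsigned num_threads, std::size_t ops_per_thread) {
    ToyAggregator aggregator;
    std::vector<AggregatorStats> per_thread(num_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&aggregator, &per_thread, ops_per_thread, t] {
            for (std::size_t i = 0; i < ops_per_thread; ++i) {
                ToyOperation op;
                aggregator.execute(op);
                ++per_thread[t].total;
                if (op.executor != op.requester) ++per_thread[t].executed_by_other_thread;
            }
        });
    }
    for (auto& worker : workers) worker.join();
    AggregatorStats sum;
    for (const auto& stats : per_thread) {
        sum.total += stats.total;
        sum.executed_by_other_thread += stats.executed_by_other_thread;
    }
    return sum;
}
```

Calling run_aggregator_demo(4, 1000) always reports 4000 operations in total; how many of them were executed by a thread other than the requester depends on scheduling and can be zero on a lightly loaded machine, which matches the point that the race cannot be forced through a public API.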
d1::small_object_allocator allocator;
d1::task* critical_task = allocator.new_object<priority_task_selector>(g.my_priority_queue, allocator);
At the very least, I would add a comment explaining why this is necessary and why graph_task's allocator cannot be used here. After all, its purpose is to cache allocations. BTW, if it cannot be used, do we need to have it in graph_task at all?
Sure, I agree that the comment is required.
Regarding the necessity of the allocator inside graph_task: it is required not only for allocating the critical tasks, but also for deallocating the graph_task itself after execution (even in the case without critical tasks).
Co-authored-by: Aleksei Fedotov <aleksei.fedotov@intel.com>
Description
There is a bug in the current implementation of Flow Graph when function node concurrency limits are used together with node priorities.
The current algorithm for creating tasks in a function node is the following:
This patch also adds an extra assert to the small object pool to ensure that the pool for which the allocation is requested is the same as the TLS pool.
Fixes #1595
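The pool check mentioned in the description can be pictured with a toy model. This is a sketch under stated assumptions, not oneTBB code: ToyPool, ToyAllocator, this_thread_pool and run_pool_demo are hypothetical names, and the model assumes an allocator handle binds to the calling thread's pool on first use and remembers it afterwards.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for a per-thread small object pool.
struct ToyPool {
    int id = 0;
};

inline ToyPool& this_thread_pool() {
    thread_local ToyPool pool;
    return pool;
}

// Hypothetical allocator handle: binds to the calling thread's pool on the
// first allocation and caches that binding.
class ToyAllocator {
    ToyPool* bound_pool_ = nullptr;
public:
    bool matches_current_thread() const {
        return bound_pool_ == nullptr || bound_pool_ == &this_thread_pool();
    }

    template <typename T, typename... Args>
    std::unique_ptr<T> new_object(Args&&... args) {
        // Models the kind of assert the description mentions: the pool the
        // allocation is requested for must be the calling thread's pool.
        assert(matches_current_thread() && "allocator is bound to another thread's pool");
        bound_pool_ = &this_thread_pool();
        return std::make_unique<T>(std::forward<Args>(args)...);
    }
};

struct PoolDemoResult {
    bool cached_matches_here = false;
    bool cached_matches_on_other_thread = false;
    bool fresh_matches_on_other_thread = false;
};

inline PoolDemoResult run_pool_demo() {
    PoolDemoResult result;
    ToyAllocator cached;
    auto first = cached.new_object<int>(42);  // binds to this thread's pool
    result.cached_matches_here = cached.matches_current_thread();

    std::thread other([&result, &cached] {
        // Reusing the cached handle here would trip the assert above.
        result.cached_matches_on_other_thread = cached.matches_current_thread();
        // A fresh handle binds to the executing thread's pool instead.
        ToyAllocator fresh;
        auto second = fresh.new_object<int>(7);
        result.fresh_matches_on_other_thread = fresh.matches_current_thread();
    });
    other.join();
    return result;
}
```

In this model a handle cached on one thread no longer matches when another thread executes the allocation, while a freshly constructed handle does. That mirrors the change in this PR, where a local d1::small_object_allocator is created at the point where the critical task is allocated.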