Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reference implementation for parallel_phase feature #1570

Merged
merged 35 commits into from
Jan 24, 2025

Conversation

isaevil
Copy link
Contributor

@isaevil isaevil commented Nov 26, 2024

Description

Add a comprehensive description of proposed changes

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

@isaevil isaevil changed the title Add reference implementation of parallel_block feature Add reference implementation for parallel_block feature Nov 26, 2024
@isaevil
Copy link
Contributor Author

isaevil commented Nov 27, 2024

@akukanov @aleksei-fedotov @vossmjp Could you please take a look at the PR in terms of implementation and ABI.

P.S. Tests are still WIP.

Copy link
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a complete review, just a couple of starters to think about.

src/tbb/waiters.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
@isaevil isaevil changed the title Add reference implementation for parallel_block feature Add reference implementation for parallel_phase feature Dec 4, 2024
Copy link
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of comments from me. Have not looked at the tests yet. Also, not finished reviewing the PHASE_* switching logic because of found issue.

include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Show resolved Hide resolved
src/tbb/arena.cpp Outdated Show resolved Hide resolved
src/tbb/arena.h Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/waiters.h Outdated Show resolved Hide resolved
include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
@isaevil isaevil marked this pull request as ready for review December 13, 2024 15:19
src/tbb/arena.cpp Outdated Show resolved Hide resolved
src/tbb/arena.cpp Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
test/tbb/test_task_arena.cpp Outdated Show resolved Hide resolved
test/tbb/test_task_arena.cpp Outdated Show resolved Hide resolved
test/tbb/test_task_arena.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, I think need to cover in tests the following:

  • The DELAYED_LEAVE is returned back once new task is submitted.
    • Here I think we can test that the return of a worker after one-time-fast-leave happens generally longer than the following returns made after new task(s) is/are submitted.
  • Nested parallel phases with combinations of DELAYED and (ONE TIME) FAST leaves.

src/tbb/waiters.h Outdated Show resolved Hide resolved
pavelkumbrasev and others added 10 commits December 17, 2024 12:28
Signed-off-by: pavelkumbrasev <pavel.kumbrasev@intel.com>
Signed-off-by: pavelkumbrasev <pavel.kumbrasev@intel.com>
Signed-off-by: pavelkumbrasev <pavel.kumbrasev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
…rallel_block

Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Copy link
Contributor

@akukanov akukanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments below are not critical; feel free to commit as-is and address later.

include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
include/oneapi/tbb/task_arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
@@ -95,6 +95,11 @@ TBB_EXPORT void __TBB_EXPORTED_FUNC isolate_within_arena(d1::delegate_base& d, s
TBB_EXPORT void __TBB_EXPORTED_FUNC enqueue(d1::task&, d1::task_arena_base*);
TBB_EXPORT void __TBB_EXPORTED_FUNC enqueue(d1::task&, d1::task_group_context&, d1::task_arena_base*);
TBB_EXPORT void __TBB_EXPORTED_FUNC submit(d1::task&, d1::task_group_context&, arena*, std::uintptr_t);

#if __TBB_PREVIEW_PARALLEL_PHASE
TBB_EXPORT void __TBB_EXPORTED_FUNC register_parallel_phase(d1::task_arena_base*, std::uintptr_t);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use "register" and "unregister" in the names for the entry points? To me "registration" does not imply the start and end of the active time in the region, but might be used to indicate that a region just exists. For example, an athlete might register for a race but that doesn't mean they've started running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main idea was to use more generic names, which do not depend on feature API. So, if API will change dramatically, entry point names would still make some sense.

Copy link
Contributor

@akukanov akukanov Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be alternative suitable generic verbs, such as "initiate", "enter", "commence", "launch", "open" for the beginning and "complete", "conclude", "close", "finish", "exit".

From the semantical point of view (i.e. speaking of a program execution phase), I would likely use "enter/exit" or "initiate/complete"; but since it's an internal entry point, I am OK with any choice made, including "register/unregister".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the suggestions, I like "enter/exit".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akukanov @vossmjp Renamed entry points to "enter/exit".

auto median_automatic = utils::median(times_automatic.begin(), times_automatic.end());
auto median_fast = utils::median(times_fast.begin(), times_fast.end());

WARN_MESSAGE(median_automatic < median_fast,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of the tests guaranteed to work (i.e. not have warnings) for hybrid systems where automatic is not delayed leave?

Copy link
Contributor Author

@isaevil isaevil Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good question. I suppose that only one test case(Parallel Phase retains workers in task_arena) would produce expected results on hybrid systems. Other tests don't make much sense on such configurations, but I think it is fine since these are warnings, not asserts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we skip the warning on hybrid systems? For example, don't output warning if there is more than 1 core type.

isaevil and others added 2 commits January 15, 2025 15:26
Co-authored-by: Alexey Kukanov <alexey.kukanov@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
@isaevil isaevil force-pushed the dev/pavelkumbrasev/parallel_block branch from c05d9a7 to 515fd33 Compare January 15, 2025 14:34
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
@isaevil
Copy link
Contributor Author

isaevil commented Jan 15, 2025

@aleksei-fedotov @akukanov @vossmjp @pavelkumbrasev @dnmokhov I have updated the RFC document: included some technical details and conditions to leave the experimental stage.

Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
@@ -433,6 +528,9 @@ void arena::advertise_new_work() {
workers_delta = 1;
}

#if __TBB_PREVIEW_PARALLEL_PHASE
my_thread_leave.reset_if_needed();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the following assert be useful?

Suggested change
my_thread_leave.reset_if_needed();
__TBB_ASSERT(mandatory_delta > 0 || workers_delta > 0, "Potentially delaying fast leave?");
my_thread_leave.reset_if_needed();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a valid assert. In case of workerless arena and spawn of the task both of those values will be zeroes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay... let it be then:

Suggested change
my_thread_leave.reset_if_needed();
__TBB_ASSERT((mandatory_delta > 0 || workers_delta > 0) ||
(is_arena_workerless() && work_type == work_spawned),
"Potentially delaying fast leave?");
my_thread_leave.reset_if_needed();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I do not readily see how this assert is related to the parallel phase.
And if you move the assertion next to the code in the lines 522-529, it seems it just duplicates almost exactly the conditions and assignments there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know that it duplicates. But the stress here is on the words "almost exactly". With the assert I sort of wanted to make sure we do not fall into recently discovered issue of dropping the fast leave setting when threads are recalled from arena. I don't mind not having an assert here, since, as you said, the conditions duplicate it. However, I do think we need to tell reader about the desired behavior here, it feels like the code itself might not be good at telling that.

Another interesting consideration is to reduce cross-NUMA accesses as much as possible. If thread can base its decision whether it needs to read and/or modify global-to-arena variables on the data it has already read, then it makes sense to avoid doing more re-reads of other global-to-arena data. Of course, if we are sure that the set of "almost exactly" is empty here, then the thread already avoids unnecessary touches to the data that increases cross-NUMA traffic.

Copy link
Contributor

@akukanov akukanov Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions are good to enforce preconditions for a function, and in some cases of sophisticated internal logic also good to check expected invariants in certain point. Having an assertion that checks the conditions explicitly set by the code right above is a pure duplication. The part that is not an exact duplication could be checked in a proper place, of course; also reset_if_needed could internally assert on its expected preconditions.

For the NUMA part of the comment, I cannot make the connection to the assertion being discussed.

src/tbb/arena.h Outdated Show resolved Hide resolved
include/oneapi/tbb/task_arena.h Show resolved Hide resolved
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
Copy link
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Contributor

@akukanov akukanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that might also simplify the implementation logic.

Below is how. Note that separation of significant bits for different leave states was a necessary prerequisite to operate with ONE_TIME_FAST_LEAVE as a single bit flag.

src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Outdated Show resolved Hide resolved
src/tbb/arena.h Show resolved Hide resolved
Signed-off-by: Isaev, Ilya <ilya.isaev@intel.com>
__TBB_ASSERT(prev - PARALLEL_PHASE < prev,
"A call to unregister without its register complement");
desired = prev - PARALLEL_PHASE; // Mark the end of this phase in reference counter
if (enable_fast_leave && /*it was the last parallel phase*/desired == DELAYED_LEAVE) {
Copy link
Contributor

@akukanov akukanov Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of nitpicking: the comment does not really match the condition tested. To check if that's the last parallel phase, you'd use desired < PARALLEL_PHASE. Here you also check that neither of the fast-leave flags is set.
Perhaps it can be improved like this:

Suggested change
if (enable_fast_leave && /*it was the last parallel phase*/desired == DELAYED_LEAVE) {
if (enable_fast_leave && /*it was the last parallel phase, and delayed leave is the default*/ desired == DELAYED_LEAVE) {

But also feel free to ignore :)

Copy link
Contributor

@vossmjp vossmjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. At some point we need to improve tests though.

@isaevil isaevil merged commit a9fcf49 into master Jan 24, 2025
25 checks passed
@isaevil isaevil deleted the dev/pavelkumbrasev/parallel_block branch January 24, 2025 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants