Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ci): Balance splits across benchmarking CI jobs according to the number of CPU cores #5099

Merged
merged 4 commits into from
Jan 13, 2025

Conversation

ddyurchenko
Copy link
Contributor

@ddyurchenko ddyurchenko commented Jan 13, 2025

What does this PR do?

Fix splits of benchmarking CI jobs, so now it behaves properly according to the number of CPU cores, and also fix several edge cases in original runall.sh script.

  • Fix how balancing over CPU cores works, so now tasks are split by variants instead of only directories;
  • Fix bug with taskset trying to assign to non-existing core;
  • Fix bug with off by 1 error in benchmarks count;
  • Fail job with exit code 1 if sirun could not start it;
  • Fail CI job when GROUP_SIZE is larger than number of CPU cores.

Motivation

I noticed that exporting-pipeline benchmark was not executed after introduction of new benchmarks in PR #5004. After debugging, I discovered some bugs in original implementation of jobs split.

Copy link

github-actions bot commented Jan 13, 2025

Overall package size

Self size: 8.39 MB
Deduped: 94.74 MB
No deduping: 95.25 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.3.0 | 29.43 MB | 29.43 MB | | @datadog/native-appsec | 8.4.0 | 19.25 MB | 19.26 MB | | @datadog/native-iast-taint-tracking | 3.2.0 | 13.9 MB | 13.91 MB | | @datadog/pprof | 5.4.1 | 9.76 MB | 10.13 MB | | protobufjs | 7.2.5 | 2.77 MB | 5.16 MB | | @datadog/native-iast-rewriter | 2.6.1 | 2.59 MB | 2.73 MB | | @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB | | @datadog/native-metrics | 3.1.0 | 1.06 MB | 1.46 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.11.2 | 112.74 kB | 826.22 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | lru-cache | 7.18.3 | 133.92 kB | 133.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB | | semver | 7.6.3 | 95.82 kB | 95.82 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 5.3.1 | 51.46 kB | 51.46 kB | | shell-quote | 1.8.1 | 44.96 kB | 44.96 kB | | istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB | | rfdc | 1.3.1 | 25.21 kB | 25.21 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

…roperly

* Fix how balancing over CPU cores works, so now tasks are split by variants instead of only directories
* Fix bug with taskset trying to assign to non-existing core
* Fix bug with off by 1 error in benchmarks count
* Fail job with exit code 1 if sirun could not start it
@ddyurchenko ddyurchenko force-pushed the ddyurchenko/fix-splits branch from 0abafc9 to 8b32d56 Compare January 13, 2025 12:39
@pr-commenter
Copy link

pr-commenter bot commented Jan 13, 2025

Benchmarks

Benchmark execution time: 2025-01-13 15:33:58

Comparing candidate commit 1658a3f in PR branch ddyurchenko/fix-splits with baseline commit 587957e in branch master.

Found 0 performance improvements and 14 performance regressions! Performance is the same for 809 metrics, 20 unstable metrics.

scenario:plugin-http-server-control-18

  • 🟥 cpu_user_time [+13.456ms; +20.551ms] or [+5.853%; +8.939%]
  • 🟥 execution_time [+28.578ms; +30.691ms] or [+5.310%; +5.702%]

scenario:scope-manager-async_hooks-18

  • 🟥 instructions [+24.2M instructions; +31.1M instructions] or [+10.297%; +13.274%]

scenario:scope-manager-async_hooks-20

  • 🟥 instructions [+21.7M instructions; +29.3M instructions] or [+11.879%; +16.063%]

scenario:scope-manager-async_hooks-22

  • 🟥 instructions [+20.4M instructions; +27.3M instructions] or [+10.828%; +14.460%]

scenario:scope-manager-async_local_storage-18

  • 🟥 instructions [+24.5M instructions; +31.2M instructions] or [+10.426%; +13.282%]

scenario:scope-manager-async_local_storage-20

  • 🟥 instructions [+21.8M instructions; +29.1M instructions] or [+11.971%; +15.969%]

scenario:scope-manager-async_local_storage-22

  • 🟥 instructions [+20.3M instructions; +26.5M instructions] or [+10.748%; +14.022%]

scenario:scope-manager-async_resource-18

  • 🟥 instructions [+24.2M instructions; +31.2M instructions] or [+10.344%; +13.322%]

scenario:scope-manager-async_resource-20

  • 🟥 instructions [+22.0M instructions; +29.9M instructions] or [+12.068%; +16.429%]

scenario:scope-manager-async_resource-22

  • 🟥 instructions [+19.8M instructions; +26.2M instructions] or [+10.472%; +13.891%]

scenario:scope-manager-base-18

  • 🟥 instructions [+23.8M instructions; +30.7M instructions] or [+10.142%; +13.087%]

scenario:scope-manager-base-20

  • 🟥 instructions [+21.8M instructions; +29.6M instructions] or [+11.939%; +16.267%]

scenario:scope-manager-base-22

  • 🟥 instructions [+19.9M instructions; +26.8M instructions] or [+10.554%; +14.205%]

@ddyurchenko ddyurchenko changed the title fix: Balance splits across CI jobs according to number of CPU cores correctly fix(ci): Balance splits across benchmarking CI jobs according to the number of CPU cores Jan 13, 2025
@ddyurchenko
Copy link
Contributor Author

In regards to the above performance PR comment, it is invalid.

Since splits procedure between old and new runall.sh script differs, we ended up with some benchmarks executed on different cores, thus seeing performance difference here.

See #5101 to confirm that after PR merge, things are in order.

@ddyurchenko ddyurchenko marked this pull request as ready for review January 13, 2025 13:33
@ddyurchenko ddyurchenko requested a review from a team as a code owner January 13, 2025 13:33
@ddyurchenko ddyurchenko requested review from watson and rochdev January 13, 2025 13:35
benchmark/sirun/runall.sh Outdated Show resolved Hide resolved
benchmark/sirun/runall.sh Outdated Show resolved Hide resolved
watson
watson previously approved these changes Jan 13, 2025
@ddyurchenko ddyurchenko merged commit 435109b into master Jan 13, 2025
329 checks passed
@ddyurchenko ddyurchenko deleted the ddyurchenko/fix-splits branch January 13, 2025 15:49
watson pushed a commit that referenced this pull request Jan 22, 2025
…number of CPU cores (#5099)

* Fix how balancing over CPU cores works, so now tasks are split by variants instead of only directories
* Fix bug with taskset trying to assign to non-existing core
* Fix bug with off by 1 error in benchmarks count
* Fail job with exit code 1 if sirun could not start it
* Fail CI job when GROUP_SIZE is larger than number of CPU cores
@watson watson mentioned this pull request Jan 22, 2025
watson pushed a commit that referenced this pull request Jan 22, 2025
…number of CPU cores (#5099)

* Fix how balancing over CPU cores works, so now tasks are split by variants instead of only directories
* Fix bug with taskset trying to assign to non-existing core
* Fix bug with off by 1 error in benchmarks count
* Fail job with exit code 1 if sirun could not start it
* Fail CI job when GROUP_SIZE is larger than number of CPU cores
@watson watson mentioned this pull request Jan 22, 2025
watson pushed a commit that referenced this pull request Jan 23, 2025
…number of CPU cores (#5099)

* Fix how balancing over CPU cores works, so now tasks are split by variants instead of only directories
* Fix bug with taskset trying to assign to non-existing core
* Fix bug with off by 1 error in benchmarks count
* Fail job with exit code 1 if sirun could not start it
* Fail CI job when GROUP_SIZE is larger than number of CPU cores
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants