Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi progbar backend #358

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft

Multi progbar backend #358

wants to merge 9 commits into from

Conversation

mkolopanis
Copy link
Member

Adds tqdm progress bar as a backend option for run_uvdata_uvsim also implements a send and receive protocol for run_uvdata_uvsim where the tasks are chunked and sent out to PUs on request.

Motivation and Context

I like how tqdm looks honestly and was tired of two gross branches continually rebasing. This allows for all backends depending on the user's preference. I think I would prefer having some kind of plugin or something to allow custom progress bars but that sounds much more complicated.

TODO:

  • Show reference sims are identical
  • benchmarking reference sims
  • Documentation/tutorial about backend selection
  • changelog

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Reference simulation update or replacement
  • Documentation change (documentation changes only)
  • Version change
  • Build or continuous integration change

Checklist:

For all pull requests:

New feature checklist:

  • I have added or updated the docstrings associated with my feature using the numpy docstring format.
  • I have updated the documentation to highlight my new feature (if appropriate).
  • I have added tests to cover my new feature.
  • All new and existing tests pass.
  • I have checked that I reproduce the reference simulations or if there are differences they are explained below (if appropriate). If there are changes that are correct, I will update the reference simulation files after this PR is merged.
  • I have checked (e.g., using the benchmarking tools) that this change does not significantly increase typical runtimes. If it does, I have included a justification in the comments on this PR.
  • I have updated the CHANGELOG.

@codecov
Copy link

codecov bot commented Aug 3, 2021

Codecov Report

Attention: Patch coverage is 96.33028% with 8 lines in your changes missing coverage. Please review.

Project coverage is 98.44%. Comparing base (0876a90) to head (f4e5b25).

Files Patch % Lines
src/pyuvsim/uvsim.py 95.72% 8 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main     #358      +/-   ##
===========================================
- Coverage   100.00%   98.44%   -1.56%     
===========================================
  Files           11       11              
  Lines         2165     2317     +152     
===========================================
+ Hits          2165     2281     +116     
- Misses           0       36      +36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pyuvsim/uvsim.py Outdated
Comment on lines 469 to 478
if tag == mpi.tags.READY:
pass
elif tag == mpi.tags.DONE:
uv_container.data_array[msg[1]] += msg[0]
pbar.update(1)

elif tag == mpi.tags.EXIT:
completed_workers += 1
else:
raise ValueError(f"{msg} {tag}")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's two lines in here that just won't ever get touched by tests. the READY tag is never sent back from a worker thread, and I think the else catch was a debug thing that just never went away just in case. Ideally the "enum" defined in the mpi module is an exhaustive list of statuses that can be sent back. Would it be preferred to just take out these lines?

@aelanman
Copy link
Contributor

aelanman commented Aug 4, 2021

While I fully support switching to tqdm over progsteps, I'm still not convinced of the push/pull model for task distribution. Has it been shown conclusively that active distribution performs better than the existing system? It looks like it should be capable of about the same performance except with an added overhead from using blocking sends to return visibilities to rank 0.

@mkolopanis
Copy link
Member Author

Definitely part of this endeavor is to just get the technology in the repo right alongside the other things in order to test it better. There were branches before, but it was almost impossible to rebase or something after a new feature was brought in. Even just sitting on this branch, it is much easier to make a comparison.

I think one thing we should do is to get a few really good timing tests. We could always drop the send/recv part in the end.

@mkolopanis mkolopanis force-pushed the multi_progbar_backend branch from 8357105 to de5680a Compare August 11, 2021 20:20
@mkolopanis mkolopanis force-pushed the multi_progbar_backend branch from 3d49b7f to c5ee33d Compare November 3, 2021 16:36
@mkolopanis mkolopanis force-pushed the multi_progbar_backend branch from 3f8888d to 08f25e1 Compare November 12, 2021 17:43
@mkolopanis mkolopanis force-pushed the multi_progbar_backend branch 2 times, most recently from 7daa109 to 186d4ba Compare January 25, 2022 16:58
@mkolopanis mkolopanis force-pushed the multi_progbar_backend branch 2 times, most recently from 47c6ed9 to 34767ae Compare March 21, 2024 21:22
@mkolopanis mkolopanis force-pushed the multi_progbar_backend branch 2 times, most recently from bf28b61 to f87a714 Compare April 1, 2024 17:16
mkolopanis and others added 8 commits June 27, 2024 09:35
move inner progsteps to own function
add in tqdm progressbar, check diffuse analytic, check for npus > 1

add tqdm to 'all' dependency, try to test importerror in min deps

added a two helper functions for chunking iterators

added the send_recv backend which combines with tqdm to distribute tasks

propagate the backend keyword up to the run script

allow all backends to use the profiling option

make analytic diffuse test also run against all backends

import from pyuvsim.uvsim in the import tqdm error test

add tqdm to full test suite, make tests that run with only progsteps

add test to run profiling for all backends.

only make profiling output if uvdata_indices is not empty

remove debug prints, remove thread_multiple as a parameter to run

fix number of arguments to run_uvsim test

define a proper enum for the tags

added tqdm to option list in readme

fixed typo in run_param_pyuvsim argument parser

force ntasks tot to be an int, ceil to be conservative

add a context manger to progsteps

disentangle the progressbar from the backend; test all combinations

revamp rma to not use rank 0, remove messaging equiv to s/r

update tests  to run in paralell. do gymnastics for warnings

have newer tests account for parallel and backend differences

propagate errors on worker nodes to rank0 in send_recv. always clean up the window in rma process.

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

restore mpich test yaml lost to rebase overlords

update progsteps only ci yaml

match versioning on progsteps_only yaml to other ci yamls

break some check from run_uvdata_uvsim into a separate function
@mkolopanis mkolopanis force-pushed the multi_progbar_backend branch from 0d2f1f3 to 984d83d Compare June 27, 2024 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants