Albert Zeyer edited this page Nov 29, 2023 · 5 revisions
Example RETURNN setting: put torch_distributed = {} into the config. With this setting, RETURNN will use PyTorch DistributedDataParallel.
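As a minimal sketch, the relevant part of a RETURNN config (which is a plain Python file) might look like the following. Only torch_distributed = {} comes from the note above; the backend line is an assumption that the config selects the PyTorch backend, and all other settings a real config needs are omitted.

```python
# Sketch of a RETURNN config fragment. RETURNN configs are plain Python files.

# Assumption (not from the note above): the PyTorch backend is selected.
backend = "torch"

# From the note above: an empty dict is enough to turn on distributed training,
# which will use PyTorch DistributedDataParallel.
torch_distributed = {}
```

The empty dict means no further options are given here, so whatever defaults RETURNN applies are used.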
In the i6_core ReturnnTrainingJob, set horovod_num_processes to the number of processes. The name is misleading: the parameter is no longer specific to Horovod and also applies to other distribution frameworks.
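A rough sketch of how this could look in a Sisyphus setup is shown below. The values 4 and 100 are purely illustrative, and the commented-out job construction assumes that i6_core and Sisyphus are installed and that a returnn_config object (hypothetical name) already exists, so it is not executed here.

```python
# Keyword arguments intended for i6_core's ReturnnTrainingJob (sketch only).
training_kwargs = dict(
    num_epochs=100,  # illustrative value
    # Number of training processes. Despite the name, this is not
    # Horovod-specific and also applies to other distribution frameworks.
    horovod_num_processes=4,  # illustrative value
)

# Requires i6_core and Sisyphus; `returnn_config` is a hypothetical,
# previously built config object:
# from i6_core.returnn.training import ReturnnTrainingJob
# train_job = ReturnnTrainingJob(returnn_config, **training_kwargs)
```

Keeping the arguments in a dict makes it easy to change the process count in one place when moving between single-GPU and multi-GPU runs.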