Distributed PyTorch

Example RETURNN setting: just put torch_distributed = {} into the config. This makes RETURNN use PyTorch DistributedDataParallel for multi-process training. See the sketch below.
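A minimal sketch of the relevant lines in a RETURNN config (a Python file); the surrounding settings such as `backend` are assumed to already be part of your config:

```python
# Relevant lines of a RETURNN config (sketch).
backend = "torch"

# An empty dict enables distributed training with default options;
# RETURNN then wraps the model in torch.nn.parallel.DistributedDataParallel.
torch_distributed = {}
```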

In the i6_core ReturnnTrainingJob, set horovod_num_processes to the number of processes. (The name is confusing for historical reasons: it is no longer specific to Horovod and also applies to other distribution frameworks, such as PyTorch distributed.)
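A hedged sketch of how this could look in a Sisyphus/i6_core setup; the exact constructor arguments depend on your pipeline, and `returnn_config` here stands for a ReturnnConfig that already contains torch_distributed = {}:

```python
from i6_core.returnn.training import ReturnnTrainingJob

train_job = ReturnnTrainingJob(
    returnn_config=returnn_config,   # config with torch_distributed = {}
    horovod_num_processes=4,         # number of distributed processes (despite the name, not Horovod-specific)
    num_epochs=100,
    log_verbosity=5,
    time_rqmt=168,
    mem_rqmt=16,
    cpu_rqmt=4,
)
```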