diff --git a/README.md b/README.md
index c54309e..374c9eb 100644
--- a/README.md
+++ b/README.md
@@ -818,7 +818,7 @@ parser = argparse.ArgumentParser(description='Tuning AlchemyCat MNIST Example')
 parser.add_argument('-c', '--cfg2tune', type=str)
 args = parser.parse_args()
 
-# Set `pool_size` to GPU num, will run `pool_size` of configs in parallel
+# Will run `torch.cuda.device_count() // work_gpu_num` configs in parallel
 runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1)
 
 @runner.register_work_fn # How to run config
@@ -1021,22 +1021,17 @@ For `config C + algorithm code A ——> reproducible experiment E(C, A)`, meani
 We also provide a [script](alchemy_cat/torch_tools/scripts/tag_exps.py) that runs `pyhon -m alchemy_cat.torch_tools.scripts.tag_exps -s commit_ID -a commit_ID`, interactively lists the new configs added by the commit, and tags the commit according to the config path. This helps quickly trace back the config and algorithm of a historical experiment.
 
-### Allocate GPU for Child Processes Manually
-The `work` function of `Cfg2TuneRunner` sometimes needs to allocate GPUs for subprocesses. Besides using the `cuda_env` parameter, you can manually assign idle GPUs based on `pkl_idx` using the `allocate_cuda_by_group_rank`:
+### Automatically Allocate Idle GPUs
+The `work` function receives the idle GPUs automatically allocated by `Cfg2TuneRunner` through its `cuda_env` parameter. What counts as an 'idle GPU' can be further controlled:
 ```python
-from alchemy_cat.cuda_tools import allocate_cuda_by_group_rank
-
-# ... Code before
-
-@runner.register_work_fn # How to run config
-def work(pkl_idx: int, cfg: Config, cfg_pkl: str, cfg_rslt_dir: str, cuda_env: dict[str, str]) -> ...:
-    current_cudas, env_with_current_cuda = allocate_cuda_by_group_rank(group_rank=pkl_idx, group_cuda_num=2, block=True, verbosity=True)
-    subprocess.run([sys.executable, 'train.py', '-c', cfg_pkl], env=env_with_current_cuda)
-
-# ... Code after
+runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1,
+                        block=True,             # Try to allocate idle GPUs
+                        memory_need=10 * 1024,  # Need 10 GB of GPU memory
+                        max_process=2)          # At most 2 processes already running on each GPU
 ```
-`group_rank` commonly is `pkl_idx`, and `group_cuda_num` is the number of GPUs needed for the task. If `block` is `True`, it waits if the GPU is occupied. If `verbosity` is `True`, it prints blocking situations.
-
-The return value `current_cudas` is a list containing the allocated GPU numbers. `env_with_current_cuda` is an environment variable dictionary with `CUDA_VISIBLE_DEVICES` set, which can be passed directly to the `env` parameter of `subprocess.run`.
+where:
+- `block`: Defaults to `True`. If set to `False`, GPUs are allocated sequentially, regardless of whether they are idle.
+- `memory_need`: The GPU memory required by each sub-config, in MB. An idle GPU must have at least `memory_need` MB of free memory. Default is `-1.`, meaning all of the GPU's memory is required.
+- `max_process`: The maximum number of existing processes. An idle GPU can have at most `max_process` processes already running on it. Default is `-1`, meaning no limit.
 
 ### Pickling Lambda Functions
 Sub-configs generated by `Cfg2Tune` will be saved using pickle. However, if `Cfg2Tune` defines dependencies as `DEP(lambda c: ...)`, these lambda functions cannot be pickled. Workarounds include:
diff --git a/README_CN.md b/README_CN.md
index 08d3016..8150259 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -817,7 +817,7 @@ parser = argparse.ArgumentParser(description='Tuning AlchemyCat MNIST Example')
 parser.add_argument('-c', '--cfg2tune', type=str)
 args = parser.parse_args()
 
-# Set `pool_size` to GPU num, will run `pool_size` of configs in parallel
+# Will run `torch.cuda.device_count() // work_gpu_num` configs in parallel
 runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1)
 
 @runner.register_work_fn # How to run config
@@ -1019,23 +1019,19 @@ cfg.sched.epochs = 15
 
 我们还提供了一个[脚本](alchemy_cat/torch_tools/scripts/tag_exps.py),运行`pyhon -m alchemy_cat.torch_tools.scripts.tag_exps -s commit_ID -a commit_ID`,将交互式地列出该 commit 新增的配置,并按照配置路径给 commit 打上标签。这有助于快速回溯历史上某个实验的配置和算法。
 
-### 为子任务手动分配显卡
-`Cfg2TuneRunner`的`work`函数有时需要为子进程分配显卡。除了使用`cuda_env`参数,还可以使用`allocate_cuda_by_group_rank`,根据`pkl_idx`手动分配空闲显卡,:
+### 自动分配空闲显卡
+`work`函数通过`cuda_env`参数,接收`Cfg2TuneRunner`自动分配的空闲显卡。我们还可以进一步控制『空闲显卡』的定义:
 ```python
-from alchemy_cat.cuda_tools import allocate_cuda_by_group_rank
-
-# ... Code before
-
-@runner.register_work_fn # How to run config
-def work(pkl_idx: int, cfg: Config, cfg_pkl: str, cfg_rslt_dir: str, cuda_env: dict[str, str]) -> ...:
-    current_cudas, env_with_current_cuda = allocate_cuda_by_group_rank(group_rank=pkl_idx, group_cuda_num=2, block=True, verbosity=True)
-    subprocess.run([sys.executable, 'train.py', '-c', cfg_pkl], env=env_with_current_cuda)
-
-# ... Code after
+runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1,
+                        block=True,             # Try to allocate idle GPUs
+                        memory_need=10 * 1024,  # Need 10 GB of GPU memory
+                        max_process=2)          # At most 2 processes already running on each GPU
 ```
-`group_rank`一般为`pkl_idx`,`group_cuda_num`为任务所需显卡数量。`block`为`True`时,若分配的显卡被占用,会阻塞直到有空闲。`verbosity`为`True`时,会打印阻塞情况。
+其中:
+* `block`: 默认为`True`。若为`False`,则总是顺序分配显卡,不考虑空闲与否。
+* `memory_need`:运行每个子配置需要的显存,单位为 MB。空闲显卡之可用显存必须 ≥ `memory_need`。默认值为`-1.`,表示需要独占所有显存。
+* `max_process`:最大已有进程数。空闲显卡已有的进程数必须 ≤ `max_process`。默认值为`-1`,表示无限制。
 
-返回值`current_cudas`是一个列表,包含了分配的显卡号。`env_with_current_cuda`是设置了`CUDA_VISIBLE_DEVICES`的环境变量字典,可直接传入`subprocess.run`的`env`参数。
 
 ### 匿名函数无法 pickle 问题
 `Cfg2Tune`生成的子配置会被 pickle 保存。然而,若`Cfg2Tune`定义了形似`DEP(lambda c: ...)`的依赖项,所存储的匿名函数无法被 pickle。变通方法有:
diff --git a/alchemy_cat/dl_config/examples/tune_train.py b/alchemy_cat/dl_config/examples/tune_train.py
index d9301c7..58b7b4f 100644
--- a/alchemy_cat/dl_config/examples/tune_train.py
+++ b/alchemy_cat/dl_config/examples/tune_train.py
@@ -6,7 +6,7 @@
 parser.add_argument('-c', '--cfg2tune', type=str)
 args = parser.parse_args()
 
-# Set `pool_size` to GPU num, will run `pool_size` of configs in parallel
+# Will run `torch.cuda.device_count() // work_gpu_num` configs in parallel
 runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1)
 
 @runner.register_work_fn # How to run config
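For context only (not part of the patch above): a minimal sketch of how a registered `work` function might consume the `cuda_env` parameter that the patched docs describe. The `work` signature is taken from the example being removed above; the `from alchemy_cat.dl_config import ...` path, the assumption that `cuda_env` holds a `CUDA_VISIBLE_DEVICES` entry for the allocated GPUs, and the closing `runner.tuning()` call are assumptions, not confirmed by this patch.

```python
# Sketch only: import path for Config / Cfg2TuneRunner is assumed.
import argparse
import os
import subprocess
import sys

from alchemy_cat.dl_config import Config, Cfg2TuneRunner

parser = argparse.ArgumentParser(description='Tuning AlchemyCat MNIST Example')
parser.add_argument('-c', '--cfg2tune', type=str)
args = parser.parse_args()

# Will run `torch.cuda.device_count() // work_gpu_num` configs in parallel.
runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1,
                        block=True,             # Try to allocate idle GPUs
                        memory_need=10 * 1024,  # Need 10 GB of GPU memory
                        max_process=2)          # At most 2 processes already running on each GPU

@runner.register_work_fn  # How to run each sub-config
def work(pkl_idx: int, cfg: Config, cfg_pkl: str, cfg_rslt_dir: str, cuda_env: dict[str, str]) -> None:
    # Assumed: `cuda_env` carries the CUDA_VISIBLE_DEVICES entry for the allocated idle GPU(s).
    # Merge it into the current environment so the child process only sees those GPUs.
    subprocess.run([sys.executable, 'train.py', '-c', cfg_pkl],
                   env={**os.environ, **cuda_env}, check=True)

runner.tuning()  # Assumed entry point that generates sub-configs and dispatches `work`
```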