diff --git a/README.md b/README.md
index c54309e..374c9eb 100644
--- a/README.md
+++ b/README.md
@@ -818,7 +818,7 @@ parser = argparse.ArgumentParser(description='Tuning AlchemyCat MNIST Example')
 parser.add_argument('-c', '--cfg2tune', type=str)
 args = parser.parse_args()
 
-# Set `pool_size` to GPU num, will run `pool_size` of configs in parallel
+# Will run `torch.cuda.device_count() // work_gpu_num` configs in parallel
 runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1)
 
 @runner.register_work_fn # How to run config
@@ -1021,22 +1021,17 @@ For `config C + algorithm code A ——> reproducible experiment E(C, A)`, meani
 We also provide a [script](alchemy_cat/torch_tools/scripts/tag_exps.py) that runs `pyhon -m alchemy_cat.torch_tools.scripts.tag_exps -s commit_ID -a commit_ID`, interactively lists the new configs added by the commit, and tags the commit according to the config path. This helps quickly trace back the config and algorithm of a historical experiment.
 
-### Allocate GPU for Child Processes Manually
-The `work` function of `Cfg2TuneRunner` sometimes needs to allocate GPUs for subprocesses. Besides using the `cuda_env` parameter, you can manually assign idle GPUs based on `pkl_idx` using the `allocate_cuda_by_group_rank`:
+### Automatically Allocate Idle GPUs
+The `work` function receives the idle GPUs automatically allocated by `Cfg2TuneRunner` through its `cuda_env` parameter. What counts as an 'idle GPU' can be further controlled:
 ```python
-from alchemy_cat.cuda_tools import allocate_cuda_by_group_rank
-
-# ... Code before
-
-@runner.register_work_fn # How to run config
-def work(pkl_idx: int, cfg: Config, cfg_pkl: str, cfg_rslt_dir: str, cuda_env: dict[str, str]) -> ...:
-    current_cudas, env_with_current_cuda = allocate_cuda_by_group_rank(group_rank=pkl_idx, group_cuda_num=2, block=True, verbosity=True)
-    subprocess.run([sys.executable, 'train.py', '-c', cfg_pkl], env=env_with_current_cuda)
-
-# ... Code after
+runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1,
+                        block=True,             # Try to allocate idle GPUs
+                        memory_need=10 * 1024,  # Need 10 GB of GPU memory
+                        max_process=2)          # At most 2 processes already running on each GPU
 ```
-`group_rank` commonly is `pkl_idx`, and `group_cuda_num` is the number of GPUs needed for the task. If `block` is `True`, it waits if the GPU is occupied. If `verbosity` is `True`, it prints blocking situations.
-
-The return value `current_cudas` is a list containing the allocated GPU numbers. `env_with_current_cuda` is an environment variable dictionary with `CUDA_VISIBLE_DEVICES` set, which can be passed directly to the `env` parameter of `subprocess.run`.
+where:
+- `block`: Defaults to `True`. If set to `False`, GPUs are allocated sequentially, regardless of whether they are idle.
+- `memory_need`: The GPU memory required by each sub-config, in MB. An idle GPU must have at least `memory_need` MB of free memory. Default is `-1.`, meaning all of the GPU's memory is required.
+- `max_process`: The maximum number of existing processes. An idle GPU can have at most `max_process` processes already running on it. Default is `-1`, meaning no limit.
 
 ### Pickling Lambda Functions
 Sub-configs generated by `Cfg2Tune` will be saved using pickle. However, if `Cfg2Tune` defines dependencies as `DEP(lambda c: ...)`, these lambda functions cannot be pickled. Workarounds include:
diff --git a/README_CN.md b/README_CN.md
index 08d3016..8150259 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -817,7 +817,7 @@ parser = argparse.ArgumentParser(description='Tuning AlchemyCat MNIST Example')
 parser.add_argument('-c', '--cfg2tune', type=str)
 args = parser.parse_args()
 
-# Set `pool_size` to GPU num, will run `pool_size` of configs in parallel
+# Will run `torch.cuda.device_count() // work_gpu_num` configs in parallel
 runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1)
 
 @runner.register_work_fn # How to run config
@@ -1019,23 +1019,19 @@ cfg.sched.epochs = 15
 
 我们还提供了一个[脚本](alchemy_cat/torch_tools/scripts/tag_exps.py),运行`pyhon -m alchemy_cat.torch_tools.scripts.tag_exps -s commit_ID -a commit_ID`,将交互式地列出该 commit 新增的配置,并按照配置路径给 commit 打上标签。这有助于快速回溯历史上某个实验的配置和算法。
 
-### 为子任务手动分配显卡
-`Cfg2TuneRunner`的`work`函数有时需要为子进程分配显卡。除了使用`cuda_env`参数,还可以使用`allocate_cuda_by_group_rank`,根据`pkl_idx`手动分配空闲显卡,:
+### 自动分配空闲显卡
+`work`函数通过`cuda_env`参数,接收`Cfg2TuneRunner`自动分配的空闲显卡。我们还可以进一步控制『空闲显卡』的定义:
 ```python
-from alchemy_cat.cuda_tools import allocate_cuda_by_group_rank
-
-# ... Code before
-
-@runner.register_work_fn # How to run config
-def work(pkl_idx: int, cfg: Config, cfg_pkl: str, cfg_rslt_dir: str, cuda_env: dict[str, str]) -> ...:
-    current_cudas, env_with_current_cuda = allocate_cuda_by_group_rank(group_rank=pkl_idx, group_cuda_num=2, block=True, verbosity=True)
-    subprocess.run([sys.executable, 'train.py', '-c', cfg_pkl], env=env_with_current_cuda)
-
-# ... Code after
+runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1,
+                        block=True,             # Try to allocate idle GPUs
+                        memory_need=10 * 1024,  # Need 10 GB of GPU memory
+                        max_process=2)          # At most 2 processes already running on each GPU
 ```
-`group_rank`一般为`pkl_idx`,`group_cuda_num`为任务所需显卡数量。`block`为`True`时,若分配的显卡被占用,会阻塞直到有空闲。`verbosity`为`True`时,会打印阻塞情况。
+其中:
+* `block`: 默认为`True`。若为`False`,则总是顺序分配显卡,不考虑空闲与否。
+* `memory_need`:运行每个子配置需要的显存,单位为 MB。空闲显卡之可用显存必须 ≥ `memory_need`。默认值为`-1.`,表示需要独占所有显存。
+* `max_process`:最大已有进程数。空闲显卡已有的进程数必须 ≤ `max_process`。默认值为`-1`,表示无限制。
 
-返回值`current_cudas`是一个列表,包含了分配的显卡号。`env_with_current_cuda`是设置了`CUDA_VISIBLE_DEVICES`的环境变量字典,可直接传入`subprocess.run`的`env`参数。
 
 ### 匿名函数无法 pickle 问题
 `Cfg2Tune`生成的子配置会被 pickle 保存。然而,若`Cfg2Tune`定义了形似`DEP(lambda c: ...)`的依赖项,所存储的匿名函数无法被 pickle。变通方法有:
diff --git a/alchemy_cat/dl_config/examples/tune_train.py b/alchemy_cat/dl_config/examples/tune_train.py
index d9301c7..58b7b4f 100644
--- a/alchemy_cat/dl_config/examples/tune_train.py
+++ b/alchemy_cat/dl_config/examples/tune_train.py
@@ -6,7 +6,7 @@
 parser.add_argument('-c', '--cfg2tune', type=str)
 args = parser.parse_args()
 
-# Set `pool_size` to GPU num, will run `pool_size` of configs in parallel
+# Will run `torch.cuda.device_count() // work_gpu_num` configs in parallel
 runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1)
 
 @runner.register_work_fn # How to run config
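For context only (not part of the patch above): a minimal sketch of how a registered `work` function might consume the `cuda_env` parameter that the patched docs describe. The `work` signature is taken from the example being removed above; the `from alchemy_cat.dl_config import ...` path, the assumption that `cuda_env` holds a `CUDA_VISIBLE_DEVICES` entry for the allocated GPUs, and the closing `runner.tuning()` call are assumptions, not confirmed by this patch.

```python
# Sketch only: import path for Config / Cfg2TuneRunner is assumed.
import argparse
import os
import subprocess
import sys

from alchemy_cat.dl_config import Config, Cfg2TuneRunner

parser = argparse.ArgumentParser(description='Tuning AlchemyCat MNIST Example')
parser.add_argument('-c', '--cfg2tune', type=str)
args = parser.parse_args()

# Will run `torch.cuda.device_count() // work_gpu_num` configs in parallel.
runner = Cfg2TuneRunner(args.cfg2tune, experiment_root='/tmp/experiment', work_gpu_num=1,
                        block=True,             # Try to allocate idle GPUs
                        memory_need=10 * 1024,  # Need 10 GB of GPU memory
                        max_process=2)          # At most 2 processes already running on each GPU

@runner.register_work_fn  # How to run each sub-config
def work(pkl_idx: int, cfg: Config, cfg_pkl: str, cfg_rslt_dir: str, cuda_env: dict[str, str]) -> None:
    # Assumed: `cuda_env` carries the CUDA_VISIBLE_DEVICES entry for the allocated idle GPU(s).
    # Merge it into the current environment so the child process only sees those GPUs.
    subprocess.run([sys.executable, 'train.py', '-c', cfg_pkl],
                   env={**os.environ, **cuda_env}, check=True)

runner.tuning()  # Assumed entry point that generates sub-configs and dispatches `work`
```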