docs: Added note about GBS and jsonl samples to DPO tutorial (NVIDIA#345)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
trias702 and terrykong authored Oct 16, 2024
1 parent eac6084 commit d8ef9fb
Showing 1 changed file with 2 additions and 0 deletions.
docs/user-guide/dpo.rst

@@ -90,6 +90,8 @@ However, please be aware that most Megatron GPT models adhere to a strict format

Always follow the prompt-response template format used during your SFT training when preparing DPO data; failing to do so will produce a model that outputs garbage text. Create one jsonl file in the format above for your training data and one for your validation data.
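To make the file layout concrete, here is a minimal sketch of appending one training sample as JSONL. The keys ``prompt``, ``chosen_response``, and ``rejected_response`` are an assumption based on common DPO data formats; match them, and the prompt text itself, to the template shown above for your model.

.. code-block:: python

    import json

    # Hypothetical sample; the keys and the prompt template are assumptions,
    # so align them with the format your SFT model was trained on.
    sample = {
        "prompt": "User: What does DPO stand for?\nAssistant:",
        "chosen_response": " Direct Preference Optimization.",
        "rejected_response": " I am not sure.",
    }

    # Append one JSON object per line (the JSONL convention).
    with open("/path/to/train_dpo_format.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(sample) + "\n")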

Each of your JSONL files must contain at least as many samples as the Global Batch Size (GBS) you plan to use during training. For example, if GBS = 64, ensure that both your training and validation files include at least 64 samples. Using a file with fewer samples than the GBS will result in a crash.
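As a quick sanity check before launching a run, the following minimal sketch counts the samples in each file (the GBS value and file paths are assumptions for illustration):

.. code-block:: python

    # Verify each JSONL file holds at least GBS samples; a shorter file
    # will crash training, as noted above.
    GBS = 64  # set to the global batch size you plan to use

    for path in ("/path/to/train_dpo_format.jsonl",
                 "/path/to/valid_dpo_format.jsonl"):
        with open(path, encoding="utf-8") as f:
            num_samples = sum(1 for line in f if line.strip())
        assert num_samples >= GBS, (
            f"{path} has only {num_samples} samples; at least {GBS} required"
        )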

Once your data is processed into the correct format, you are ready to begin DPO training. You must start with a pretrained or SFT-trained model. For this section, we will use the SFT model trained in the previous step to train the DPO model.
For the purposes of the following sections, we assume that your training jsonl file is located in ``/path/to/train_dpo_format.jsonl`` and your validation jsonl file is located in ``/path/to/valid_dpo_format.jsonl``.

