[Roadmap] LMFlow Roadmap #862

Open
8 of 33 tasks
wheresmyhair opened this issue Jun 19, 2024 · 2 comments

wheresmyhair commented Jun 19, 2024

This document lists the features on LMFlow's roadmap. We welcome discussion of, and contributions to, specific features in the related Issues/PRs. 🤗

Main Features

Usability

  • Make some packages/functions (gradio, vllm, ray, etc.) optional and add conditional imports. [usability] deps streamlining #905
  • Auto-downgrade the inference method (vllm>ds, etc.) and make the vllm package optional
  • Merge similar model methods into hf_model_mixin
  • Set torch_dtype='bfloat16' when bf16 is specified, etc. (bf16 is in FinetunerArguments but torch_dtype is in ModelArguments, so this cannot be handled in __post_init__().)
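
As a rough illustration of the optional-dependency and auto-downgrading items above, here is a minimal sketch. The helper names (`is_package_available`, `select_inference_backend`) and the `"transformers"` fallback are assumptions made for this example, not LMFlow's actual API; only the vllm>ds preference order comes from the list above.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def select_inference_backend(preferred_order=("vllm", "deepspeed"),
                             fallback="transformers"):
    """Return the first installed package in `preferred_order`, else `fallback`.

    The fallback name is a placeholder chosen for this sketch.
    """
    for package in preferred_order:
        if is_package_available(package):
            return package
    return fallback


if __name__ == "__main__":
    # Heavy packages are only imported once we know they are installed.
    backend = select_inference_backend()
    print(f"Selected inference backend: {backend}")
```

Checking with `find_spec` keeps the probe cheap, since the heavy package is only imported once it has actually been selected.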

Bug fixes

Legacy issues

  • use_accelerator -> use_accelerate typo fix (with Accelerate support PR)
  • model_args.use_lora leads to truncation of the sequence, mentioned in [Feature] reward model inferencer and dpov2 aligner #867
  • Make ports, addresses, and all other settings in distributed training tidy and clear (with Accelerate support PR)

Documentation

  • Approximate GPU memory requirements w.r.t. model size & pipeline
  • Dev handbook, indicating styles, test list, etc.

wheresmyhair pinned this issue Jun 20, 2024

wheresmyhair commented Jun 25, 2024

Note on multi-instance inference:
In vllm inference, the number of attention heads must be divisible by the vllm tensor parallel size. For an LLM with 14 heads, the options for tp are 1 and 2 (7 causes another division issue, but I don't remember what that issue is).
Say we have 8 GPUs; to utilize all of them, multi-instance vllm inference is necessary (tp=1 -> 8 instances, tp=2 -> 4 instances).
The same applies to rm inference and any other inference pipeline.
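
The arithmetic in this note can be sketched as below. The function name `instance_plan` is hypothetical. The requirement that tp also divide the GPU count is an assumption of this sketch, added so that every device is used; it is not necessarily the other division issue mentioned for tp=7.

```python
from typing import Dict


def instance_plan(num_attention_heads: int, num_gpus: int) -> Dict[int, int]:
    """Map each usable tensor parallel size to the number of instances.

    A tp size is kept only if it divides the number of attention heads
    (the constraint from the note) and the number of GPUs (an assumption
    of this sketch, so that no device is left idle).
    """
    plan = {}
    for tp in range(1, num_gpus + 1):
        heads_ok = num_attention_heads % tp == 0
        fills_all_gpus = num_gpus % tp == 0
        if heads_ok and fills_all_gpus:
            plan[tp] = num_gpus // tp
    return plan


if __name__ == "__main__":
    # 14 heads on 8 GPUs: tp=1 -> 8 instances, tp=2 -> 4 instances
    print(instance_plan(num_attention_heads=14, num_gpus=8))  # {1: 8, 2: 4}
```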

wheresmyhair commented

Now supports Iterative DPO #883
