Update the integration of outlines-core in vLLM #145

Open
rlouf opened this issue Jan 24, 2025 · 2 comments

@rlouf
Member

rlouf commented Jan 24, 2025

The integration of outlines-core in vLLM currently performs a lot worse than it could. We can improve the performance in two ways:

  1. Bypass outlines and use the latest version of outlines-core, which shows better compilation performance.
  2. Make the logits masking more efficient; it currently allocates new memory for the mask and the allowed tokens at every step.
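
A minimal sketch of the second point, assuming the mask can live in a buffer that is allocated once and reset in place at every step. It uses only the Python standard library to stay self-contained; `ReusableMask`, `apply` and the toy vocabulary size are hypothetical names for illustration and are not part of the vLLM or outlines-core API. A real integration would keep the buffer as a tensor on the device rather than a `bytearray`.

```python
import math


class ReusableMask:
    """Keeps one mask buffer alive across decoding steps (hypothetical helper)."""

    def __init__(self, vocab_size: int) -> None:
        # Allocated once: 1 marks a forbidden token, 0 an allowed one.
        self._template = bytes([1]) * vocab_size
        self._forbidden = bytearray(self._template)

    def apply(self, logits: list, allowed_token_ids: list) -> None:
        """Set the logit of every token outside `allowed_token_ids` to -inf, in place."""
        # Reset the existing buffer instead of building a new mask.
        self._forbidden[:] = self._template
        for token_id in allowed_token_ids:
            self._forbidden[token_id] = 0
        for token_id, forbidden in enumerate(self._forbidden):
            if forbidden:
                logits[token_id] = -math.inf


mask = ReusableMask(vocab_size=6)
step_logits = [0.5, 1.0, -0.2, 2.0, 0.0, 1.5]
mask.apply(step_logits, allowed_token_ids=[1, 3])
print(step_logits)  # [-inf, 1.0, -inf, 2.0, -inf, -inf]
```

The same `mask` object can be passed through every decoding step, so the per-step cost is a reset and a write rather than a fresh allocation.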

If those changes don't bring the speed on par with xgrammar, we need to understand exactly why that is.

@rlouf rlouf self-assigned this Jan 24, 2025
@rlouf rlouf added the vLLM label Jan 24, 2025
@rlouf rlouf changed the title from "Update the integration in vLLM" to "Update the integration of outlines-core in vLLM" Jan 24, 2025
@rlouf
Member Author

rlouf commented Jan 28, 2025

I profiled the logits processing code, and the bottleneck is the transfer of the list of allowed token ids (which can have many elements) to the GPU. My suggestion is to use a compressed version of the list that can be efficiently decompressed and used to mask logits on the GPU, for instance a bitmap.
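
To make the bitmap idea concrete, here is a rough CPU-only sketch, assuming one bit per vocabulary entry with bit `i` set when token `i` is allowed. The function names `pack_allowed_tokens` and `mask_logits_with_bitmap` are hypothetical, and the code uses plain Python containers instead of GPU tensors; in an actual integration the packed bytes would be what gets transferred, and the unpacking loop would run on the device.

```python
import math


def pack_allowed_tokens(allowed_token_ids, vocab_size):
    """Pack allowed token ids into a bitmap: bit i is set when token i is allowed."""
    bitmap = bytearray((vocab_size + 7) // 8)  # 8 tokens per byte, rounded up
    for token_id in allowed_token_ids:
        bitmap[token_id >> 3] |= 1 << (token_id & 7)
    return bitmap


def mask_logits_with_bitmap(logits, bitmap):
    """Set the logit of every token whose bit is 0 to -inf, in place."""
    for token_id in range(len(logits)):
        allowed = (bitmap[token_id >> 3] >> (token_id & 7)) & 1
        if not allowed:
            logits[token_id] = -math.inf


vocab_size = 10
bitmap = pack_allowed_tokens([0, 3, 9], vocab_size)
logits = [1.0] * vocab_size
mask_logits_with_bitmap(logits, bitmap)
print(len(bitmap))  # 2
```

The bitmap size depends only on the vocabulary size: for an illustrative vocabulary of 128,000 tokens it is 16,000 bytes, however many tokens are allowed, whereas the id list grows with the number of allowed tokens.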

We should first evaluate the potential speed-ups in Python. If the bottleneck then becomes the bitmap construction, we could move it to Rust; if it is the operations on the GPU, we can implement a CUDA kernel to mask the logits.

@yvan-sraka
Contributor

Is there a corresponding issue in the vLLM repository? Also, as mentioned here, tracking performance would really help with reasoning through these kinds of issues, wouldn’t it?
