I profiled the logits processing code, and the bottleneck is the transfer of the allowed token ids list (which can have many elements) to GPU. My suggestion is to use a compressed version of the list that can be efficiently uncompressed/used to mask logits on GPU, for instance bitmaps.
We should first evaluate the potential speed-ups in Python. If the bottleneck becomes the bitmap construction, we could move it to Rust; if it is the operations on GPU, we can implement a CUDA kernel to mask the logits.
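To make the idea concrete, here is a minimal pure-Python sketch of the bitmap approach. The helper names (`pack_allowed_ids`, `mask_logits`) are illustrative, not vLLM or outlines APIs, and the masking loop stands in for what would be a vectorized unpack on GPU:

```python
def pack_allowed_ids(allowed_ids, vocab_size):
    """Compress a list of allowed token ids into a bytearray bitmap.

    The bitmap uses 1 bit per vocabulary entry (vocab_size / 8 bytes),
    versus 4-8 bytes per element for a raw id list, so far fewer bytes
    would need to cross the CPU->GPU boundary.
    """
    bitmap = bytearray((vocab_size + 7) // 8)
    for tid in allowed_ids:
        bitmap[tid >> 3] |= 1 << (tid & 7)
    return bitmap


def mask_logits(logits, bitmap):
    """Set the logits of disallowed tokens to -inf using the bitmap.

    On GPU this would be bitwise ops on a packed tensor or a small CUDA
    kernel; here it is spelled out as a plain Python loop.
    """
    neg_inf = float("-inf")
    return [
        x if bitmap[i >> 3] & (1 << (i & 7)) else neg_inf
        for i, x in enumerate(logits)
    ]


vocab_size = 16
allowed = [0, 3, 7, 12]
bitmap = pack_allowed_ids(allowed, vocab_size)  # 2 bytes instead of 4 ints
masked = mask_logits([0.5] * vocab_size, bitmap)
print([i for i, x in enumerate(masked) if x != float("-inf")])  # -> [0, 3, 7, 12]
```

For a realistic vocabulary (~128K tokens) the bitmap is a fixed 16 KB regardless of how many token ids are allowed, which is the property that should make the host-to-device transfer cheap and predictable.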
Is there a corresponding issue in the vLLM repository? Also, as mentioned here, tracking performance would really help with reasoning through these kinds of issues, wouldn’t it?
The integration of `outlines-core` in vLLM currently performs a lot worse than it could. We can improve the performance in two ways: […] `outlines`, and use the latest version of `outlines-core`, which shows better compilation performance.

If those don't bring the speed on par with `xgrammar`, we need to understand exactly why that is.