Inquiry About Latency of streaming generate with audio in MiniCPMo Web Demo #788

tienanh28122000 · 2025-01-23T03:44:42Z

Thank you for your excellent work on MiniCPMo and its integration with ChatTTS! I have been exploring the web demo and am very impressed by its capabilities.

I would like to ask about the latency of ChatTTS in the web demo. I noticed that in the streaming_generate function, setting generate_audio=False results in ~0.16 sec latency per chunk received. However, enabling generate_audio=True increases the latency significantly to ~0.6-1.0 sec. Is there a reason for this slow performance, and are there any optimizations to improve it?

Thank you for your time, and I look forward to your response!

bokesyo · 2025-01-23T08:06:06Z

Hi! Thank you for using MiniCPM-o 2.6. I checked the code on Hugging Face. If I understand your question correctly, generate_audio=True generates audio and generate_audio=False does not. The latency difference arises because the TTS has to generate the first audio chunk from the first text chunk.

Maybe this line could help:
https://huggingface.co/openbmb/MiniCPM-o-2_6/blob/4a25f999c53b8a51b5ef4ccca45c8a5e59d06a7e/modeling_minicpmo.py#L1216
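
To check whether the extra delay sits mostly in the first chunk or is spread across all chunks, the per-chunk latency can be measured separately for each chunk. The following is a minimal sketch, not code from the demo: `timed_chunks`, `summarize`, and `fake_stream` are hypothetical helper names, and `fake_stream` is only a stand-in for whatever iterator the streaming call returns in your setup.

```python
import time
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def timed_chunks(stream: Iterable[T]) -> Iterator[Tuple[float, T]]:
    """Yield (seconds spent waiting for this chunk, chunk) for each chunk."""
    last = time.perf_counter()
    for chunk in stream:
        now = time.perf_counter()
        yield now - last, chunk
        # Restart the clock after the consumer is done with the chunk,
        # so consumer-side work is not counted as generation latency.
        last = time.perf_counter()


def summarize(latencies: List[float]) -> Tuple[float, float]:
    """Return (first-chunk latency, mean latency of the remaining chunks)."""
    if not latencies:
        raise ValueError("no chunks were received")
    rest = latencies[1:]
    mean_rest = sum(rest) / len(rest) if rest else 0.0
    return latencies[0], mean_rest


def fake_stream(first_delay: float, later_delay: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming generator with a slow first chunk."""
    for i in range(n):
        time.sleep(first_delay if i == 0 else later_delay)
        yield f"chunk-{i}"


if __name__ == "__main__":
    # Replace fake_stream(...) with the real streaming iterator (assumption).
    latencies = [dt for dt, _ in timed_chunks(fake_stream(0.05, 0.01, 5))]
    first, mean_rest = summarize(latencies)
    print(f"first chunk: {first:.3f}s, later chunks (mean): {mean_rest:.3f}s")
```

If the first-chunk number dominates, the overhead is mostly TTS startup on the first text chunk; if every chunk is slow, the audio generation itself is the bottleneck.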
