the vocab.json has no Chinese word #9

Open
lsyss-lsyss opened this issue Jan 29, 2025 · 1 comment

Comments

@lsyss-lsyss

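The model only outputs English and cannot produce Chinese. I looked at the vocab and there are no Chinese characters in it, just a lot of strange symbols. What vocabulary is this?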

@CyrilSterling
Collaborator

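The model can output Chinese; you can try it in our demos (VideoLLaMA3-Image, VideoLLaMA3). If the model you deployed cannot output Chinese, the problem is most likely in your deployment or decoding.
The vocabulary is the Qwen2.5 vocabulary. Because it is encoded with BPE, you will not see Chinese characters when you look at the vocab directly, but the tokens do decode to Chinese characters: https://github.com/QwenLM/Qwen/blob/main/tokenization_note.md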
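To illustrate why a BPE vocab can show odd symbols instead of Chinese characters, here is a minimal sketch. It assumes the vocab file stores tokens with the GPT-2 style byte-to-unicode mapping (each UTF-8 byte is written as one printable character), which I believe is what the Hugging Face Qwen2 tokenizer does, but that is an assumption and not something stated in this thread. The helper names `to_vocab_form` and `from_vocab_form` are made up for illustration and are not part of any library.

```python
def bytes_to_unicode():
    """GPT-2 style table: map each of the 256 byte values to a printable character."""
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space, etc.) is shifted to 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}


def to_vocab_form(text: str) -> str:
    """Hypothetical helper: show how text would look inside a byte-level vocab."""
    return "".join(BYTE_ENCODER[b] for b in text.encode("utf-8"))


def from_vocab_form(token: str) -> str:
    """Hypothetical helper: turn the symbol string back into readable text."""
    return bytes(BYTE_DECODER[ch] for ch in token).decode("utf-8")


if __name__ == "__main__":
    symbols = to_vocab_form("中文")
    print(symbols)                   # odd-looking symbols, no CJK characters
    print(from_vocab_form(symbols))  # 中文
```

Under this assumption, "中" is stored as three symbols (one per UTF-8 byte), so searching the vocab for Chinese characters finds nothing even though the tokenizer can decode them. It also suggests one possible cause of missing Chinese at deployment time: if tokens are decoded one at a time, a character whose bytes are split across tokens cannot be decoded until all of its bytes are available.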
