awq example runs into error with llama 3.2 3b due to embedding layer #1089
Comments
Hi @baijumeswani - I just want to confirm that I'm specifically running the example for dml.
The weights for the embedding and the language modeling head (LM head) are closely related, since one is the transpose of the other. Some models with very large vocabulary sizes tie the embedding and LM head weights together and save only one copy of the weights on disk. When the weights are tied, that copy can be stored under either the embedding or the LM head. The code referenced here sets the LM head's attributes from the embedding's attributes if they are not already set:
onnxruntime-genai/src/python/py/models/quantized_model.py Lines 340 to 345 in 17061e0
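As an illustration of what "tied" means here (this is not the code from `quantized_model.py`, only a toy sketch with made-up sizes and hypothetical function names), a single matrix can serve both as the embedding table and, transposed, as the LM head projection:

```python
# Toy tied weights: one (vocab_size x hidden_size) matrix backs both layers.
# The sizes and values are made up purely for illustration.
shared_weight = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]  # vocab_size = 3, hidden_size = 2


def embed(token_id):
    """Embedding lookup: pick the row of the shared matrix for this token."""
    return shared_weight[token_id]


def lm_head(hidden):
    """LM head projection: hidden @ shared_weight^T, one logit per vocab entry."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in shared_weight]


hidden = embed(2)         # hidden state taken directly from the embedding row
logits = lm_head(hidden)  # one score per vocabulary entry
```

Because both functions read the same matrix, a checkpoint only needs to store it once, which is why a loader may find the weights under just one of the two names.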
However, the reverse direction, setting the embedding's attributes from the LM head's attributes, has not been added. For LLaMA-3.2, it appears that the
To temporarily unblock you, can you add the following in

    # This is a copy of the above code snippet where references to `embedding` are replaced with `lm_head`
    # and references to `lm_head` are replaced with `embedding`
    # Set embedding weights + biases if not already set
    if isinstance(self.embedding, TensorModule) and self.embedding.weight is None:
        # LM head and embedding share same weights + biases (embedding.weight == lm_head.weight and embedding.bias == lm_head.bias)
        self.embedding.weight = self.lm_head.weight
        if self.embedding.bias is not None:
            self.embedding.bias = self.lm_head.bias

The logic for handling the bias needs to be revisited in both cases before merging a fix. In some models, the condition should be
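The symmetric handling described in this comment can be sketched outside the library as follows. The `TensorModule` dataclass below is a hypothetical stand-in for the class of the same name in `quantized_model.py`, and `tie_missing_weights` is an illustrative helper, not an existing function. Bias handling is deliberately left out, since the comment flags it as unresolved.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TensorModule:
    """Hypothetical stand-in for the TensorModule class in quantized_model.py."""
    weight: Optional[Any] = None
    bias: Optional[Any] = None


def tie_missing_weights(embedding, lm_head):
    """Fill whichever side has no weight from the other side (weights only)."""
    for target, source in ((lm_head, embedding), (embedding, lm_head)):
        if isinstance(target, TensorModule) and target.weight is None:
            # Share the same object rather than copying, mirroring tied weights.
            target.weight = source.weight


# Case reported in this issue: only the LM head weight was loaded.
embedding = TensorModule()
lm_head = TensorModule(weight=[[0.1, 0.2], [0.3, 0.4]])
tie_missing_weights(embedding, lm_head)

# Case the existing code already covers: only the embedding weight was loaded.
embedding_2 = TensorModule(weight=[[0.5, 0.6]])
lm_head_2 = TensorModule()
tie_missing_weights(embedding_2, lm_head_2)
```

With a check like this in place, a later call such as `.detach()` on the embedding weight would no longer hit `None` when the checkpoint stores the tied weights only under the LM head.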
Describe the bug
When I run the example from examples/python/awq-quantized-model.md with llama-3.2-3b swapped in for phi-3, I get the following error:
AttributeError: 'NoneType' object has no attribute 'detach'
However, when I use the extra_option exclude_embeds=true, the ONNX conversion step runs successfully.
To Reproduce
Steps to reproduce the behavior:
model_name = "meta-llama/Llama-3.2-3B-Instruct"
Expected behavior
The conversion to ONNX should complete successfully, with no errors.
Screenshots
Desktop (please complete the following information):
Additional context
I've manually tried loading the awq quantized model and it looks fine. I can see the embeddings and grab them by attribute as well. Here is the output when I exclude embeddings: