Attempting to load a new model after the first, when using HF 4-bit, results in a CUDA error:
ERROR | modeling.inference_models.hf_torch:_get_model:402 - Lazyloader failed, falling back to stock HF load. You may run out of RAM here.
ERROR | modeling.inference_models.hf_torch:_get_model:403 - CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR | modeling.inference_models.hf_torch:_get_model:404 - Traceback (most recent call last):
  File "/home/***/AI/KoboldAI/modeling/inference_models/hf_torch.py", line 392, in _get_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/hf_bleeding_edge/__init__.py", line 59, in from_pretrained
    return AM.from_pretrained(path, *args, **kwargs)
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "/home/***/AI/KoboldAI/modeling/patches.py", line 92, in new_from_pretrained
    return old_from_pretrained(
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3260, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/***/AI/KoboldAI/modeling/patches.py", line 302, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/home/***/AI/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/utils/bitsandbytes.py", line 109, in set_module_quantized_tensor_to_device
    new_value = value.to(device)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
If I launch with CUDA_LAUNCH_BLOCKING=1, loading the second model simply hangs (no error) and never finishes.
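For reference, the debug launch described above looks like this. The `./play.sh` launcher name is an assumption about how KoboldAI is started here; synchronous kernel launches are slow and meant for debugging only:

```shell
# Force synchronous CUDA kernel launches so the reported stack frame
# points at the call that actually faulted (debug only; much slower).
# ./play.sh is assumed to be the usual KoboldAI launch script.
CUDA_LAUNCH_BLOCKING=1 ./play.sh
```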
This is a known issue we are still trying to solve; restarting KoboldAI is currently the best workaround until we figure out a fix.
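For anyone who wants to experiment anyway, the teardown people typically try between loads (dropping references, forcing a GC pass, clearing the CUDA caching allocator) can be sketched as below. `unload_model` is a hypothetical helper, not KoboldAI code, and in this 4-bit case it does not reliably prevent the crash, so restarting remains the safer option:

```python
import gc

def unload_model(model):
    """Hypothetical best-effort GPU cleanup before loading another model.
    Not part of KoboldAI; does not reliably avoid the 4-bit reload crash."""
    del model      # drop this function's reference to the model
    gc.collect()   # collect now so CUDA tensors are actually released
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
    except ImportError:
        pass  # torch unavailable; nothing GPU-side to clean up
```

In practice the illegal-access state appears to poison the CUDA context itself, which is why cache clearing from the same process is not enough and a full restart works.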