Runtime error

Exit code: 1. Reason:

model.safetensors: 100%|█████████▉| 7.26G/7.26G [00:15<00:00, 477MB/s]
INFO - The layer lm_head is not quantized.
WARNING - Exllamav2 kernel is not installed, reset disable_exllamav2 to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.
WARNING - CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
2. You are using pytorch without CUDA support.
3. CUDA and nvcc are not installed in your device.
INFO - The layer lm_head is not quantized.
Traceback (most recent call last):
  File "/home/user/app/app.py", line 19, in <module>
    model = AutoGPTQForCausalLM.from_quantized(
  File "/usr/local/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 135, in from_quantized
    return quant_func(
  File "/usr/local/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 1246, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "/usr/local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 2015, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "/usr/local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 309, in set_module_tensor_to_device
    and torch.device(device).type in ("cuda", "xpu")
TypeError: device() received an invalid combination of arguments - got (NoneType), but expected one of:
 * (torch.device device)
      didn't match because some of the arguments have invalid types: (!NoneType!)
 * (str type, int index = -1)
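The TypeError happens because accelerate ends up calling torch.device(None): with the exllama/CUDA kernels unavailable, the loader apparently has no device to map the checkpoint onto. A minimal sketch of a workaround, assuming the repo id below is a placeholder for your actual quantized model, is to choose a device explicitly in app.py rather than leaving it unset:

```python
def pick_device():
    """Return an explicit device string so the loader never sees None."""
    try:
        import torch  # assumed available in the Space's runtime
        if torch.cuda.is_available():
            return "cuda:0"
    except ImportError:
        pass
    return "cpu"


def load_quantized_model(model_path="your-org/your-model-GPTQ"):  # hypothetical repo id
    """Sketch: load a GPTQ checkpoint onto an explicit device.

    Assumes auto_gptq is installed. Per the warning in the log, without
    the compiled CUDA kernels, inference on "cpu" will be very slow.
    """
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(
        model_path,
        device=pick_device(),  # never None, so torch.device(device) can't fail
        use_safetensors=True,
    )
```

On a Space without a GPU, pick_device() resolves to "cpu"; on GPU hardware it resolves to "cuda:0". Either way the device argument is a valid string, which sidesteps the torch.device(NoneType) failure shown in the traceback.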
