Runtime error

Exit code: 1. Reason:

model-00001-of-000001.safetensors: 100%|██████████| 6.67G/6.67G [00:47<00:00, 139MB/s]
Downloading shards: 100%|██████████| 1/1 [00:48<00:00, 48.11s/it]
Traceback (most recent call last):
  File "/home/user/app/app.py", line 15, in <module>
    model = AutoModel.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4091, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1617, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1756, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: Flash Attention 2 is not available on CPU. Please make sure torch can access a CUDA device.
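The model download itself succeeds; the crash happens afterwards because the app loads the model with Flash Attention 2 enabled while the container has no CUDA device, and transformers only supports `flash_attention_2` on GPU. A common workaround is to select the attention implementation at runtime based on whether CUDA is available. This is a minimal sketch, assuming the app can fall back to PyTorch's `sdpa` implementation; the helper name is illustrative and not from the original `app.py`:

```python
def pick_attn_implementation() -> str:
    """Return an attn_implementation value safe for the current hardware.

    Flash Attention 2 requires a CUDA device, so fall back to PyTorch's
    scaled-dot-product attention ("sdpa") when no GPU (or no torch) is
    available.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "flash_attention_2"
    except ImportError:
        # torch not installed at all; the caller will fail later anyway,
        # but "sdpa" is the safe default string to return here.
        pass
    return "sdpa"


# Usage (sketch): pass the selected implementation when loading, e.g.
#   model = AutoModel.from_pretrained(
#       model_id,
#       attn_implementation=pick_attn_implementation(),
#   )
```

On a CPU-only Space this selects `"sdpa"`, which avoids the `ValueError` above; on GPU hardware with the `flash-attn` package installed it keeps the faster path.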
