Runtime error

Exit code: 1. Reason:

[Download progress omitted: a small 247k file and the 6.67G shard model-00001-of-000001.safetensors both downloaded successfully in about 24 s before the failure.]

Traceback (most recent call last):
  File "/app/app.py", line 20, in <module>
    model = AutoModel.from_pretrained(MODEL_NAME, _attn_implementation='flash_attention_2', torch_dtype=torch.bfloat16, trust_remote_code=True, use_safetensors=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4091, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1617, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1756, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: Flash Attention 2 is not available on CPU. Please make sure torch can access a CUDA device.
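The traceback shows the container is running on CPU-only hardware while line 20 of app.py requests Flash Attention 2, which only works on a CUDA device. A minimal sketch of one way to make the load hardware-aware, assuming the app should fall back to PyTorch's built-in SDPA attention on CPU (MODEL_NAME is a placeholder for the actual checkpoint id):

import torch
from transformers import AutoModel

MODEL_NAME = "org/model-name"  # placeholder: substitute the real checkpoint id

# Flash Attention 2 requires a CUDA GPU; on CPU, fall back to PyTorch's
# scaled-dot-product attention ("sdpa"). bfloat16 is also mainly useful
# on GPU, so float32 is a safer default on CPU (assumption, not verified
# against this particular model).
if torch.cuda.is_available():
    attn_impl, dtype = "flash_attention_2", torch.bfloat16
else:
    attn_impl, dtype = "sdpa", torch.float32

model = AutoModel.from_pretrained(
    MODEL_NAME,
    _attn_implementation=attn_impl,
    torch_dtype=dtype,
    trust_remote_code=True,
    use_safetensors=True,
)

Alternatively, keeping the original code and assigning GPU hardware to the container would also resolve the error, since torch.cuda.is_available() would then return True.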
