runtime error
Exit code: 1. Reason:
model-00001-of-000001.safetensors: 100%|██████████| 6.67G/6.67G [00:24<00:00, 273MB/s]
Downloading shards: 100%|██████████| 1/1 [00:24<00:00, 24.42s/it]
Traceback (most recent call last):
  File "/app/app.py", line 23, in <module>
    model = AutoModel.from_pretrained(MODEL_NAME, _attn_implementation='flash_attention_2', torch_dtype=torch.bfloat16, trust_remote_code=True, use_safetensors=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4091, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1617, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1756, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: Flash Attention 2 is not available on CPU. Please make sure torch can access a CUDA device.
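The failure is that app.py hard-codes `_attn_implementation='flash_attention_2'`, and FlashAttention2 requires a CUDA device, so the load fails on a CPU-only container. One common fix is to select the attention backend at runtime based on whether CUDA is available. Below is a minimal sketch of such a guard; the helper name `pick_attn_implementation` and the `"sdpa"` CPU fallback (PyTorch's scaled-dot-product attention backend in recent transformers versions) are assumptions for illustration, not taken from the original app.py:

```python
def pick_attn_implementation(cuda_available: bool) -> str:
    """Choose an attention backend the current host can actually run.

    FlashAttention2 only works on CUDA devices; on CPU-only hosts,
    fall back to "sdpa" (or "eager" on older transformers versions).
    """
    return "flash_attention_2" if cuda_available else "sdpa"
```

In app.py this would be wired up roughly as (sketch, assuming the same load arguments as the traceback):

```python
import torch
from transformers import AutoModel

attn = pick_attn_implementation(torch.cuda.is_available())
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModel.from_pretrained(
    MODEL_NAME,
    _attn_implementation=attn,
    torch_dtype=dtype,
    trust_remote_code=True,
    use_safetensors=True,
)
```

Note that bfloat16 is also swapped for float32 here, since some CPU builds handle bfloat16 poorly; alternatively, deploy the container on GPU hardware and keep the original arguments.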