ricklon Claude Sonnet 4.6 committed on
Commit
19cbeef
·
1 Parent(s): b9d5e1c

Fix attn_impl fallback: use eager not sdpa on ZeroGPU

Browse files

DeepseekOCR2ForCausalLM only supports flash_attention_2 and eager;
sdpa raises ValueError. Fall back to eager when CUDA is unavailable
at module load time (ZeroGPU cold start).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1) hide show
  1. app.py +4 -4
app.py CHANGED
@@ -29,10 +29,10 @@ MODEL_NAME = 'deepseek-ai/DeepSeek-OCR-2'
29
 
30
  tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
31
  # flash_attention_2 requires a CUDA device at init time — not available on ZeroGPU at
32
- # module load. Use sdpa (PyTorch scaled dot product attention) as the fallback; it works
33
- # on CPU at load time and on GPU at inference time. Locally with CUDA present, use
34
- # flash_attention_2 for maximum throughput.
35
- _attn_impl = 'flash_attention_2' if torch.cuda.is_available() else 'sdpa'
36
  model = AutoModel.from_pretrained(MODEL_NAME, _attn_implementation=_attn_impl, torch_dtype=torch.bfloat16, trust_remote_code=True, use_safetensors=True).eval()
37
  # .cuda() is NOT called here — on ZeroGPU, GPU is only available inside @spaces.GPU
38
  # functions. Locally, model.cuda() is called inside process_image on first run.
 
29
 
30
  tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
31
  # flash_attention_2 requires a CUDA device at init time — not available on ZeroGPU at
32
+ # module load. DeepseekOCR2 only supports 'flash_attention_2' and 'eager'; sdpa is not
33
+ # implemented for this model class. Fall back to 'eager' when no GPU is present.
34
+ # Locally with CUDA, flash_attention_2 is used for maximum throughput.
35
+ _attn_impl = 'flash_attention_2' if torch.cuda.is_available() else 'eager'
36
  model = AutoModel.from_pretrained(MODEL_NAME, _attn_implementation=_attn_impl, torch_dtype=torch.bfloat16, trust_remote_code=True, use_safetensors=True).eval()
37
  # .cuda() is NOT called here — on ZeroGPU, GPU is only available inside @spaces.GPU
38
  # functions. Locally, model.cuda() is called inside process_image on first run.