cgoodmaker Claude Opus 4.6 committed on
Commit 0989643 · 1 Parent(s): da343a7

Use bfloat16 on CPU to halve memory (8GB vs 16GB float32)


Free-tier HF Spaces have 16GB RAM. MedGemma 4B in float32 consumed
~16GB for weights alone, causing swap thrashing. bfloat16 on CPU is
supported by modern PyTorch and halves memory to ~8GB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (1)
  1. models/medgemma_agent.py +1 -1
models/medgemma_agent.py CHANGED
@@ -255,7 +255,7 @@ class MedGemmaAgent:
             self._print("Using CPU")
 
         model_kwargs = dict(
-            dtype=torch.bfloat16 if device != "cpu" else torch.float32,
+            dtype=torch.bfloat16,  # bfloat16 on all devices (halves RAM on CPU: ~8GB vs ~16GB)
             device_map="auto",
         )
 
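The memory figures in the commit message follow directly from dtype width: float32 stores 4 bytes per parameter, bfloat16 stores 2. A minimal sketch of that arithmetic (`weight_memory_gb` is an illustrative helper, not part of the repo), plus a quick check that bfloat16 compute runs on CPU:

```python
import torch

def weight_memory_gb(n_params: int, dtype: torch.dtype) -> float:
    """Approximate RAM needed for model weights alone, in GB."""
    bytes_per_param = torch.finfo(dtype).bits // 8  # fp32: 4 bytes, bf16: 2 bytes
    return n_params * bytes_per_param / 1e9

n_params = 4_000_000_000  # MedGemma 4B, roughly
print(weight_memory_gb(n_params, torch.float32))   # → 16.0
print(weight_memory_gb(n_params, torch.bfloat16))  # → 8.0

# Modern PyTorch supports bfloat16 kernels on CPU, so the cast is safe
# even when no GPU is present:
x = torch.randn(4, 4, dtype=torch.bfloat16)
y = x @ x  # matmul runs on CPU without error, stays bfloat16
```

With 16GB of RAM on a free-tier Space, ~16GB of float32 weights leaves nothing for activations or the OS, which is what caused the swap thrashing; ~8GB in bfloat16 leaves comfortable headroom.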