Commit 0989643
Parent(s): da343a7
Use bfloat16 on CPU to halve memory (8GB vs 16GB float32)
Free-tier HF Spaces have 16GB RAM. MedGemma 4B in float32 consumed
~16GB for weights alone, causing swap thrashing. bfloat16 on CPU is
supported by modern PyTorch and halves memory to ~8GB.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
models/medgemma_agent.py  +1 -1
@@ -255,7 +255,7 @@ class MedGemmaAgent:
             self._print("Using CPU")
 
         model_kwargs = dict(
-            dtype=torch.bfloat16
+            dtype=torch.bfloat16,  # bfloat16 on all devices (halves RAM on CPU: ~8GB vs ~16GB)
             device_map="auto",
         )
 
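The memory claim in the commit message can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes ~4e9 weights for "MedGemma 4B" (an approximation, not a figure from the commit) and the standard per-element sizes of 4 bytes for float32 and 2 bytes for bfloat16:

```python
# Rough check of the float32-vs-bfloat16 weight-memory claim.
# PARAMS is an assumed round number for a "4B" model, not an exact count.
PARAMS = 4_000_000_000
BYTES_FP32 = 4   # float32: 4 bytes per parameter
BYTES_BF16 = 2   # bfloat16: 2 bytes per parameter


def weights_gib(n_params: int, bytes_per_param: int) -> float:
    """Weight storage in GiB, ignoring activations and framework overhead."""
    return n_params * bytes_per_param / 1024**3


print(f"float32:  {weights_gib(PARAMS, BYTES_FP32):.1f} GiB")   # ~14.9 GiB
print(f"bfloat16: {weights_gib(PARAMS, BYTES_BF16):.1f} GiB")   # ~7.5 GiB
```

Weights alone in float32 land close to the Space's 16GB RAM ceiling, so halving them to bfloat16 is what moves the model from swap thrashing to a comfortable fit, consistent with the "~8GB vs ~16GB" figures in the diff comment.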