Hanzo Dev committed
Commit 3989ff3 · Parent(s): 523229a

Add accurate hardware specs from Unsloth/Moonshot

- INT4 model: 370GB (62 shards)
- Minimum: 247GB combined RAM+VRAM+Disk
- GGUF quants: 245GB (1.66bit) to 588GB (4.5bit)
- Training: 4x A100 80GB needed (~500GB VRAM)
README.md
CHANGED
## Hardware Requirements

### Inference (INT4 from HuggingFace)
- **Model Size**: ~370GB (62 safetensors shards, INT4 quantized)
- **Minimum**: 247GB combined RAM+VRAM+Disk
- **Optimal**: 370GB+ RAM+VRAM for 5+ tokens/s
- **Budget Setup**: 1x 24GB GPU + 256GB RAM (~1-2 tokens/s)
- **High Performance**: 4x A100 80GB or 8x A100 40GB
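As a rough illustration, the combined-memory thresholds above can be encoded in a small helper. This is a sketch: only the GB figures come from this section; the function name and tier labels are made up for clarity, and nothing here inspects real hardware.

```python
# Sketch of the combined-memory rules above. Thresholds are this
# section's figures; the tier labels are illustrative only.
MINIMUM_GB = 247   # RAM+VRAM+Disk floor for INT4 inference
OPTIMAL_GB = 370   # RAM+VRAM needed for ~5+ tokens/s

def throughput_tier(vram_gb, ram_gb, disk_gb=0):
    """Classify a machine against the thresholds listed above."""
    fast_pool = vram_gb + ram_gb        # memory that avoids disk offload
    if fast_pool + disk_gb < MINIMUM_GB:
        return "below minimum"
    if fast_pool >= OPTIMAL_GB:
        return "optimal (5+ tokens/s)"
    return "budget (1-2 tokens/s, offloading)"

# The "Budget Setup" row above: 1x 24GB GPU + 256GB RAM
print(throughput_tier(vram_gb=24, ram_gb=256))  # budget (1-2 tokens/s, offloading)
```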
### Alternative: GGUF Quantizations (Unsloth)
- **1.66-bit (UD-TQ1_0)**: 245GB - fits on 247GB combined RAM+VRAM
- **2.71-bit (UD-Q2_K_XL)**: 381GB - recommended for accuracy
- **4.5-bit (UD-Q4_K_XL)**: 588GB - near full precision
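Choosing among these usually means taking the highest-precision file that fits your combined RAM+VRAM. A hypothetical picker over the list above (the names and sizes are copied from this README; the function itself is not part of any tooling):

```python
# Hypothetical quant picker: return the highest-precision GGUF from the
# list above whose file fits a combined RAM+VRAM budget (in GB).
GGUF_QUANTS = [
    ("UD-TQ1_0", 245),     # 1.66-bit
    ("UD-Q2_K_XL", 381),   # 2.71-bit
    ("UD-Q4_K_XL", 588),   # 4.5-bit
]

def pick_quant(budget_gb):
    """Largest quant that fits, or None if even 1.66-bit is too big."""
    best = None
    for name, size_gb in GGUF_QUANTS:  # ordered small -> large
        if size_gb <= budget_gb:
            best = name
    return best

print(pick_quant(247))  # UD-TQ1_0 (the 245GB file just fits)
print(pick_quant(600))  # UD-Q4_K_XL
```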
### QLoRA Training
- **VRAM**: ~500GB total (370GB model + 130GB activations)
- **GPUs**: 4x A100 80GB or 8x A100 40GB
- **Training Time**: 4-8 hours for 1000 steps
- **Output**: LoRA adapters (~100MB)
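For context, a QLoRA run sized like the above freezes the quantized base weights and trains only small adapters, which is why the output is ~100MB rather than another 370GB checkpoint. A generic configuration sketch using Hugging Face `transformers` and `peft` follows; the model id, rank, and target modules are placeholders and are not taken from this repo.

```python
# Generic QLoRA configuration sketch (not this repo's training script).
# Base weights stay 4-bit and frozen; only the LoRA adapters train.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # keep base weights quantized
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

lora_config = LoraConfig(
    r=16,                                   # adapter rank drives adapter size
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    "org/model-id",                         # placeholder repo id
    quantization_config=bnb_config,
    device_map="auto",                      # shard across the available GPUs
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA weights train
```

At the end of training, only the adapter weights are saved (`model.save_pretrained(...)`), which matches the ~100MB output noted above.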
## Format Availability