Hanzo Dev committed
Commit 3989ff3 · 1 Parent(s): 523229a

Add accurate hardware specs from Unsloth/Moonshot


- INT4 model: 370GB (62 shards)
- Minimum: 247GB combined RAM+VRAM+Disk
- GGUF quants: 245GB (1.66bit) to 588GB (4.5bit)
- Training: 4x A100 80GB needed (~500GB VRAM)
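The quant sizes above reduce to a simple capacity check against combined RAM+VRAM. A minimal sketch, using only the sizes and the "fits on 247GB combined RAM+VRAM" rule stated in this commit (the function name and the simple `size <= budget` test are illustrative, not part of the repo):

```python
# Sketch: which of the Unsloth GGUF quants listed in this commit fit a
# given machine, comparing file size against combined RAM+VRAM (in GB).
QUANT_SIZES_GB = {
    "UD-TQ1_0 (1.66-bit)": 245,
    "UD-Q2_K_XL (2.71-bit)": 381,
    "UD-Q4_K_XL (4.5-bit)": 588,
}

def quants_that_fit(ram_gb: float, vram_gb: float) -> list[str]:
    """Return the quants whose weights fit in combined RAM+VRAM."""
    budget = ram_gb + vram_gb
    return [q for q, size in QUANT_SIZES_GB.items() if size <= budget]

# Budget setup from the README diff: 1x 24GB GPU + 256GB RAM = 280GB combined
print(quants_that_fit(256, 24))  # only the 1.66-bit quant fits
```

Note this ignores KV cache and OS overhead, so treat a borderline fit (e.g. 245GB on a 247GB budget) as the floor, not a comfortable target.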

Files changed (1):
  1. README.md +17 -11
README.md CHANGED
@@ -242,17 +242,23 @@ tools = {
 
 ## Hardware Requirements
 
-### Inference (INT4)
-- **VRAM**: ~30-40 GB (INT4 quantized)
-- **RAM**: 64 GB recommended
-- **Storage**: ~60 GB for full model + quantizations
-- **GPU**: A100 40GB or 2× RTX 4090
-
-### Training
-- **VRAM**: ~80-160 GB (full precision)
-- **RAM**: 256 GB recommended
-- **GPUs**: 4× A100 80GB for fine-tuning
-- **Storage**: ~120 GB for checkpoints
+### Inference (INT4 from HuggingFace)
+- **Model Size**: ~370GB (62 safetensors shards, INT4 quantized)
+- **Minimum**: 247GB combined RAM+VRAM+Disk
+- **Optimal**: 370GB+ RAM+VRAM for 5+ tokens/s
+- **Budget Setup**: 1x 24GB GPU + 256GB RAM (~1-2 tokens/s)
+- **High Performance**: 4x A100 80GB or 8x A100 40GB
+
+### Alternative: GGUF Quantizations (Unsloth)
+- **1.66-bit (UD-TQ1_0)**: 245GB - fits on 247GB combined RAM+VRAM
+- **2.71-bit (UD-Q2_K_XL)**: 381GB - recommended for accuracy
+- **4.5-bit (UD-Q4_K_XL)**: 588GB - near full precision
+
+### QLoRA Training
+- **VRAM**: ~500GB total (370GB model + 130GB activations)
+- **GPUs**: 4x A100 80GB or 8x A100 40GB
+- **Training Time**: 4-8 hours for 1000 steps
+- **Output**: LoRA adapters (~100MB)
 
 ## Format Availability
 