Update README.md

The model was trained on an A100.
- **Precision:** 4-bit quantization (NF4) with double quantization, compute in bfloat16
- **Optimizer:** `paged_adamw_8bit`
- **Scheduler:** Cosine learning rate decay with 3% warmup
- **Batching:** Effective batch size of 24 (`per_device_train_batch_size=6`, `gradient_accumulation_steps=4`)
- **Epochs:** 1–2 (best checkpoint after 1 epoch, ~1600 steps)
- **Dropout:** 0.05 (LoRA)
- **LoRA rank:** 16 (`r=16`), scaling factor `alpha=64`
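The hyperparameters above correspond to a standard QLoRA setup. A minimal configuration sketch, assuming the Hugging Face `transformers`/`peft`/`bitsandbytes` stack (the README does not name the framework; the output directory is a placeholder):

```python
# Sketch of the training configuration described above.
# Assumption: Hugging Face transformers + peft + bitsandbytes.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# Precision: 4-bit NF4 with double quantization, compute in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA: rank 16, alpha 64, dropout 0.05
lora_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Optimizer, scheduler, batching: effective batch size 6 * 4 = 24
training_args = TrainingArguments(
    output_dir="out",                # placeholder
    per_device_train_batch_size=6,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,              # best checkpoint after 1 epoch
    bf16=True,
)
```

These objects would then be passed to a `Trainer` (or `SFTTrainer`) together with the quantized base model; that wiring is omitted here.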