prernac1 committed on
Commit 3babf25 · verified · 1 Parent(s): 21e54db

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -159,7 +159,7 @@ The model was trained on A100.
  - **Precision:** 4-bit quantization (NF4) with double quantization, compute in bfloat16
  - **Optimizer:** `paged_adamw_8bit`
  - **Scheduler:** Cosine learning rate decay with 3% warmup
- - **Batching:** Effective batch size of 16 (per_device_train_batch_size=6, gradient_accumulation_steps=4)
+ - **Batching:** Effective batch size of 24 (per_device_train_batch_size=6, gradient_accumulation_steps=4)
  - **Epochs:** 1–2 (best checkpoint after 1 epoch, ~1600 steps)
  - **Dropout:** 0.05 (LoRA)
  - **LoRA rank:** 16 (`r=16`), scaling factor `alpha=64`
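The one-line fix is simple arithmetic: with `per_device_train_batch_size=6` and `gradient_accumulation_steps=4`, the effective batch size is 6 × 4 = 24, not 16. A minimal sketch of that calculation (variable names are the README's own; the single-device assumption comes from the "trained on A100" note):

```python
# Effective batch size = per-device batch size × gradient-accumulation steps
# × number of devices (assumed 1 here, since training was on a single A100).
per_device_train_batch_size = 6
gradient_accumulation_steps = 4
num_devices = 1  # assumption: one A100

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 24, matching the corrected README line
```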
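For context, the hyperparameters listed in the hunk map onto a standard Hugging Face QLoRA setup roughly as follows. This is a sketch, not the author's actual training script: only the values shown in the diff come from the README, and `output_dir` is a placeholder.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# Precision: 4-bit NF4 quantization with double quantization, bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA: rank 16, scaling alpha 64, dropout 0.05
lora_config = LoraConfig(r=16, lora_alpha=64, lora_dropout=0.05)

# Optimizer, scheduler, and the corrected batching from the diff
training_args = TrainingArguments(
    output_dir="out",               # placeholder path
    per_device_train_batch_size=6,
    gradient_accumulation_steps=4,  # effective batch size 24 on one device
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,              # 3% warmup
    num_train_epochs=1,             # best checkpoint after 1 epoch (~1600 steps)
    bf16=True,
)
```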