Update README.md
README.md CHANGED
@@ -43,25 +43,9 @@ For research, experimentation, and educational purposes where a small instructio
 
 ## Training Details
 
-### Supervised Fine-Tuning (SFT)
-
-**Key Configurations:**
-- `batch_size`: 2
-- `compute_type`: bf16
-- `learning_rate`: 5e-5
-- `lora_alpha`: 32
-- `lora_dropout`: 0
-- `lr_scheduler_type`: cosine
-
-### Direct Preference Optimization (DPO)
-
-**Key Configurations:**
-- `batch_size`: 2
-- `compute_type`: bf16
-- `learning_rate`: 5e-5
-- `lora_alpha`: 32
-- `lora_dropout`: 0.95
-- `lr_scheduler_type`: cosine
+Both SFT and DPO share the same core settings: the liger_kernel booster, LoRA fine-tuning of a custom model, BF16 compute, a batch size of 2, and a cosine scheduler with a learning rate of 5e-5. RSLoRA is enabled with a rank of 16 and an alpha of 32.
+
+The two runs differ mainly in dataset and regularization: SFT trains on CrashCourse_120K with packing enabled and a LoRA dropout of 0, while DPO trains on orca_pairs with packing disabled and a LoRA dropout of 0.95.
 
 ## Evaluation
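For readers who want to reproduce these settings, here is a minimal sketch of roughly equivalent configs expressed with Hugging Face peft and TRL. The commit shows only the README text, not the training code, so the library choice and every name below (`use_rslora`, `use_liger_kernel`, the output paths) are assumptions for illustration, not the project's actual setup.

```python
# Hypothetical re-expression of the settings described above using peft + TRL.
# None of this comes from the repository; it only mirrors the README's values.
from peft import LoraConfig
from trl import SFTConfig, DPOConfig

# Shared adapter settings: RSLoRA with rank 16 and alpha 32.
def make_lora_config(dropout: float) -> LoraConfig:
    return LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=dropout,  # 0 for SFT, 0.95 for DPO per the README
        use_rslora=True,       # rank-stabilized LoRA scaling
        task_type="CAUSAL_LM",
    )

sft_lora = make_lora_config(dropout=0.0)
dpo_lora = make_lora_config(dropout=0.95)

# Stage 1: SFT on CrashCourse_120K with sequence packing enabled.
sft_args = SFTConfig(
    output_dir="sft-out",            # placeholder path
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,                       # BF16 compute type
    packing=True,                    # packing enabled for SFT
    use_liger_kernel=True,           # liger_kernel booster (transformers >= 4.45)
)

# Stage 2: DPO on orca_pairs; no packing option here, matching "packing disabled".
dpo_args = DPOConfig(
    output_dir="dpo-out",            # placeholder path
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,
    use_liger_kernel=True,
)
```

These config objects would then be passed to `SFTTrainer` and `DPOTrainer` along with the model, tokenizer, and the respective datasets; that wiring is omitted here since the diff gives no detail about it.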