agentlans committed
Commit 6f93fba · verified · 1 Parent(s): a511b35

Update README.md

Files changed (1)
  1. README.md +2 -18
README.md CHANGED
@@ -43,25 +43,9 @@ For research, experimentation, and educational purposes where a small instructio
 
 ## Training Details
 
-### Supervised Fine-Tuning (SFT)
-
-- **Key Configurations:**
-  - `batch_size`: 2
-  - `compute_type`: bf16
-  - `learning_rate`: 5e-5
-  - `lora_alpha`: 32
-  - `lora_dropout`: 0
-  - `lr_scheduler_type`: cosine
-
-### Direct Preference Optimization (DPO)
-
-- **Key Configurations:**
-  - `batch_size`: 2
-  - `compute_type`: bf16
-  - `learning_rate`: 5e-5
-  - `lora_alpha`: 32
-  - `lora_dropout`: 0.95
-  - `lr_scheduler_type`: cosine
+Both SFT and DPO share common settings: liger_kernel booster, LoRA fine-tuning, custom model, BF16 compute type, batch size of 2, and a cosine scheduler with a learning rate of 5e-5. RSLoRA is enabled with a rank of 16 and alpha of 32.
+
+The main differences are in the dataset and training specifics. SFT uses CrashCourse_120K with packing enabled and LoRA dropout of 0, while DPO uses orca_pairs with packing disabled and a LoRA dropout of 0.95.
 
 ## Evaluation
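For readers who want to reproduce these hyperparameters outside the original trainer, below is a minimal sketch of the same settings expressed with Hugging Face `peft` and TRL. This is an assumed mapping: the key names in the README (`compute_type`, a liger_kernel booster) suggest a different training stack, and the `output_dir` values, the `lora_config` helper, and the choice of TRL's `SFTConfig`/`DPOConfig` are illustrative, not part of the commit.

```python
# Hedged sketch: the README's hyperparameters re-expressed with peft/TRL.
# The original run appears to use a different trainer, so treat this as an
# approximate mapping rather than the actual training script.
from peft import LoraConfig
from trl import SFTConfig, DPOConfig

# Shared LoRA settings from the README: RSLoRA, rank 16, alpha 32.
def lora_config(dropout: float) -> LoraConfig:
    return LoraConfig(
        r=16,                  # LoRA rank
        lora_alpha=32,         # lora_alpha
        lora_dropout=dropout,  # 0 for SFT, 0.95 for DPO
        use_rslora=True,       # rank-stabilized LoRA scaling
        task_type="CAUSAL_LM",
    )

# Shared trainer settings: BF16 compute, batch size 2, cosine schedule at 5e-5.
shared = dict(
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,
)

# SFT stage: CrashCourse_120K dataset with sequence packing enabled.
sft_args = SFTConfig(output_dir="sft-out", packing=True, **shared)
sft_lora = lora_config(dropout=0.0)

# DPO stage: orca_pairs dataset, packing disabled, LoRA dropout of 0.95.
dpo_args = DPOConfig(output_dir="dpo-out", **shared)
dpo_lora = lora_config(dropout=0.95)
```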