I ran these evaluations using [SmolLM2's evaluation code](https://github.com/hug…).

| MMLU-Pro (MCF) | 17.4 | 17.3 | 19.3 | 12.7 | **24.2** | 11.7 |
| PIQA | 72.2 | 72.1 | **74.4** | 72.3 | 73.2 | 71.6 |
## Training Details

The model was trained with Direct Preference Optimization (DPO) using the following configuration:

- Base model: SmolLM2-1.7B, after running it through AllenAI's SFT pipeline
- Mixed precision: bfloat16
- Learning rate: 8e-7 with a linear scheduler
- Warmup ratio: 0.1
- Training epochs: 1
- Effective batch size: 12
- Sequence length: 4096 tokens
- DPO loss: length-normalized DPO
- DPO beta: 5.0
- Gradient checkpointing enabled
- DeepSpeed Stage 3 for memory optimization
## Usage

As with any Hugging Face model, you can run it with the transformers library:
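A minimal generation sketch, assuming the tokenizer ships with a chat template; the model id below is a placeholder, not the real repository name:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- substitute this model's actual Hub name.
model_id = "<your-username>/<this-model>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Give me a one-line summary of DPO."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```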