This repository provides a fine-tuned version of Pythia-2.8B, using our proposed [SamPO](https://github.com/LuJunru/SamPO) algorithm: Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence.
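For intuition, here is a minimal sketch of the down-sampling idea, assuming per-token log-probabilities have already been gathered for each response. Function and argument names are illustrative, not the actual SamPO implementation (see the repo linked above): an equal number of tokens is sampled from the chosen and rejected responses before forming the DPO logit, so the implicit reward no longer scales with response length.

```python
import torch
import torch.nn.functional as F

def sampo_loss(pi_chosen_tok, pi_rejected_tok,
               ref_chosen_tok, ref_rejected_tok,
               beta=0.1, generator=None):
    """Illustrative down-sampled DPO (SamPO-style) loss for one pair.

    Each *_tok argument is a 1-D tensor of per-token log-probabilities
    for the chosen/rejected response under the policy or reference model.
    """
    # Use the shorter response's length as the shared token budget.
    n = min(pi_chosen_tok.numel(), pi_rejected_tok.numel())

    # Down-sample the SAME number of tokens from both responses, so the
    # summed log-ratio (the implicit KL term) no longer grows with length.
    idx_c = torch.randperm(pi_chosen_tok.numel(), generator=generator)[:n]
    idx_r = torch.randperm(pi_rejected_tok.numel(), generator=generator)[:n]

    chosen_logratio = (pi_chosen_tok[idx_c] - ref_chosen_tok[idx_c]).sum()
    rejected_logratio = (pi_rejected_tok[idx_r] - ref_rejected_tok[idx_r]).sum()

    # Standard DPO objective on the length-debiased log-ratios.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio))
```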

## Performance

| vs. SFT | Wins (%) | Avg. len (tokens) |
| ------ | ------ | ------ |
| DPO | 60.98 | 53.8 |
| Iterative DPO | **73.58** | 66.65 |
| Length Normed DPO | 58.13 | 47.34 |
| SimPO | 33.33 | **31.9** |
| Iterative SamPO | **73.58** | 49.54 |

## Evaluation Details

We evaluate our model with the same GPT-4 win-rate prompt template proposed in the [DPO paper](https://arxiv.org/pdf/2305.18290). The [sampled test set](https://huggingface.co/robinlee99/Pythia-2.8B-TLDR-Iterative-SamPO/blob/main/test_tldr.jsonl) is included in this repo.
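As a rough sketch, once GPT-4 verdicts are collected for each test example, the win rate can be tallied as below. The `win` / `lose` / `tie` labels and the jsonl loader are illustrative assumptions, not the repo's actual evaluation code:

```python
import json

def load_test_set(path="test_tldr.jsonl"):
    """Load the sampled TL;DR test set: one JSON object per line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def win_rate(verdicts):
    """Percentage of comparisons the model wins.

    `verdicts` holds one 'win' / 'lose' / 'tie' label per test example,
    parsed from the GPT-4 judge's response (hypothetical label scheme).
    """
    wins = sum(1 for v in verdicts if v == "win")
    return 100.0 * wins / len(verdicts)
```

For example, `win_rate(["win", "lose", "win", "tie"])` returns `50.0`.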