Win rate **88.26%** on [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) [view raw](https://github.com/tatsu-lab/alpaca_eval/blob/3a47dcd81c56f6a8e6a5711f2754013919fbe90a/results/causallm-14b/model_outputs.json)
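
The win rate above is a pairwise-preference statistic: the fraction of benchmark prompts on which the judge prefers the model's answer over the reference answer. A minimal sketch of that arithmetic, with made-up judgments (the `win_rate` helper and the tie-counts-as-half convention are illustrative assumptions, not AlpacaEval's exact implementation):

```python
# Sketch of how a win rate like 88.26% is computed: the share of
# prompts where the judge prefers the model's output over the
# reference output. Ties are counted as half a win here, a common
# convention in pairwise evaluation (an assumption, not AlpacaEval's
# documented behavior).
def win_rate(preferences):
    """preferences: list of 'win' / 'loss' / 'tie' judgments."""
    score = sum(1.0 if p == "win" else 0.5 if p == "tie" else 0.0
                for p in preferences)
    return 100.0 * score / len(preferences)

# Toy example with fabricated judgments:
print(win_rate(["win", "win", "loss", "tie"]))  # → 62.5
```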

## MT-Bench on DPO Version

| Model | MT-Bench |
| ------------------------- | ------------ |
| GPT-4 | 8.99 |
| GPT-3.5-Turbo | 7.94 |
| | |
| Zephyr-7b-β (Overfitting) | 7.34 |
| Zephyr-7b-α | 6.88 |
| | |
| **[CausalLM/14B-DPO-α](https://huggingface.co/CausalLM/14B-DPO-alpha)** | **7.618868** |
| **[CausalLM/7B-DPO-α](https://huggingface.co/CausalLM/7B-DPO-alpha)** | **7.038125** |

## Other languages

We are currently unable to produce accurate benchmark templates for non-QA tasks (languages other than English and Chinese). However, we will be working on other language versions of the QA-Task challenge in the near future.

### Japanese Benchmark