Win rate **88.26%** on [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) [view raw](https://github.com/tatsu-lab/alpaca_eval/blob/3a47dcd81c56f6a8e6a5711f2754013919fbe90a/results/causallm-14b/model_outputs.json)
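
The win rate above is a pairwise-preference statistic: the fraction of benchmark prompts on which the judge prefers the model's answer over the reference answer. A minimal sketch of that arithmetic, with made-up judgments (the `win_rate` helper and the tie-counts-as-half convention are illustrative assumptions, not AlpacaEval's exact implementation):

```python
# Sketch of how a win rate like 88.26% is computed: the share of
# prompts where the judge prefers the model's output over the
# reference output. Ties are counted as half a win here, a common
# convention in pairwise evaluation (an assumption, not AlpacaEval's
# documented behavior).
def win_rate(preferences):
    """preferences: list of 'win' / 'loss' / 'tie' judgments."""
    score = sum(1.0 if p == "win" else 0.5 if p == "tie" else 0.0
                for p in preferences)
    return 100.0 * score / len(preferences)

# Toy example with fabricated judgments:
print(win_rate(["win", "win", "loss", "tie"]))  # → 62.5
```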

## MT-Bench on DPO Version

| Model | MT-Bench |
| ------------------------- | ------------ |
| GPT-4 | 8.99 |
| GPT-3.5-Turbo | 7.94 |
| | |
| Zephyr-7b-β (Overfitting) | 7.34 |
| Zephyr-7b-α | 6.88 |
| | |
| **[CausalLM/14B-DPO-α](https://huggingface.co/CausalLM/14B-DPO-alpha)** | **7.618868** |
| **[CausalLM/7B-DPO-α](https://huggingface.co/CausalLM/7B-DPO-alpha)** | **7.038125** |

## Other languages

We are currently unable to produce accurate benchmark templates for non-QA tasks (languages other than English and Chinese). However, we will be working on other language versions of the QA-Task challenge in the near future.

### Japanese Benchmark