Open-Orca/oo-phi-1_5

Text Generation

mixformer-sequential

bleysg committed on Sep 19, 2023

Commit cdff12f · 1 Parent(s): 4d93865

Update README.md

Files changed (1)

README.md +4 -1

@@ -18,7 +18,7 @@ This model doesn't dramatically improve on the base model's general task perform
 # Evaluations
-We've only done very limited testing as yet. The [epoch 4.5 checkpoint](https://huggingface.co/Open-Orca/oo-phi-1_5/commit/aa05eb2596d6d11951695d2e327616188d768880) scores above 5 on MT-Bench (better than Alpaca-13B, worse than Llama2-7b-chat), while preliminary benchmarks suggest peak average performance was achieved roughly at epoch 4.
+We've only done limited testing as yet. The [epoch 3.5 checkpoint](https://huggingface.co/Open-Orca/oo-phi-1_5/commit/f7754d8b8b4c3e0748eaf47be4cf5aac1f80a401) scores above 5.1 on MT-Bench (better than Alpaca-13B, worse than Llama2-7b-chat), while preliminary benchmarks suggest peak average performance was achieved roughly at epoch 4.
 ## HuggingFaceH4 Open LLM Leaderboard Performance
@@ -29,6 +29,9 @@ The only significant improvement was with TruthfulQA.
 ## MT-bench Performance
+![MT-bench Score](https://huggingface.co/Open-Orca/oo-phi-1_5/resolve/main/Images/oo-phi-1_5-mtbench.png)
 | Epoch     | Average   | Turn 1    | Turn 2    |
 |:----------|:----------|:----------|:----------|
 | 3         | 4.85      | 5.69      | 4.01      |