This is the instruction-tuned version of the pretrained Kiwi-1.0-0.7B model. As can be seen in the table below, the results are on par with the SOTA Qwen2.5-0.5B.
The evaluation uses tasks from the Open LLM Leaderboard.
The model is evaluated on five benchmarks: ARC-Challenge, HellaSwag, MMLU-PRO, IFEval, and GPQA.
All evaluations were run with the lm-evaluation-harness library.
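As a rough sketch of how such a run might be launched from Python, the snippet below builds the arguments for `lm_eval.simple_evaluate`. The repository ID, the task names, and the batch size are assumptions for illustration, not the settings behind the reported numbers. Task names differ between harness versions, so check them against the installed one.

```python
# Hypothetical Hub repository ID; replace it with the real model path.
MODEL_ID = "your-org/Kiwi-1.0-0.7B-32k-Instruct"

# Assumed harness task names for the five benchmarks (may vary by version).
TASKS = [
    "arc_challenge",
    "hellaswag",
    "mmlu_pro",
    "ifeval",
    "gpqa_diamond_cot_zeroshot",
]


def build_eval_kwargs(model_id=MODEL_ID, tasks=TASKS, batch_size=8):
    """Collect keyword arguments for lm_eval.simple_evaluate."""
    return {
        "model": "hf",  # Hugging Face transformers backend
        "model_args": f"pretrained={model_id}",
        "tasks": list(tasks),
        "batch_size": batch_size,  # assumed value, tune to the available memory
    }


def run_evaluation(**overrides):
    """Run the harness and return its per-task metric dictionaries."""
    # Imported lazily so build_eval_kwargs works without the harness installed.
    import lm_eval

    output = lm_eval.simple_evaluate(**build_eval_kwargs(**overrides))
    return output["results"]
```

Calling `run_evaluation()` downloads the model and datasets, so it needs network access and, realistically, a GPU. Few-shot settings are left at the harness defaults here and may not match the leaderboard configuration.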
| Metric | Qwen2.5-0.5B-Instruct | Kiwi-1.0-0.7B-32k-Instruct |
|---|---|---|
| ARC | 33.45 | 32.34 |
| HellaSwag | 52.37 | 48.59 |
| MMLU-PRO | 14.03 | 12.89 |
| IFEval | 37.53 | 27.1 |
| GPQA (Diamond) (Zero-shot CoT) | 12.27 | 17.17 |
| Average | 29.93 | 27.27 |
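
The Average row can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the average is an unweighted mean of the five per-task scores as listed in the table. Under that assumption the Qwen column reproduces 29.93, while the Kiwi column comes out near 27.62 rather than the reported figure, so the reported value may have been computed from different inputs.

```python
from statistics import mean

# Per-task scores copied from the table, in row order:
# ARC, HellaSwag, MMLU-PRO, IFEval, GPQA (Diamond).
scores = {
    "Qwen2.5-0.5B-Instruct": [33.45, 52.37, 14.03, 37.53, 12.27],
    "Kiwi-1.0-0.7B-32k-Instruct": [32.34, 48.59, 12.89, 27.1, 17.17],
}


def macro_average(values):
    """Unweighted mean over tasks, rounded to two decimals."""
    return round(mean(values), 2)


averages = {name: macro_average(vals) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```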