---
language:
- en
license: cc-by-nc-sa-4.0
pipeline_tag: text-generation
model-index:
- name: Merge_Sakura_Solar
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 70.73
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 88.51
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.03
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 72.21
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.99
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dddsaty/Merge_Sakura_Solar
      name: Open LLM Leaderboard
---

**Explanation**
- Merged the three models below using [mergekit](https://github.com/arcee-ai/mergekit) with the `dare_ties` merge method

**Models**
- [Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct)
- [Sakura-SOLRCA-Math-Instruct-DPO-v2](https://huggingface.co/kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2)
- [Sakura-SOLRCA-Instruct-DPO](https://huggingface.co/kyujinpy/Sakura-SOLRCA-Instruct-DPO)

**Score**
|Average|ARC|HellaSwag|MMLU|TruthfulQA|Winogrande|GSM8K|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|74.03|70.73|88.51|66.03|72.21|82.72|63.99|

**Original Author's HuggingFace profile**
- [kyujinpy](https://huggingface.co/kyujinpy)

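
The merge above can be sketched as a mergekit configuration. A minimal sketch, assuming `Sakura-SOLAR-Instruct` as the base model; the `density` and `weight` values below are illustrative placeholders, not the parameters actually used for this merge:

```yaml
# Hypothetical dare_ties merge config for mergekit.
# density/weight values are placeholders, not the author's actual settings.
models:
  - model: kyujinpy/Sakura-SOLAR-Instruct
    # base model: contributes no task vector under dare_ties
  - model: kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2
    parameters:
      density: 0.5
      weight: 0.5
  - model: kyujinpy/Sakura-SOLRCA-Instruct-DPO
    parameters:
      density: 0.5
      weight: 0.5
merge_method: dare_ties
base_model: kyujinpy/Sakura-SOLAR-Instruct
dtype: float16
```

A config like this would be run with mergekit's `mergekit-yaml` CLI, e.g. `mergekit-yaml config.yml ./output-model`.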
**License**
- Follows the license stated on the original author's model pages (cc-by-nc-sa-4.0)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dddsaty__Merge_Sakura_Solar)

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |74.03|
|AI2 Reasoning Challenge (25-Shot)|70.73|
|HellaSwag (10-Shot)              |88.51|
|MMLU (5-Shot)                    |66.03|
|TruthfulQA (0-shot)              |72.21|
|Winogrande (5-shot)              |82.72|
|GSM8k (5-shot)                   |63.99|