---
license: other
model-index:
- name: SUS-Chat-72B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 66.3
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SUSTech/SUS-Chat-72B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 84.96
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SUSTech/SUS-Chat-72B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 76.7
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SUSTech/SUS-Chat-72B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 60.27
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SUSTech/SUS-Chat-72B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 83.43
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SUSTech/SUS-Chat-72B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 9.4
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=SUSTech/SUS-Chat-72B
      name: Open LLM Leaderboard
---
# 🐷SUS-Chat: Instruction tuning done right
<Warning>
not currently in use
</Warning>
<p align="left">
<a href="README_CN.md">中文</a>&nbsp; | &nbsp;English&nbsp;
</p>
<br><br>
<div align="center">
<p align="center">
<img src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/sustech.svg?sanitize=true" width="200px">
<img src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/ccnl.png?sanitize=true" width="200px">
</p>
<div style="display: inline-block;">
<a rel="noopener nofollow" href="https://github.com/SUSTech-IDEA/SUS-Chat/issues">
<img src="https://img.shields.io/github/issues/SUSTech-IDEA/SUS-Chat?logo=github" style="margin: 0 0;">
</a>
</div>
<div style="display: inline-block;">
<a href="https://huggingface.co/SUSTech">
<img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SUSTech-blue" style="margin: 0 0;">
</a>
</div>
<div style="display: inline-block;">
<a rel="noopener nofollow" href="https://www.modelscope.cn/organization/sustc/">
<img src="https://img.shields.io/badge/🤖ModelScope-sustc-blue" style="margin: 0 0;">
</a>
</div>
<a href="https://wisemodel.cn/organization/SUSTech">
<img src="https://img.shields.io/badge/WiseModel-SUSTech-blue"> </a>
<div style="display: inline-block;">
<a rel="noopener nofollow" href="https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/LICENSE">
<img src="https://img.shields.io/badge/Code_License-Apache_2.0-lightblue" style="margin: 0 0;">
</a>
</div>
<div style="display: inline-block;">
<a rel="noopener nofollow" href="https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt">
<img src="https://img.shields.io/badge/Model_License-Model_Agreement-lightblue" style="margin: 0 0;">
</a>
</div>
<div style="display: inline-block;">
<a rel="noopener nofollow" href="mailto:oss@data.sustech.edu.cn">
<img src="https://img.shields.io/badge/✉️-data@sustech.edu.cn-FFE01B" style="margin: 0 0;">
</a>
</div>
</div>
# News
- 2023-12-09: 🔥 The `Tigerbot` variant has been
[deleted](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/438);
`SUS-Chat-34B` is now the top-ranked LLaMA model and the
top-ranked chat model.
- 2023-12-07: SUS-Chat-34B is now available on
[WiseModel🧠](https://wisemodel.cn/model/SUSTech/SUS-Chat-34B).
- 2023-12-06: Try [SUS-Chat-34B
chat-ui](https://huggingface.co/spaces/SUSTech/SUS-Chat-34B).
- 2023-12-05: SUS-Chat-34B is now available on
[ModelScope🤖](https://www.modelscope.cn/models/SUSTC/SUS-Chat-34B/summary).
- 2023-12-05: SUS-Chat-34B ranked 2nd on the [Open LLM
Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard),
surpassing all models under 70B.
- 2023-12-01: SUS-Chat-34B is now available on
[HuggingFace🤗](https://huggingface.co/SUSTech/SUS-Chat-34B).
# Introduction
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_SUSTech__SUS-Chat-72B).
| Metric |Value|
|---------------------------------|----:|
|Avg. |63.51|
|AI2 Reasoning Challenge (25-Shot)|66.30|
|HellaSwag (10-Shot) |84.96|
|MMLU (5-Shot) |76.70|
|TruthfulQA (0-shot) |60.27|
|Winogrande (5-shot) |83.43|
|GSM8k (5-shot) | 9.40|
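
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores. The sketch below is an illustration, not the leaderboard's own code: it assumes equal weighting, and the variable names `scores` and `average` are introduced here. It recomputes the average from the values in the table above.

```python
# Benchmark scores copied from the table above (percentages).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 66.30,
    "HellaSwag (10-Shot)": 84.96,
    "MMLU (5-Shot)": 76.70,
    "TruthfulQA (0-shot)": 60.27,
    "Winogrande (5-shot)": 83.43,
    "GSM8k (5-shot)": 9.40,
}

# Assumption: every benchmark carries equal weight in the average.
average = sum(scores.values()) / len(scores)

print(f"Avg. {average:.2f}")  # prints "Avg. 63.51"
```

The low GSM8k score pulls the mean well below the other five results, which all lie between 60 and 85.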