---
language:
- ko
library_name: transformers
pipeline_tag: text-generation
license: cc-by-nc-4.0
---
# **V0.3 IS UP**
[Link to V0.3](https://huggingface.co/maywell/Synatra-7B-v0.3-base)
# **Synatra-V0.1-7B**
Made by StableFluffy
[Visit my website! - Currently under construction..](https://www.stablefluffy.kr/)
## License
This model is strictly for [*non-commercial*](https://creativecommons.org/licenses/by-nc/4.0/) (**cc-by-nc-4.0**) use only, which takes priority over the **LLAMA 2 COMMUNITY LICENSE AGREEMENT**.
The model is completely free to use for non-commercial purposes (i.e. the base model, derivatives, and merges/mixes), as long as the included **cc-by-nc-4.0** license and the non-commercial use statement remain in any parent repository, regardless of other models' licenses.
The license may change when a new model is released.
## Model Details
**Base Model**
[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
**Trained On**
A6000 48GB * 8
## Instruction format
**Due to a mistake during training, `[\INST]` was applied instead of `[/INST]`. This will be fixed in v0.2.**
In order to leverage instruction fine-tuning, your prompt should be surrounded by `[INST]` and `[\INST]` tokens. The very first instruction should begin with a begin-of-sentence token id; subsequent instructions should not. The assistant's generation is ended by the end-of-sentence token id.
It is also strongly recommended to add a space at the end of the prompt.
E.g.
```
text = "<s>[INST] 아이작 뉴턴의 업적을 알려줘. [\INST] "
```
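The template above can be assembled with a small helper. This is a minimal sketch; `build_prompt` is an illustrative name, not part of the model's API. It reproduces the non-standard `[\INST]` closer and the recommended trailing space:

```python
def build_prompt(instruction: str) -> str:
    r"""Wrap a single user instruction in Synatra-V0.1-7B's template.

    Note: v0.1 was trained with the non-standard closer [\INST]
    (backslash, not slash), and a trailing space is recommended.
    """
    return f"<s>[INST] {instruction} [\\INST] "

print(build_prompt("아이작 뉴턴의 업적을 알려줘."))
```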
# **Model Benchmark**
## KULLM Evaluation
Evaluation was performed using the dataset and prompts provided in the KULLM v2 repository.
Since the GPT-4 used at the time is not exactly the same as the current GPT-4, actual results may differ slightly.
![img](./kullm_eval.png)
| Model | Understandability | Naturalness | Coherence | Interestingness | Instruction Following | Overall Quality |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-3.5 | 0.980 | 2.806 | 2.849 | 2.056 | 0.917 | 3.905
| GPT-4 | 0.984 | 2.897 | 2.944 | 2.143 | 0.968 | 4.083
| KoAlpaca v1.1 | 0.651 | 1.909 | 1.901 | 1.583 | 0.385 | 2.575
| koVicuna | 0.460 | 1.583 | 1.726 | 1.528 | 0.409 | 2.440
| KULLM v2 | 0.742 | 2.083 | 2.107 | 1.794 | 0.548 | 3.036
| **Synatra-V0.1-7B** | **0.960** | **2.821** | **2.755** | **2.356** | **0.934** | **4.065**
## KOBEST_BOOLQ, SENTINEG, WIC - ZERO_SHOT
BoolQ, SentiNeg, and WiC were measured using [EleutherAI/lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot).
HellaSwag and COPA have not yet been run due to difficulties modifying the original code.
### NOTE
For BoolQ, the prompts "위 글에 대한 질문에 사실을 확인하는 작업입니다." and "예, 아니오로 대답해주세요." ("This is a task to verify facts about the passage above." / "Please answer with yes or no.") were added to help the instruction-tuned model understand the task.
For SentiNeg, the prompt "위 문장의 긍정, 부정 여부를 판단하세요." ("Determine whether the sentence above is positive or negative.") was added.
For WiC, only [INST] and [\INST] were added.
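As an illustration, the BoolQ wrapping described above might look like the following. This is a hypothetical reconstruction: the exact layout used in the evaluation is not specified in this card, and `boolq_prompt` plus its example inputs are illustrative only.

```python
def boolq_prompt(passage: str, question: str) -> str:
    """Hypothetical reconstruction of the BoolQ prompt described above.

    The two Korean sentences are the prefixes quoted in this card:
    a task description and a request to answer yes/no.
    """
    return (
        f"[INST] {passage}\n"
        "위 글에 대한 질문에 사실을 확인하는 작업입니다.\n"
        f"{question}\n"
        "예, 아니오로 대답해주세요. [\\INST] "
    )

# Illustrative passage/question, not from the actual KOBEST data
print(boolq_prompt("뉴턴은 1687년에 프린키피아를 출판했다.", "뉴턴이 프린키피아를 썼나요?"))
```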
| Model | COPA | HellaSwag | BoolQ | SentiNeg | WiC |
| --- | --- | --- | --- | --- | --- |
| EleutherAI/polyglot-ko-12.8b | 0.7937 | 0.5954 | 0.4818 | 0.9117 | 0.3985
| **Synatra-V0.1-7B** | **NaN** | **NaN** | **0.849** | **0.8690** | **0.4881**
# **Implementation Code**
Since the chat_template already contains the instruction format above, you can use the code below.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("maywell/Synatra-V0.1-7B")
tokenizer = AutoTokenizer.from_pretrained("maywell/Synatra-V0.1-7B")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
]

# apply_chat_template wraps the messages in the instruction format above
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
If you run it on oobabooga, your prompt should look like this. **You need to add a space at the end!**
```
[INST] 말컨에 대해서 알려줘. [\INST] 
```
> Readme format: [beomi/llama-2-ko-7b](https://huggingface.co/beomi/llama-2-ko-7b)
---
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_maywell__Synatra-V0.1-7B)
| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 53.54 |
| ARC (25-shot) | 55.29 |
| HellaSwag (10-shot) | 76.63 |
| MMLU (5-shot) | 55.29 |
| TruthfulQA (0-shot) | 55.76 |
| Winogrande (5-shot) | 72.77 |
| GSM8K (5-shot) | 19.41 |
| DROP (3-shot) | 39.63 |