# komt : Korean Multi-task Instruction Tuning
![multi task instruction tuning.jpg](images%2Fmulti%20task%20instruction%20tuning.jpg)
Recently, due to the success of ChatGPT, numerous large language models have emerged in an attempt to catch up with ChatGPT's capabilities.
However, when it comes to Korean language performance, it has been observed that many models still struggle to provide accurate answers or generate Korean text effectively.
This study addresses these challenges by introducing a multi-task instruction technique that leverages supervised datasets from various tasks to create training data for Large Language Models (LLMs).
## News or Update
### 2023.12.05
- Released the DPO training code: [dpo_train.py](dpo_train.py)
### 2023.11.29
- komt-mistral-7b-v1-dpo: added a model trained with DPO (Direct Preference Optimization)
> - [davidkim205/komt-mistral-7b-v1-dpo](https://huggingface.co/davidkim205/komt-mistral-7b-v1-dpo/blob/main/README.md)
- komt-mistral-7b-v1-dpo evaluation result: 76.75%, the highest score among the current komt models (gpt-3.5-turbo: 79.45%)
### 2023.10.24
- Added the komt-mistral-7b-v1 model
> - [davidkim205/komt-mistral-7b-v1](https://huggingface.co/davidkim205/komt-mistral-7b-v1)
> - [davidkim205/komt-mistral-7b-v1-lora](https://huggingface.co/davidkim205/komt-mistral-7b-v1-lora)
> - [davidkim205/komt-mistral-7b-v1-gguf](https://huggingface.co/davidkim205/komt-mistral-7b-v1-gguf)
### 2023.10.20
- Added the komt-llama-30b-v1 model
> - [davidkim205/komt-llama-30b-v1](https://huggingface.co/davidkim205/komt-llama-30b-v1)
> - [davidkim205/komt-llama-30b-v1-lora](https://huggingface.co/davidkim205/komt-llama-30b-v1-lora)
### 2023.09.27
- Added the following models to the ChatGPT-based evaluation results
> - naver Cue
> - clova X
> - nlpai-lab/kullm-polyglot-12.8b-v2
> - kfkas/Llama-2-ko-7b-Chat
> - beomi/KoAlpaca-Polyglot-12.8B
### 2023.09.25
- Added the komt-llama2-13b-v1 model
> - [davidkim205/komt-llama2-13b-v1](https://huggingface.co/davidkim205/komt-llama2-13b-v1)
> - [davidkim205/komt-llama2-13b-v1-lora](https://huggingface.co/davidkim205/komt-llama2-13b-v1-lora)
> - [davidkim205/komt-llama2-13b-v1-ggml](https://huggingface.co/davidkim205/komt-llama2-13b-v1-ggml)
### 2023.09.24
- Added the Fine-tune with deepspeed training method
### 2023.09.23
- Added the usage komt with vllm code and installation instructions
### 2023.09.22
- Added the model evaluation results table
### 2023.09.20
- Added an option to finetune_with_lora for training with either 4-bit or 8-bit quantization
### 2023.09.19
- Added examples, training instructions, and a dataset to make the komt-llama2 model easier to use.
### 2023.09.17
- Released the komt-llama2-7b-v1 model, trained on an improved multi-task dataset (fixes issues such as the end token occasionally not being applied and answers being too long).
- [davidkim205/komt-llama2-7b-v1](https://huggingface.co/davidkim205/komt-llama2-7b-v1)
- [davidkim205/komt-llama2-7b-v1-lora](https://huggingface.co/davidkim205/komt-llama2-7b-v1-lora)
- [davidkim205/komt-llama2-7b-v1-ggml](https://huggingface.co/davidkim205/komt-llama2-7b-v1-ggml)
### 2023.08.17
- We are releasing the [davidkim205/komt-Llama-2-13b-hf-lora](https://huggingface.co/davidkim205/komt-Llama-2-13b-hf-lora) and [davidkim205/komt-Llama-2-13b-hf-ggml](https://huggingface.co/davidkim205/komt-Llama-2-13b-hf-ggml) models
### 2023.08.16
- We are releasing the [davidkim205/komt-Llama-2-7b-chat-hf-ggml](https://huggingface.co/davidkim205/komt-Llama-2-7b-chat-hf-ggml) model
## Released Model Checkpoints
### komt-llama2-7b
- [davidkim205/komt-llama2-7b-v1](https://huggingface.co/davidkim205/komt-llama2-7b-v1)
- [davidkim205/komt-llama2-7b-v1-lora](https://huggingface.co/davidkim205/komt-llama2-7b-v1-lora)
- [davidkim205/komt-llama2-7b-v1-ggml](https://huggingface.co/davidkim205/komt-llama2-7b-v1-ggml)
### komt-llama2-13b
- [davidkim205/komt-llama2-13b-v1](https://huggingface.co/davidkim205/komt-llama2-13b-v1)
- [davidkim205/komt-llama2-13b-v1-lora](https://huggingface.co/davidkim205/komt-llama2-13b-v1-lora)
- [davidkim205/komt-llama2-13b-v1-ggml](https://huggingface.co/davidkim205/komt-llama2-13b-v1-ggml)
### komt-llama-30b
- [davidkim205/komt-llama-30b-v1](https://huggingface.co/davidkim205/komt-llama-30b-v1)
- [davidkim205/komt-llama-30b-v1-lora](https://huggingface.co/davidkim205/komt-llama-30b-v1-lora)
### komt-mistral-7b
- [davidkim205/komt-mistral-7b-v1](https://huggingface.co/davidkim205/komt-mistral-7b-v1)
- [davidkim205/komt-mistral-7b-v1-lora](https://huggingface.co/davidkim205/komt-mistral-7b-v1-lora)
- [davidkim205/komt-mistral-7b-v1-gguf](https://huggingface.co/davidkim205/komt-mistral-7b-v1-gguf)
- [davidkim205/komt-mistral-7b-v1-dpo](https://huggingface.co/davidkim205/komt-mistral-7b-v1-dpo)
## Hardware and Software
- nvidia driver : 535.54.03
- CUDA Version: 12.2
## Setup
```
git clone https://github.com/davidkim205/komt.git
cd komt
conda create -n komt python=3.10
conda activate komt
pip install -r requirements.txt
```
## Usage
We provide several ways to use the komt-llama2 model.
### transformers
```
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextStreamer, GenerationConfig
model_name='davidkim205/komt-llama2-7b-v1'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer)
def gen(x):
    # Sampling settings; early_stopping is omitted because it only applies to beam search.
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=512,
        do_sample=True,
    )
    q = f"### instruction: {x}\n\n### Response: "
    gened = model.generate(
        **tokenizer(
            q,
            return_tensors='pt',
            return_token_type_ids=False
        ).to(model.device),
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0])
    # Keep only the text that follows the response tag.
    start_tag = "\n\n### Response: "
    start_index = result_str.find(start_tag)
    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str

# Prompt: "I want to travel Jeju Island alone for two days and one night; make me a travel itinerary."
print(gen('제주도를 1박2일로 혼자 여행하려고 하는데 여행 코스를 만들어줘'))
```
Result (translated from Korean):
```
### Response: To travel Jeju Island alone for two days and one night, you can plan a travel course like the following:
Day 1:
- Morning: Arrive at the beach to see Jeju Island's beautiful coastline. Watch the sunrise and enjoy the beauty of nature.
- Afternoon: Explore Hallasan, Jeju Island's signature tourist attraction. Enjoy the scenery as you climb the trail, listen to the explanations, and enjoy an easy walk.
- Evening: Spend the evening at a delicious restaurant on Jeju Island. Tasting dishes made with fresh seafood and spices will be a perfect experience of your Jeju trip.
Day 2:
- Morning: Move to Hallasan Cape to explore the Hallasan area. This cape is the best choice for people who enjoy hiking.
```
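The prompt template and the response parsing from the transformers example can be isolated into two small helpers, which makes them easy to check without loading a model. This is a minimal sketch that assumes only the `### instruction: ... ### Response: ` template shown above; the names `build_prompt` and `extract_response` are illustrative and are not part of the repository.

```python
RESPONSE_TAG = "\n\n### Response: "


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the prompt template used in the example above."""
    return f"### instruction: {instruction}{RESPONSE_TAG}"


def extract_response(decoded: str) -> str:
    """Return the text after the response tag, or the whole string if the tag is absent."""
    start = decoded.find(RESPONSE_TAG)
    if start == -1:
        return decoded.strip()
    return decoded[start + len(RESPONSE_TAG):].strip()


# "Tell me a date course on Jeju Island"
prompt = build_prompt("제주도 데이트 코스 알려줘")
print(repr(prompt))
print(extract_response(prompt + "sample answer"))
```

Keeping the tag in one constant avoids the prompt builder and the parser drifting apart if the template is ever changed.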
### text-generation-webui
![text-generation-webui.gif](images%2Ftext-generation-webui.gif)
```
# Get the text-generation-webui code
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui/
# Create the conda environment
conda create -n text-generation-webui python=3.10
conda activate text-generation-webui
# pip install
pip install -r requirements.txt
# model download
pip install huggingface-hub
python -c "from huggingface_hub import hf_hub_download;print(hf_hub_download(repo_id='davidkim205/komt-llama2-7b-v1-ggml', filename='ggml-model-q4_0.gguf', local_dir='./models/'))"
# Start the server
python server.py
```
### llama2-webui
![llama2-webui.gif](images%2Fllama2-webui.gif)
https://github.com/liltom-eth/llama2-webui
Clone llama2-webui and install its requirements. Then, because the model is large, download komt-llama2-7b with git lfs.
```
git clone https://github.com/liltom-eth/llama2-webui.git
cd llama2-webui
pip install -r requirements.txt
```
After downloading the model, run the app.
```
sudo apt install git-lfs
git lfs clone https://huggingface.co/davidkim205/komt-llama2-7b-v1
python app.py --backend_type transformers --model_path ./komt-llama2-7b-v1/
```
### llama.cpp
![llama.cpp-example.gif](images%2Fllama.cpp-example.gif)
```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
pip install huggingface-hub
python -c "from huggingface_hub import hf_hub_download;print(hf_hub_download(repo_id='davidkim205/komt-llama2-7b-v1-ggml', filename='ggml-model-q4_0.gguf', local_dir='./models/'))"
# Prompt: "What are the effects of ginseng?"
make -j && ./main -m ./models/ggml-model-q4_0.gguf -p "인삼은 어떤 효과가 있는가요? ##output:"
```
### llama.cpp with google colab
How to use komt with llama.cpp on Google Colab:
https://colab.research.google.com/drive/1uLHXv-6NT7yj4FHECrZezfo5pVL-ht63?usp=sharing
### usage_komt_with_lora
Examples using Python and Jupyter:
- [usage_komt_with_lora.py](usage_komt_with_lora.py)
- [usage_komt_with_lora.ipynb](usage_komt_with_lora.ipynb)
```
$ python infer.py
Downloading (…)/adapter_config.json: 100%|██████████| 528/528 [00:00<00:00, 5.02MB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 631/631 [00:00<00:00, 4.96MB/s]
Downloading pytorch_model.bin: 100%|██████████| 27.0G/27.0G [04:29<00:00, 100MB/s]
Downloading (…)neration_config.json: 100%|██████████| 183/183 [00:00<00:00, 1.36MB/s]
Downloading adapter_model.bin: 100%|██████████| 80.1M/80.1M [00:00<00:00, 82.7MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 749/749 [00:00<00:00, 6.66MB/s]
Downloading tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 111MB/s]
Downloading (…)in/added_tokens.json: 100%|██████████| 21.0/21.0 [00:00<00:00, 131kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 96.0/96.0 [00:00<00:00, 608kB/s]
/home/david/anaconda3/envs/komt/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:399: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/david/anaconda3/envs/komt/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:399: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
<s> ### instruction: Why do cats dislike water?
### Response: Unlike people, cats dislike water. This is because of the 헤어쳐발 dissolved in water and the smell of water. Cats do not want to drink water when 헤어쳐발 is dissolved in it, and they are also sensitive to the smell of water. For these reasons, cats came to dislike water.
Unlike people, cats have a high body temperature and need many calories to maintain it. Therefore, cats do not drink water and dislike it. Cats do not take in water in order to maintain their body temperature, and they do not want to drink water.
In addition, cats dislike water because of the 헤어쳐발 dissolved in it, for example because they get cold when they drink water. 헤어쳐발 dissolves water and
```
### usage komt with vllm
![vllm.gif](images%2Fvllm.gif)
To use the vllm library, create a conda environment as shown below and install the packages from requirements_vllm.txt.
```
conda create -n vllm python=3.10
conda activate vllm
pip install -r requirements_vllm.txt
```
Run the example code as shown below, then enter a question.
```
$ python usage_komt_with_vllm.py
INFO 09-25 18:48:20 llm_engine.py:72] Initializing an LLM engine with config: model='davidkim205/komt-llama2-7b-v1', tokenizer='davidkim205/komt-llama2-7b-v1', tokenizer_mode=auto, trust_remote_code=False, dtype=torch.float16, download_dir=None, load_format=auto, tensor_parallel_size=1, seed=0)
INFO 09-25 18:48:20 tokenizer.py:30] For some LLaMA-based models, initializing the fast tokenizer may take a long time. To eliminate the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
INFO 09-25 18:48:36 llm_engine.py:199] # GPU blocks: 1048, # CPU blocks: 512
>Tell me a date course on Jeju Island
Processed prompts: 100%|██████████| 1/1 [00:15<00:00, 15.30s/it]
Prompt: '### instruction: Tell me a date course on Jeju Island\n\n### Response: ', Generated text: 'Here is a date course on Jeju Island.\n1. Get up early in the morning and greet the morning sunrise at Jeju Saeng Park.\n2. Walk around the park and enjoy the beauty of nature. In particular, cross Yongdubo Falls and admire the wonderful scenery.\n3. Around 1 p.m., try solving a puzzle near Seongsan Ilchulbong, where you can take in the famous scents of Jeju City. Here you can enjoy interesting experiences such as karaoke, a sharp-lead lecture, a Walkerhill concert, and a Hallasanseong discovery lodging.\n4. If you want to see the variety of seafood unique to Jeju (seaweed, kimchi, haeseok, etc.), visit Jajujitnemi or a traditional market in Jeju City. At the special market located near the seafood temple, you can taste Jeju tangerines.\n5. Finally, in the evening you can watch the sunrise over Hallasan from Seongsan Ilchulbong. Admire the sunrise and express gratitude for its beauty.\n\nAre you now ready to enjoy the charm of Jeju? Get away from your empty daily routine and enjoy a Jeju Island date course where you can feel at ease.'
```
## Fine-tune
This section describes how to train the komt-llama2 model.
Of the datasets used for the paper and the released models, the freely licensed KorQuAD 1.0 dataset has been added to the `datasets` directory.
For details on the paper, see the Korean Multi-task Instruction Tuning section below.
### Fine-tune with lora
![finetune_with_lora.gif](images%2Ffinetune_with_lora.gif)
First, get the code from GitHub and install the packages (see Setup above).
finetune_with_lora.py is a script for training the model on a custom dataset.
When run without arguments, as shown below, it trains on [komt_squad.json](datasets%2Fkomt_squad.json) with davidkim205/komt-llama2-7b-v1 as the base model by default.
```
python finetune_with_lora.py
```
The model, dataset, batch size, and other settings can be changed as follows.
```
python finetune_with_lora.py --model_name_or_path davidkim205/komt-llama2-7b-v1 --data_path datasets/komt_squad.json --num_train_epochs 1 --per_device_train_batch_size 1 --learning_rate 1e-5
```
For a detailed description of the arguments, run `python finetune_with_lora.py -h`.
#### finetune 8-bit models with Low-Rank Adaptation (LoRA)
By default, finetune_with_lora.py trains with 4-bit quantization.
To use 8-bit quantization instead, run it as follows.
```
python finetune_with_lora.py --bits 8
```
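The choice between 4-bit and 8-bit quantization mainly affects how much GPU memory the base model weights occupy. The sketch below is a back-of-the-envelope estimate only: it assumes a nominal 7 billion parameters and counts weight storage alone, ignoring quantization metadata, LoRA adapter weights, activations, and optimizer state, so real usage will be higher. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate storage for the model weights only, in decimal gigabytes."""
    return num_params * bits / 8 / 1e9


# Nominal parameter count for a "7B" model (assumption for illustration).
NUM_PARAMS = 7e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Halving the bit width halves the weight footprint, which is why the 4-bit default fits on smaller GPUs than `--bits 8`.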
### Fine-tune with deepspeed
finetune_with_ds.py trains with DeepSpeed using ZeRO-3 Offload.
CPU offloading reduces GPU memory usage but consumes CPU memory instead, so the settings must be adjusted to match your hardware.
The DeepSpeed configuration file is provided at configs/deepspeed_config.json.
To use DeepSpeed, create a separate conda environment and install the required packages as shown below.
```
conda create -n ds python=3.10
conda activate ds
pip install -r requirements_ds.txt
```
Run the DeepSpeed fine-tuning script as follows.
```
deepspeed finetune_with_ds.py
```
To change the arguments, refer to the example below.
```
deepspeed finetune_with_ds.py --model_name_or_path davidkim205/komt-llama2-7b-v1 --data_path datasets/komt_squad.json --num_train_epochs 1 --per_device_train_batch_size 1 --learning_rate 1e-5 --deepspeed configs/deepspeed_config.json
```
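For orientation, a ZeRO-3 configuration with CPU offloading typically contains settings like the ones below. This is a hedged sketch and not the contents of the repository's config file: the keys are standard DeepSpeed options, but the specific values (including the `"auto"` placeholders, which rely on the Hugging Face Trainer integration filling them in) are assumptions chosen for illustration.

```python
import json

# Illustrative ZeRO-3 CPU-offload settings; the repository's own
# configuration file may use different values or additional keys.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Move optimizer state and parameters to CPU memory to save GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    # "auto" lets the Hugging Face Trainer integration fill in these values.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

print(json.dumps(ds_config, indent=2))
```

If CPU memory is the bottleneck, removing the `offload_param` entry keeps parameters on the GPU while still offloading optimizer state.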
### Fine-tune with Direct Preference Optimization (DPO)
We are releasing the training code and the model so that models for commercial services can be trained with Direct Preference Optimization.
DPO training works well only when SFT has been done well, so we trained on top of the already fine-tuned komt model. The result is a 5% performance improvement over the previous model and a model that gives the same answer to the same question.
For the Korean dataset, we used maywell/ko_Ultrafeedback_binarized.
To run dpo_train.py, install the packages listed in requirements_dpo.txt.
Installation example:
```
conda create -n dpo_train python=3.10
conda activate dpo_train
pip install -r requirements_dpo.txt
```
After installation, set up accelerate with `accelerate config`.
```
accelerate config
```
Then run dpo_train with accelerate launch.
```
accelerate launch dpo_train.py
```
Training takes about 9 hours on a single A100.
```
warnings.warn(
0%| | 1/1000 [00:36<10:13:09, 36.83s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (1069 > 1024). Running this sequence through the model will result in indexing errors
{'loss': 0.6961, 'learning_rate': 5e-05, 'rewards/chosen': 0.004012207966297865, 'rewards/rejected': 0.007965649478137493, 'rewards/accuracies': 0.515625, 'rewards/margins': -0.003953440580517054, 'logps/rejected': -222.7124481201172, 'logps/chosen': -259.6094665527344, 'logits/rejected': -2.6427276134490967, 'logits/chosen': -2.6100172996520996, 'epoch': 0.01}
  2%|▊         | 17/1000 [09:31<8:50:11, 32.36s/it]
```
For more details on DPO, see the following paper: https://arxiv.org/abs/2305.18290
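To make the `loss` and `rewards/margins` values in the training log easier to read, the sketch below computes the DPO objective for a single preference pair. It follows the loss from the DPO paper, where the implicit reward is `beta` times the log-probability ratio between the policy and the reference model. The function name, argument names, and `beta=0.1` are assumptions for illustration and are not taken from dpo_train.py.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# When the policy still equals the reference model, the margin is 0
# and the loss is log(2), roughly 0.693.
print(dpo_loss(-259.6, -222.7, -259.6, -222.7))
```

This is why the loss in the first logged step sits close to 0.69: at the start of training the policy has barely moved away from the reference model, so the reward margin is near zero.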
## Evaluation results
We used ChatGPT to evaluate the questions and answers, as shown below. For the evaluation questions, the model answers, and ChatGPT's ratings, see eval_results.
| model | score | average(0~5) | percentage |
|------------------------------------------|---------| ------------ |------------|
| gpt-3.5-turbo(close) | 147 | 3.97 | 79.45% |
| naver Cue(close) | 140 | 3.78 | 75.67% |
| clova X(close) | 136 | 3.67 | 73.51% |
| WizardLM-13B-V1.2(open) | 96 | 2.59 | 51.89% |
| Llama-2-7b-chat-hf(open) | 67 | 1.81 | 36.21% |
| Llama-2-13b-chat-hf(open) | 73 | 1.91 | 38.37% |
| nlpai-lab/kullm-polyglot-12.8b-v2(open) | 70 | 1.89 | 37.83% |
| kfkas/Llama-2-ko-7b-Chat(open) | 96 | 2.59 | 51.89% |
| beomi/KoAlpaca-Polyglot-12.8B(open) | 100 | 2.70 | 54.05% |
| **komt-llama2-7b-v1 (open)(ours)** | **117** | **3.16** | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)** | **129** | **3.48** | **69.72%** |
| **komt-llama-30b-v1 (open)(ours)** | **129** | **3.16** | **63.24%** |
| **komt-mistral-7b-v1 (open)(ours)** | **131** | **3.54** | **70.81%** |
| **komt-mistral-7b-v1-dpo (open)(ours)** | **142** | **3.83** | **76.75%** |
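The three numeric columns are related by simple arithmetic: the average is the total score divided by the number of questions, and the percentage is the total score relative to the maximum possible total. The sketch below assumes 37 evaluation questions, a figure inferred from the table (147 / 3.97 is about 37) rather than stated in this document, and a maximum of 5 points per question.

```python
NUM_QUESTIONS = 37  # assumption: inferred from the table, not stated in the text
MAX_SCORE = 5       # upper bound of the 0~5 rating scale


def summarize(total_score: int) -> tuple[float, float]:
    """Return (average score per question, percentage of the maximum total)."""
    average = total_score / NUM_QUESTIONS
    percentage = 100 * total_score / (NUM_QUESTIONS * MAX_SCORE)
    return average, percentage


for name, score in [("gpt-3.5-turbo", 147), ("komt-mistral-7b-v1-dpo", 142)]:
    avg, pct = summarize(score)
    print(f"{name}: average {avg:.2f}, {pct:.2f}%")
```

The printed values may differ from the table in the last digit, since the table appears to truncate rather than round.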
----
# Korean Multi-task Instruction Tuning
## Abstract
With the recent success of ChatGPT, numerous large language models have emerged in an attempt to catch up with ChatGPT's capabilities. However, it has become evident that these models still struggle to provide accurate responses in Korean or face challenges when generating Korean text. In this study, we introduce the multi-task instruction technique, which is based on supervised datasets from various tasks, to create training data for large language models, aiming to address these issues.
## 1. Introduction
Recent Korean large language models have predominantly relied on translated datasets, such as those from GPT-4-LLM, Dolly, and Vicuna. However, using translated datasets presents several challenges:
- Language and Cultural Differences
Languages and cultures have unique expressions, vocabularies, and grammatical structures. Using translated datasets can hinder the model's ability to understand and learn effectively due to these differences.
- Translation Errors and Semantic Distortions
Machine translations are not perfect and can introduce errors or distort the meaning of the original text. This can lead to the model learning incorrect information or failing to grasp the true meaning of the source data.
- Data Quality
The quality of translated data depends on the accuracy of the source data. If the source data is inaccurate or noisy, the translated data can suffer from the same issues.
- Word Embedding Consistency
Mapping words from different languages into a consistent embedding space can be challenging. This can result in the model failing to learn the correct relationships between words or failing to recognize semantic differences among translated words.
- Data Quantity and Diversity
Using translated foreign datasets may not provide sufficient quantity and diversity of data, depending on the language and topic domain. Obtaining the required data quantity and diversity can be challenging.
- Difficulty in Understanding Context
Translated data often fails to convey the original context accurately, making it difficult for the model to understand the real meaning and context of specific words or sentences.
- Specialized Terminology and Idiomatic Expressions
Specialized terminology and idiomatic expressions in specific fields may not be appropriately handled during translation, causing the model to perform poorly in certain subjects or domains.
- Data Bias
Translating data from various countries and cultures can introduce biases or cultural differences into the model, potentially increasing bias in the model's responses.
- Performance Degradation
When original data is translated, some information may be lost in the translation process, leading to a potential decrease in the model's performance compared to using the original data directly.
## 2. Multi-task Instruction
To address these challenges and improve dataset quality, we propose an Instruction Tuning Framework (ITF) that leverages multi-task datasets and instruction tuning, inspired by Google's FLAN technique (introduced in the paper "Finetuned Language Models Are Zero-Shot Learners").
### 2.1. Multi-task Datasets
We have curated multi-task datasets based on various existing Korean datasets, specifically tailored to each task. We have avoided relying on translated datasets used in previous Korean large language models. Our dataset sources include:
- AIHub Dataset: 305,900 samples
- KISTI AI Dataset: 824,337 samples
- KorQuAD Dataset: 66,181 samples
- Miscellaneous Datasets: 346,803 samples
- Total Dataset Size: 1,543,221 samples
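As a quick consistency check, the per-source counts above can be summed and expressed as shares of the whole. The snippet only restates the figures from the list; the dictionary keys are shortened labels chosen for illustration.

```python
dataset_sizes = {
    "AIHub": 305_900,
    "KISTI AI": 824_337,
    "KorQuAD": 66_181,
    "Miscellaneous": 346_803,
}

total = sum(dataset_sizes.values())
print(f"total: {total:,}")  # 1,543,221, matching the total listed above

# Share of each source in the combined training data.
for name, count in dataset_sizes.items():
    print(f"{name}: {count / total:.1%}")
```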
### 2.2. Instruction Tuning
Our ITF incorporates the instruction tuning technique proposed by Google's FLAN, resulting in improved zero-shot performance.
We have publicly released the freely licensed KorQuAD 1.0 dataset on GitHub. However, due to licensing policies, we cannot release the other datasets.
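To illustrate the general idea of turning a supervised task sample into instruction-tuning data, the sketch below rewrites one question-answering sample as an instruction/output pair and renders it with the prompt format from the Usage section. The instruction wording, the field names `instruction` and `output`, and both function names are hypothetical; the actual templates used to build the komt datasets are not given in this document.

```python
RESPONSE_TAG = "\n\n### Response: "


def qa_to_instruction(context: str, question: str, answer: str) -> dict:
    """Rewrite one extractive-QA sample as an instruction/output pair."""
    # Hypothetical instruction wording, used here only for illustration.
    instruction = (
        "Read the passage and answer the question.\n"
        f"Passage: {context}\n"
        f"Question: {question}"
    )
    return {"instruction": instruction, "output": answer}


def to_training_text(sample: dict) -> str:
    """Render a sample with the prompt format shown in the Usage section."""
    return f"### instruction: {sample['instruction']}{RESPONSE_TAG}{sample['output']}"


sample = qa_to_instruction(
    context="Hallasan is the highest mountain in South Korea.",
    question="What is the highest mountain in South Korea?",
    answer="Hallasan",
)
print(to_training_text(sample))
```

Other supervised tasks, such as summarization or classification, would follow the same pattern with a task-specific instruction template.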
## 3. Evaluation
For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. Consequently, we conducted evaluations using ChatGPT, a widely used model, as the judge, following the approach described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06502.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06259.pdf).
| model | score | average(0~5) | percentage |
| --------------------------------------- |---------| ------------ | ---------- |
| gpt-3.5-turbo(close) | 147 | 3.97 | 79.45% |
| naver Cue(close) | 140 | 3.78 | 75.67% |
| clova X(close) | 136 | 3.67 | 73.51% |
| WizardLM-13B-V1.2(open) | 96 | 2.59 | 51.89% |
| Llama-2-7b-chat-hf(open) | 67 | 1.81 | 36.21% |
| Llama-2-13b-chat-hf(open) | 73 | 1.91 | 38.37% |
| nlpai-lab/kullm-polyglot-12.8b-v2(open) | 70 | 1.89 | 37.83% |
| kfkas/Llama-2-ko-7b-Chat(open) | 96 | 2.59 | 51.89% |
| beomi/KoAlpaca-Polyglot-12.8B(open) | 100 | 2.70 | 54.05% |
| **komt-llama2-7b-v1 (open)(ours)** | **117** | **3.16** | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)** | **129** | **3.48** | **69.72%** |
| **komt-llama-30b-v1 (open)(ours)** | **129** | **3.16** | **63.24%** |
| **komt-mistral-7b-v1 (open)(ours)** | **131** | **3.54** | **70.81%** |
## 4. Conclusion
In this study, we have proposed a method to optimize the Llama2 model for the Korean language. Experimental results show that models trained with multi-task instruction outperform the other Korean-supporting Llama2 models in our ChatGPT-based evaluation.
In future research, we plan to leverage multi-task instruction to develop various service models and applications.
---
# References
### Llama 2
https://github.com/facebookresearch/llama
### Llama 1
https://github.com/facebookresearch/llama/tree/llama_v1
### llama.cpp
https://github.com/ggerganov/llama.cpp