Update README.md
Browse files
README.md
CHANGED
|
@@ -10,14 +10,54 @@ tags:
|
|
| 10 |
license: apache-2.0
|
| 11 |
language:
|
| 12 |
- en
|
|
|
|
| 13 |
---
|
| 14 |
|
| 15 |
-
#
|
| 16 |
|
| 17 |
-
|
| 18 |
-
- **License:** apache-2.0
|
| 19 |
-
- **Finetuned from model :** unsloth/qwen2.5-coder-14b-instruct-bnb-4bit
|
| 20 |
|
| 21 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 22 |
|
| 23 |
-
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
|
|
|
|
| 10 |
license: apache-2.0
|
| 11 |
language:
|
| 12 |
- en
|
| 13 |
+
- ko
|
| 14 |
---
|
| 15 |
|
| 16 |
+
# Qwen2.5 Korean Code Review LLM
|
| 17 |
|
| 18 |
+
[](https://github.com/unslothai/unsloth)
|
|
|
|
|
|
|
| 19 |
|
| 20 |
+
## Overview
|
| 21 |
+
This model is a fine-tuned version of [`unsloth/qwen2.5-coder-14b-instruct-bnb-4bit`](https://huggingface.co/unsloth/qwen2.5-coder-14b-instruct-bnb-4bit). It is optimized for Korean-language code reviews and programming education.
|
| 22 |
+
|
| 23 |
+
The model was trained using [ewhk9887/korean_code_reviews_from_github](https://huggingface.co/datasets/ewhk9887/korean_code_reviews_from_github), a dataset consisting of Korean code reviews collected from GitHub. The fine-tuning process was done using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's `transformers` and `trl` libraries, enabling a **2x faster** training process.
|
| 24 |
+
|
| 25 |
+
## ๋ชจ๋ธ ๊ฐ์
|
| 26 |
+
์ด ๋ชจ๋ธ์ [`unsloth/qwen2.5-coder-14b-instruct-bnb-4bit`](https://huggingface.co/unsloth/qwen2.5-coder-14b-instruct-bnb-4bit)๋ฅผ ํ์ธํ๋ํ ๋ฒ์ ์ผ๋ก, **ํ๊ตญ์ด ์ฝ๋ ๋ฆฌ๋ทฐ ๋ฐ ์ฝ๋ฉ ํ์ต**์ ์ํ ์ต์ ํ๋ฅผ ๊ฑฐ์ณค์ต๋๋ค.
|
| 27 |
+
|
| 28 |
+
[GitHub์์ ์์ง๋ ์ฝ๋ ๋ฆฌ๋ทฐ ๋ฐ์ดํฐ์
](https://huggingface.co/datasets/ewhk9887/korean_code_reviews_from_github)์ ์ฌ์ฉํ์ฌ ํ์ตํ์ผ๋ฉฐ, [Unsloth](https://github.com/unslothai/unsloth) ๋ฐ Hugging Face์ `transformers`, `trl` ๋ผ์ด๋ธ๋ฌ๋ฆฌ๋ฅผ ํ์ฉํ์ฌ **2๋ฐฐ ๋น ๋ฅธ** ํ์ต์ ๊ฐ๋ฅํ๊ฒ ํ์ต๋๋ค.
|
| 29 |
+
|
| 30 |
+
## Features / ํน์ง
|
| 31 |
+
- **Korean Code Review Support**: Designed specifically for analyzing and reviewing code in Korean.
|
| 32 |
+
- **Efficient Fine-Tuning**: Utilized `bnb-4bit` quantization and Unsloth for optimized performance.
|
| 33 |
+
- **Bilingual Support**: Can process both Korean and English inputs.
|
| 34 |
+
- **Transformer-based Model**: Leverages Qwen2.5's strong coding capabilities.
|
| 35 |
+
|
| 36 |
+
- **ํ๊ตญ์ด ์ฝ๋ ๋ฆฌ๋ทฐ ์ต์ ํ**: ์ฝ๋ ๋ฆฌ๋ทฐ๋ฅผ ํ๊ตญ์ด๋ก ๋ถ์ํ๊ณ ์์ฑํ๋ ๋ฐ ์ต์ ํ๋์์ต๋๋ค.
|
| 37 |
+
- **ํจ์จ์ ์ธ ํ์ธํ๋**: `bnb-4bit` ์์ํ ๋ฐ Unsloth ๊ธฐ์ ์ ํ์ฉํ์ฌ ๋น ๋ฅธ ํ์ต์ด ๊ฐ๋ฅํ์ต๋๋ค.
|
| 38 |
+
- **ํ์ ์ง์**: ํ๊ตญ์ด์ ์์ด ์
๋ ฅ์ ๋ชจ๋ ์ฒ๋ฆฌํ ์ ์์ต๋๋ค.
|
| 39 |
+
- **๊ฐ๋ ฅํ ํธ๋์คํฌ๋จธ ๊ธฐ๋ฐ**: Qwen2.5 ๋ชจ๋ธ์ ํ์ฉํ ์ฝ๋ ๋ถ์ ์ฑ๋ฅ.
|
| 40 |
+
|
| 41 |
+
## Usage / ์ฌ์ฉ ๋ฐฉ๋ฒ
|
| 42 |
+
```python
|
| 43 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 44 |
+
import torch
|
| 45 |
+
|
| 46 |
+
model_name = "ewhk9887/qwen2.5-korean-code-review"
|
| 47 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 48 |
+
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
|
| 49 |
+
|
| 50 |
+
inputs = tokenizer("์ฝ๋๋ฅผ ๋ฆฌ๋ทฐํด ์ฃผ์ธ์: def add(a, b): return a + b", return_tensors="pt").to("cuda")
|
| 51 |
+
outputs = model.generate(**inputs, max_new_tokens=100)
|
| 52 |
+
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
| 53 |
+
```
|
| 54 |
+
|
| 55 |
+
## Developer / ๊ฐ๋ฐ์
|
| 56 |
+
- **Name**: ์์์ (Eunsoo Max Eun)
|
| 57 |
+
- **License**: Apache-2.0
|
| 58 |
+
|
| 59 |
+
## Acknowledgments / ์ฐธ๊ณ ์๋ฃ
|
| 60 |
+
- [Unsloth](https://github.com/unslothai/unsloth)
|
| 61 |
+
- [Hugging Face TRL](https://huggingface.co/docs/trl)
|
| 62 |
+
- [Dataset Used](https://huggingface.co/datasets/ewhk9887/korean_code_reviews_from_github)
|
| 63 |
|
|
|