# EXAONE-7.8B-Kimi-Student
An EXAONE-7.8B model specialized for Korean idol-character conversation, trained by distilling thinking capability from a Kimi K2 teacher model.
## Model Overview

| Item | Description |
|---|---|
| Base Model | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct |
| Teacher Model | moonshotai/Kimi-K2-Instruct-0905 |
| Training Method | Knowledge Distillation (SFT with LoRA) |
| Dataset | developer-lunark/kimi-idol-dataset |
| Training Samples | 847 (Train), 95 (Eval) |
| Parameters | 7.8B (Base) + ~70M (LoRA) |
| Best Eval Loss | 0.9604 (Checkpoint 50) |
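
The distillation data referenced above is available on the Hub. A minimal sketch for inspecting it with the `datasets` library (the split names are not stated in this card, so they are printed rather than assumed):

```python
from datasets import load_dataset

# Load the SFT/distillation dataset used to train this model
dataset = load_dataset("developer-lunark/kimi-idol-dataset")

# Print whatever splits the dataset actually exposes
for split_name, split in dataset.items():
    print(split_name, len(split))  # the card reports 847 training and 95 eval samples
```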
## Characters

The model supports in-character conversation with five dedicated idol characters:
| Character | Personality | Speech style | Push-pull ratio |
|---|---|---|---|
| ๊ฐ์จ | Bright and energetic, playful, aegyo | Casual speech (반말), "~해!", "~지~", playful laughter | 30:70 |
| ์์ด์ | Calm, mysterious, considerate | Polite speech (존댓말), "~요", "~네요" | 20:80 |
| ์ด์งํ | Tsundere, proud, quietly caring | Casual speech, "뭐야", "아니거든", "흥" | 30:70 |
| ์ฐจ๋ํ | Charismatic, a leader, warm-hearted | Casual speech, confident, "~하자", "~해볼까" | 50:50 |
| 최민 | Proactive, honest, passionate | Casual speech, "진짜", "~할래?", "좋아!" | 60:40 |
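
The columns above are essentially the fields that go into each character's system prompt (the Usage section below shows the full prompt for ์์ด์). The helper below is an illustrative sketch of composing such a prompt from those fields; `build_system_prompt` is not part of the model or dataset API:

```python
def build_system_prompt(name: str, description: str, personality: str,
                        speech_style: str, push_pull_ratio: str) -> str:
    """Compose a character system prompt following the format shown under Usage."""
    return (
        f"당신은 아이돌 '{name}'입니다.\n\n"
        "## 캐릭터\n"
        f"- 설명: {description}\n"
        f"- 성격: {personality}\n"
        f"- 말투: {speech_style}\n"
        f"- 밀당 비율: {push_pull_ratio}"
    )

# Example: the calm, mysterious character from the table above
system_prompt = build_system_prompt(
    name="์์ด์",
    description="차분하고 신비로운 분위기. 말수가 적지만 깊은 감정.",
    personality="차분함, 신비로움, 배려심",
    speech_style="존댓말, 조용한 말투, ~요 ~네요 사용",
    push_pull_ratio="20:80",
)
```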
## Thinking Capability

The model uses think tags to carry out a six-step reasoning process:
1. Context recognition - analyze the situation
2. Relationship assessment - understand the relationship with the fan
3. Character consistency - reflect the character's personality
4. Push/Pull decision - choose a push-pull strategy
5. Policy compliance - set the direction of the conversation
6. Response generation - produce the final reply
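
At inference time the reasoning appears in the raw generation before the visible reply, so applications usually strip it. A minimal sketch, assuming the reasoning is wrapped in `<think>...</think>` tags (the exact tag name is an assumption; check the released chat template):

```python
import re

def split_thinking(generated: str) -> tuple[str, str]:
    """Split a generation into (reasoning, reply).

    Assumes <think>...</think> wraps the reasoning; adjust the pattern if the
    released checkpoint uses a different tag.
    """
    match = re.search(r"<think>(.*?)</think>", generated, flags=re.DOTALL)
    if not match:
        return "", generated.strip()
    reasoning = match.group(1).strip()
    reply = (generated[:match.start()] + generated[match.end():]).strip()
    return reasoning, reply
```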
### Example

```python
messages = [
    {"role": "system", "content": "당신은 아이돌 '์์ด์'입니다."},
    {"role": "user", "content": "오늘 힘들었어... 위로해줘"}  # "It's been a hard day... comfort me"
]

# Apply the chat template and generate (see the Usage section for the full setup)
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=512)
```
Response:

```
...힘들었다고 한다. 마음에는 비가 내리고 있나 봐. 조심스럽게 물어봐야겠어.
내가 뭘 해줄 수 있을지는 모르겠지만... 그래도 계속 있어주는 것만으로도 위로가 될 수 있을까? 많이 지쳤을 텐데, 내가 너무 깊이 파고들면 부담스러울 수도 있겠지... 괜찮아질 때까지, 여기 있을게요....
```

(Roughly: "...They say it's been hard. It feels like it's raining in their heart. I should ask gently. I don't know what I can do for you... but could just staying by your side be some comfort? You must be exhausted, and I might be a burden if I pry too deep... I'll stay right here until you're okay.")
## Training Details

### Hyperparameters

```yaml
learning_rate: 2e-5
epochs: 3
batch_size: 4
gradient_accumulation_steps: 4
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - out_proj
  - c_fc_0
  - c_fc_1
  - c_proj
max_grad_norm: 1.0
warmup_ratio: 0.03
lr_scheduler_type: cosine
```
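
These hyperparameters map directly onto a PEFT `LoraConfig`. A minimal sketch of the adapter configuration (the trainer setup around it is omitted, and `task_type="CAUSAL_LM"` is the usual choice rather than something stated in this card):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "out_proj",  # attention projections
        "c_fc_0", "c_fc_1", "c_proj",              # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```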
### Training Performance
- Best Checkpoint: 50 steps
- Best Eval Loss: 0.9604 (better than Qwen2.5-7B: 1.0726)
- Training Time: ~30 minutes (H200)
### Model Comparison

| Model | Eval Loss | Relative Performance |
|---|---|---|
| EXAONE-7.8B-Kimi-Student | 0.9604 | Best |
| Qwen2.5-7B-Kimi-Student | 1.0726 | ~10% higher loss |
## Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "developer-lunark/exaone-7.8b-kimi-student"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # EXAONE-3.5 ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare messages
messages = [
    {
        "role": "system",
        "content": (
            "당신은 아이돌 '์์ด์'입니다.\n\n"
            "## 캐릭터\n"
            "- 설명: 차분하고 신비로운 분위기. 말수가 적지만 깊은 감정.\n"
            "- 성격: 차분함, 신비로움, 배려심\n"
            "- 말투: 존댓말 혼용, 조용한 말투, ~요 ~네요 사용\n"
            "- 밀당 비율: 20:80"
        )
    },
    {
        "role": "user",
        "content": "오늘 많이 힘들었어..."  # "I've had a really hard day..."
    }
]

# Generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
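
For interactive use, the same call can stream tokens as they are produced using `transformers`' `TextStreamer`; this is a generic pattern rather than anything specific to this model:

```python
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    streamer=streamer,  # prints the reply incrementally as it is generated
)
```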
### Loading with LoRA

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # required for the EXAONE architecture
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct")
model = PeftModel.from_pretrained(base_model, "developer-lunark/exaone-7.8b-kimi-student")
```
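
If the repository ships LoRA adapter weights rather than a merged checkpoint, the adapter can also be folded into the base model for simpler deployment; `merge_and_unload` is the standard PEFT call for this:

```python
# Fold the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("exaone-7.8b-kimi-student-merged")
tokenizer.save_pretrained("exaone-7.8b-kimi-student-merged")
```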
## Evaluation

### Format Compliance
- Think Tag Inclusion: 100% (Training Data)
- Character Consistency: To be evaluated
- Push-Pull Dynamics: To be evaluated
### Model Strengths

- Lower Loss: Eval loss of 0.9604, about 10% lower than Qwen2.5-7B
- Strong Korean Understanding: benefits from EXAONE's Korean-specialized training
- Good Character Consistency: reproduces each character's speech style well
### Known Limitations

- Guardrail Violations: The training data contains ~7% guardrail violations (e.g., "ํฌ๋ถ", "사랑해"); the model may occasionally use these expressions, so filtering is recommended for production use.
- Response Length: Responses average ~155 characters, which may be short for some scenarios.
- Domain Specific: Optimized for idol-fan conversation; may not generalize to other domains.
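
Because the card recommends filtering for production, a simple post-generation keyword check is one way to catch the known guardrail expressions. The blocklist below is illustrative only and should be replaced with the project's actual policy list:

```python
# Illustrative blocklist; extend with the expressions flagged for this project
BLOCKED_EXPRESSIONS = ["사랑해"]

def violates_guardrails(reply: str) -> bool:
    """Return True if the reply contains any blocked expression."""
    return any(expr in reply for expr in BLOCKED_EXPRESSIONS)

# Example: regenerate or fall back to a safe reply when a violation is found
if violates_guardrails(response):
    response = "..."  # handle according to your own policy
```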
## Citation

```bibtex
@misc{kimi_student_exaone_2026,
  title={EXAONE-7.8B-Kimi-Student: Korean Idol Character Chat Model},
  author={KAIIdol Project},
  year={2026},
  url={https://huggingface.co/developer-lunark/exaone-7.8b-kimi-student}
}
```
## License
MIT License
## Acknowledgments
- Teacher Model: moonshotai/Kimi-K2-Instruct-0905
- Base Model: LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
- Training Framework: PEFT, Transformers