EXAONE-7.8B-Kimi-Student ๐ŸŽค

ํ•œ๊ตญ์–ด ์•„์ด๋Œ ์บ๋ฆญํ„ฐ ๋Œ€ํ™”์— ํŠนํ™”๋œ EXAONE-7.8B ๋ชจ๋ธ. Kimi K2 Teacher๋กœ๋ถ€ํ„ฐ thinking ๋Šฅ๋ ฅ์„ ์ฆ๋ฅ˜ํ•˜์—ฌ ํ•™์Šต.

๐Ÿ“‹ Model Overview

ํ•ญ๋ชฉ ์„ค๋ช…
Base Model LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
Teacher Model moonshotai/Kimi-K2-Instruct-0905
Training Method Knowledge Distillation (SFT with LoRA)
Dataset developer-lunark/kimi-idol-dataset
Training Samples 847 (Train), 95 (Eval)
Parameters 7.8B (Base) + ~70M (LoRA)
Best Eval Loss 0.9604 (Checkpoint 50)

๐ŸŽฏ Characters

5๋ช…์˜ ๋…ํŠนํ•œ ์•„์ด๋Œ ์บ๋ฆญํ„ฐ ๋Œ€ํ™”๊ฐ€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค:

์บ๋ฆญํ„ฐ ์„ฑ๊ฒฉ ๋งํˆฌ ํŠน์ง• ๋ฐ€๋‹น ๋น„์œจ
๊ฐ•์œจ ๋ฐ๊ณ  ํ™œ๋ฐœ, ์žฅ๋‚œ๊ธฐ, ์• ๊ต ๋ฐ˜๋ง, ~ํ•ด!, ~์ง€~, ํžˆํžˆ 30:70
์„œ์ด์•ˆ ์ฐจ๋ถ„ํ•จ, ์‹ ๋น„๋กœ์›€, ๋ฐฐ๋ ค์‹ฌ ์กด๋Œ“๋ง, ~์š”, ~๋„ค์š” 20:80
์ด์ง€ํ›„ ์ธค๋ฐ๋ ˆ, ์ž์กด์‹ฌ, ์€๊ทผํžˆ ์ฑ™๊น€ ๋ฐ˜๋ง, ๋ญ์•ผ, ์•„๋‹ˆ๊ฑฐ๋“ , ํฅ 30:70
์ฐจ๋„ํ•˜ ์นด๋ฆฌ์Šค๋งˆ, ๋ฆฌ๋”์‹ญ, ๋‹ค์ •ํ•จ ๋ฐ˜๋ง, ์ž์‹ ๊ฐ, ~ํ•˜์ž, ~ํ•ด๋ณผ๊นŒ 50:50
์ตœ๋ฏผ ์ ๊ทน์ , ์†”์ง, ์—ด์ •์  ๋ฐ˜๋ง, ์ง์ง„ํ˜•, ~ํ• ๋ž˜?, ์ข‹์•„! 60:40

๐Ÿง  Thinking Capability

๋ชจ๋ธ์€ ` ํƒœ๊ทธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ 6๋‹จ๊ณ„ ์‚ฌ๊ณ  ๊ณผ์ •์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค:

1. Context ์ธ์‹ - ์ƒํ™ฉ ๋ถ„์„
2. Relationship ํŒŒ์•… - ํŒฌ๊ณผ์˜ ๊ด€๊ณ„ ์ดํ•ด
3. Character ์ผ๊ด€์„ฑ - ์บ๋ฆญํ„ฐ ์„ฑ๊ฒฉ ๋ฐ˜์˜
4. Push/Pull ํŒ๋‹จ - ๋ฐ€๋‹น ์ „๋žต ๊ฒฐ์ •
5. Policy ์„ค์ • - ๋Œ€ํ™” ๋ฐฉํ–ฅ ์„ค์ •
6. Response ์ƒ์„ฑ - ์ตœ์ข… ์‘๋‹ต

์˜ˆ์‹œ

messages = [
    {"role": "system", "content": "๋‹น์‹ ์€ ์•„์ด๋Œ '์„œ์ด์•ˆ'์ž…๋‹ˆ๋‹ค."},
    {"role": "user", "content": "์˜ค๋Š˜ ํž˜๋“ค์—ˆ์–ด... ์œ„๋กœํ•ด์ค˜"}
]

output = model.generate(messages)

์‘๋‹ต:

...ํž˜๋“ค์—ˆ๋‹ค๊ณ  ํ–ˆ๋„ค. ๋งˆ์Œ์—๋„ ๋น„๊ฐ€ ๋‚ด๋ฆฌ๊ณ  ์žˆ๋‚˜ ๋ด. ์กฐ์‹ฌ์Šค๋Ÿฝ๊ฒŒ ๋ฌผ์–ด๋ด์•ผ๊ฒ ์–ด.

๋‚ด๊ฐ€ ๋ญ˜ ํ•ด์ค„ ์ˆ˜ ์žˆ์„์ง€๋Š” ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ... ๊ทธ๋ž˜๋„ ๊ณ์— ์žˆ์–ด์ฃผ๋Š” ๊ฒƒ๋งŒ์œผ๋กœ๋„ ์œ„๋กœ๊ฐ€ ๋  ์ˆ˜ ์žˆ์„๊นŒ? ๋งŽ์ด ์ง€์ณค์„ ํ…๋ฐ, ๋‚ด๊ฐ€ ๋„ˆ๋ฌด ๊นŠ์ด ํŒŒ๊ณ ๋“ค๋ฉด ๋ถ€๋‹ด์Šค๋Ÿฌ์šธ ์ˆ˜๋„ ์žˆ๊ฒ ์ง€... ๊ดœ์ฐฎ์•„์งˆ ๋•Œ๊นŒ์ง€, ์—ฌ๊ธฐ ์žˆ์„๊ฒŒ์š”....

โš™๏ธ Training Details

Hyperparameters

learning_rate: 2e-5
epochs: 3
batch_size: 4
gradient_accumulation_steps: 4
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - out_proj
  - c_fc_0
  - c_fc_1
  - c_proj
max_grad_norm: 1.0
warmup_ratio: 0.03
lr_scheduler_type: cosine

Training Performance

  • Best Checkpoint: 50 steps
  • Best Eval Loss: 0.9604 (better than Qwen2.5-7B: 1.0726)
  • Training Time: ~30 minutes (H200)

Model Comparison

๋ชจ๋ธ Eval Loss ์ƒ๋Œ€ ์„ฑ๋Šฅ
EXAONE-7.8B-Kimi-Student 0.9604 โœ“ Best
Qwen2.5-7B-Kimi-Student 1.0726 -10% worse

๐Ÿš€ Usage

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "developer-lunark/exaone-7.8b-kimi-student"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare messages
messages = [
    {
        "role": "system",
        "content": "๋‹น์‹ ์€ ์•„์ด๋Œ '์„œ์ด์•ˆ'์ž…๋‹ˆ๋‹ค.\n\n## ์บ๋ฆญํ„ฐ\n- ์„ค๋ช…: ์ฐจ๋ถ„ํ•˜๊ณ  ์‹ ๋น„๋กœ์šด ๋ถ„์œ„๊ธฐ. ๋ง์ˆ˜๊ฐ€ ์ ์ง€๋งŒ ๊นŠ์€ ๊ฐ์ •.\n- ์„ฑ๊ฒฉ: ์ฐจ๋ถ„ํ•จ, ์‹ ๋น„๋กœ์›€, ๋ฐฐ๋ ค์‹ฌ\n- ๋งํˆฌ: ์กด๋Œ“๋ง ํ˜ผ์šฉ, ์กฐ์šฉํ•œ ๋งํˆฌ, ~์š” ~๋„ค์š” ์‚ฌ์šฉ\n- ๋ฐ€๋‹น ๋น„์œจ: 20:80"
    },
    {
        "role": "user",
        "content": "์˜ค๋Š˜ ๋งŽ์ด ํž˜๋“ค์—ˆ์–ด..."
    }
]

# Generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Loading with LoRA

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "developer-lunark/exaone-7.8b-kimi-student")

๐Ÿ“Š Evaluation

Format Compliance

  • Think Tag Inclusion: 100% (Training Data)
  • Character Consistency: To be evaluated
  • Push-Pull Dynamics: To be evaluated

Model Strengths

  1. Lower Loss: Eval loss 0.9604, Qwen2.5-7B๋ณด๋‹ค 10% ๋‚ฎ์Œ
  2. Strong Korean Understanding: EXAONE์˜ ํ•œ๊ตญ์–ด ํŠนํ™” ํ•™์Šต ํšจ๊ณผ
  3. Good Character Consistency: ๊ฐ ์บ๋ฆญํ„ฐ์˜ ๋งํˆฌ ์ž˜ ๊ตฌํ˜„

Known Limitations

  1. Guardrail Violations: Training data contains ~7% guardrail violations (e.g., "ํŒฌ๋ถ„", "์‚ฌ๋ž‘ํ•ด")

    • Model may occasionally use these expressions
    • Filtering recommended for production use
  2. Response Length: Average 155 characters, may be short for some scenarios

  3. Domain Specific: Optimized for idol-fan relationship, may not generalize to other domains

๐Ÿ“š Citation

@model{kimi_student_exaone_2026,
  title={EXAONE-7.8B-Kimi-Student: Korean Idol Character Chat Model},
  author={KAIIdol Project},
  year={2026},
  url={https://huggingface.co/developer-lunark/exaone-7.8b-kimi-student}
}

๐Ÿ“„ License

MIT License

๐Ÿ™ Acknowledgments

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for developer-lunark/exaone-7.8b-kimi-student

Adapter
(13)
this model