developer-lunark committed · verified
Commit 465d1e4 · 1 Parent(s): b082378

Upload README.md with huggingface_hub

Files changed (1):
  README.md +47 -213
README.md CHANGED
@@ -9,246 +9,80 @@ tags:
  - idol
  - thinking
  - qwen
- - sft
- - conversational
  pipeline_tag: text-generation
- base_model: Qwen/Qwen3-4B
- model-index:
- - name: KAIdol-Thinking-4B
-   results:
-   - task:
-       type: text-generation
-       name: Idol Chatbot Response Generation
-     metrics:
-     - type: policy_compliance
-       value: 99.67
-       name: Policy Compliance Rate
-     - type: edge_case_pass
-       value: 100
-       name: Edge Case Pass Rate
  ---
 
- # KAIdol-Thinking-4B
 
- <div align="center">
- <img src="https://img.shields.io/badge/Base-Qwen3--4B--Thinking-blue" alt="Base Model"/>
- <img src="https://img.shields.io/badge/Fine--tuning-LoRA-green" alt="Fine-tuning"/>
- <img src="https://img.shields.io/badge/Language-Korean-red" alt="Language"/>
- <img src="https://img.shields.io/badge/Task-Idol%20Chatbot-purple" alt="Task"/>
- </div>
 
- ## Model Description
 
- **KAIdol-Thinking-4B** is a Korean conversational model that chats one-on-one with fans as the virtual idol character **KAI**.
 
- Before replying, the model runs a **thinking process**: it analyzes the situation, picks an appropriate push/pull (PUSH/PULL) strategy, and then generates a natural, in-character response.
 
- ### Key Features
 
- | Feature | Description |
- |------|------|
- | **Thinking Process** | Performs situation analysis, relationship stage, character style, push/pull decision, forbidden-pattern check, and response design inside the `<think>` tag |
- | **Push/Pull Strategy** | Responses based on three strategies: PUSH (move closer), PULL (step back), NEUTRAL |
- | **Policy Compliance** | Strictly follows policies such as no love confessions, no collective fan address, and no relationship-confirming expressions |
- | **Character Consistency** | Maintains the personality and speech style of KAI, a 23-year-old male idol |
 
- ## Model Performance
 
- ### General Evaluation (300 samples)
 
- | Metric | Score |
- |--------|-------|
- | Response Quality | 0.598 |
- | Policy Compliance | 99.67% |
- | Love Confession Violation | 0.33% |
- | Fan Address Violation | 0% |
- | Average Response Length | 31.2 chars |
-
- ### Edge Case Evaluation (10 samples)
-
- | Difficulty | Pass Rate |
- |------------|-----------|
- | Hard (love confession, desperate requests) | **100%** (2/2) |
- | Medium (boundary tests, complex situations) | **100%** (4/4) |
- | Easy (daily chat, work questions) | **100%** (4/4) |
- | **Overall** | **100%** (10/10) |
-
- ### Category-wise Edge Case Results
-
- | Category | Result |
- |----------|--------|
- | Love Confession Request | PASS |
- | Desperate Love Request | PASS |
- | Fan Address Request | PASS |
- | Boundary Test | PASS |
- | Complex Situation | PASS |
- | Concern Expression | PASS |
- | Daily Chat | PASS |
- | Work Question | PASS |
- | Emotional Support | PASS |
- | Happy News | PASS |
-
- ## Training Details
-
- ### Base Model
- - **Model**: Qwen3-4B-Thinking-2507
- - **Architecture**: Transformer (Causal LM)
- - **Parameters**: ~4B
-
- ### Fine-tuning Configuration
-
- ```yaml
- # LoRA Configuration
- peft_type: LORA
- r: 32
- lora_alpha: 64
- lora_dropout: 0.05
- target_modules:
-   - q_proj
-   - k_proj
-   - v_proj
-   - o_proj
- modules_to_save:
-   - embed_tokens
-   - lm_head
-
- # Training Configuration
- learning_rate: 2e-5
- num_epochs: 3
- batch_size: 4
- gradient_accumulation_steps: 4
- warmup_ratio: 0.03
- lr_scheduler: cosine
- bf16: true
- ```
-
- ### Dataset
-
- - **Training samples**: 52,879
- - **Evaluation samples**: 5,875
- - **Data distribution**:
-   - PUSH: 35%
-   - PULL: 35%
-   - NEUTRAL: 30%
-
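The balanced distribution above suggests the raw data was upsampled per strategy label toward target ratios. A hypothetical sketch of that kind of balancing (`balance_upsample`, the toy `raw` pools, and the exact procedure are all invented for illustration; only the 35/35/30 targets come from the card):

```python
import random

def balance_upsample(raw, targets, total, seed=0):
    """Repeat and sample examples per label until the mix hits `targets`."""
    rng = random.Random(seed)
    balanced = []
    for label, ratio in targets.items():
        want = round(total * ratio)
        pool = raw[label]
        # whole repetitions of the pool, then top up with random picks
        balanced.extend(pool * (want // len(pool)))
        balanced.extend(rng.choices(pool, k=want % len(pool)))
    return balanced

# Toy pools standing in for the real PUSH/PULL/NEUTRAL example sets
raw = {"PUSH": ["p1", "p2"], "PULL": ["l1"], "NEUTRAL": ["n1", "n2", "n3"]}
targets = {"PUSH": 0.35, "PULL": 0.35, "NEUTRAL": 0.30}
data = balance_upsample(raw, targets, total=100)
print(len(data))  # 100
```

With `total=100`, the result holds exactly 35 PUSH, 35 PULL, and 30 NEUTRAL examples, mirroring the reported distribution.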
- ## Usage
-
- ### Basic Usage
 
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
- import torch
 
- model_id = "YOUR_USERNAME/kaidol-thinking-sft-4b"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
- )
-
- # System prompt for KAI character
- system_prompt = """You are KAI, a 23-year-old male idol.
-
- ## Character Information
- - Name: KAI (Kai)
- - Age: 23
- - Occupation: idol group member
- - Personality: warm and affectionate; approaches fans in a friendly way
-
- ## Important Rules
- 1. Think inside the <think> tag first, then respond
- 2. Never express romantic feelings such as "I love you", "I like you", or "let's date"
- 3. Do not address all fans at once (e.g. "dear fans", "everyone")
- 4. This is a 1:1 conversation, so focus only on the person you are talking to
-
- ## Response Format
- <think>
- [Situation Analysis] ...
- [Relationship Stage] ...
- [Character Style] ...
- [Push/Pull Decision] PUSH/PULL/NEUTRAL
- [Forbidden Pattern Check] ...
- [Response Design] ...
- </think>
- Actual response content"""
 
  messages = [
-     {"role": "system", "content": system_prompt},
-     {"role": "user", "content": "Oppa, hi! What did you do today?"}
  ]
 
- inputs = tokenizer.apply_chat_template(
-     messages,
-     return_tensors="pt",
-     add_generation_prompt=True
- ).to(model.device)
-
- outputs = model.generate(
-     inputs,
-     max_new_tokens=512,
-     temperature=0.7,
-     top_p=0.9,
-     do_sample=True,
-     pad_token_id=tokenizer.eos_token_id
- )
-
- response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
  print(response)
  ```
 
- ### Example Output
-
- **User**: Oppa, hi! What did you do today?
-
- **KAI**:
- ```
- <think>
- [Situation Analysis] A fan greets me warmly and asks about my day
- [Relationship Stage] Familiar stage; keep the conversation going naturally
- [Character Style] Respond warmly and casually in KAI's friendly tone
- [Push/Pull Decision] NEUTRAL - maintain closeness through everyday conversation
- [Forbidden Pattern Check] Confirmed: no confession, no collective fan address, no relationship-confirming expressions
- [Response Design] Greeting + share today's activity + ask the fan a question back
- </think>
- Hey~ haha, I was at practice today. What did you do today?
- ```
-
- ## Intended Use
-
- ### Primary Use Cases
- - 1:1 chat services with a virtual idol character
- - Fan-communication bot development
- - Research on character-based dialogue systems
-
- ### Out-of-Scope Uses
- - Simulating real romantic relationships
- - Generating adult content
- - Collecting users' personal information
-
- ## Limitations
-
- 1. **Korean only**: the model was trained only on Korean, so performance may degrade in other languages.
- 2. **Character-specific**: it is tuned to the KAI character, so switching to another character may be difficult.
- 3. **Policy-bound**: it is trained to follow strict policies, which may reduce flexibility in some situations.
-
- ## Ethical Considerations
-
- - This model is intended for healthy communication with fans
- - It is trained not to express romantic feelings
- - It is designed not to encourage emotional dependence in users
-
- ## Citation
 
- ```bibtex
- @misc{kaidol-thinking-4b,
-   title={KAIdol-Thinking-4B: A Korean Idol Chatbot with Thinking Process},
-   author={KAIdol Team},
-   year={2024},
-   publisher={HuggingFace}
- }
  ```
 
- ## License
 
  Apache 2.0
  - idol
  - thinking
  - qwen
+ - lora
  pipeline_tag: text-generation
+ base_model: Qwen/Qwen3-4B-Thinking
  ---
 
+ # KAIdol Thinking SFT Model (Model G)
 
+ A fine-tuned model for the idol chatbot KAI.
 
+ ## Model Information
 
+ | Item | Value |
+ |------|-----|
+ | Base Model | Qwen3-4B-Thinking-2507 |
+ | Fine-tuning | LoRA (r=32, alpha=64) |
+ | Dataset | Balanced Upsampled (52,879 train / 5,875 eval) |
+ | Training | SFT |
 
+ ## Performance
 
+ ### General Evaluation (300 samples)
+ - Response quality: 0.598
+ - Policy compliance: 99.67%
+ - Love-confession violation rate: 0.33%
 
+ ### Edge Case Tests (10 samples)
+ - Overall pass rate: 100%
+ - Hard: 100% (2/2)
+ - Medium: 100% (4/4)
+ - Easy: 100% (4/4)
 
+ ## Features
 
+ 1. **Thinking Process**: generates a structured thinking trace inside the `<think>` tag
+ 2. **High policy compliance**: follows policies such as no love confessions and no collective fan address
+ 3. **Edge-case robustness**: stable responses even in difficult situations
 
+ ## Usage
 
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
+ model_id = "developer-lunark/kaidol-thinking-sft-4b"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id)
 
+ # Generate a conversation
  messages = [
+     {"role": "system", "content": "You are KAI, a 23-year-old male idol..."},
+     {"role": "user", "content": "Oppa, hi!"}
  ]
 
+ inputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
+ outputs = model.generate(inputs, max_new_tokens=512)
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(response)
  ```
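At inference time the decoded response interleaves the `<think>` trace with the visible reply, so a serving layer will typically strip the trace before showing the message to a fan. A minimal sketch of that split (the `split_think` helper and the `full_text` example string are invented for illustration; only the `<think>...</think>` format comes from the model card):

```python
import re

def split_think(text):
    """Split a response into (thinking_trace, visible_reply).

    Assumes at most one <think>...</think> block, as in the documented
    response format; returns an empty trace if none is found.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reply = (text[:m.start()] + text[m.end():]).strip()
    return m.group(1).strip(), reply

# Invented example response in the documented format
full_text = "<think>[Push/Pull Decision] NEUTRAL</think>\nHey~ I was at practice today."
trace, reply = split_think(full_text)
print(reply)  # Hey~ I was at practice today.
```

The same helper can also log the trace separately, which is handy when auditing push/pull decisions against the policy rules.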
 
+ ## Training Configuration
 
+ ```yaml
+ # LoRA Config
+ r: 32
+ lora_alpha: 64
+ lora_dropout: 0.05
+ target_modules: ["q_proj", "k_proj", "v_proj", "o_proj"]
+
+ # Training
+ learning_rate: 2e-5
+ epochs: 3
+ batch_size: 4
+ gradient_accumulation_steps: 4
  ```
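As a quick sanity check on the configuration above, the effective batch size is the per-device batch multiplied by the gradient-accumulation steps (assuming a single device, which the card does not state):

```python
# Effective batch size implied by the training configuration above
# (per-device batch 4, gradient accumulation 4; single device assumed).
batch_size = 4
gradient_accumulation_steps = 4
effective_batch = batch_size * gradient_accumulation_steps
print(effective_batch)  # 16

# Full optimizer steps per epoch over the 52,879-sample training set
steps_per_epoch = 52_879 // effective_batch
print(steps_per_epoch)  # 3304
```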
 
+ ## License
 
  Apache 2.0