---
language:
- ko
base_model:
- naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
pipeline_tag: text-generation
library_name: transformers
license: other
metrics:
- bertscore
- bleu
- perplexity
new_version: haebo/meow-clovax-v3
datasets:
- haebo/meow-v2-dataset
---
# 🐾 Meow-CLOVAX-v2
**Meow-CLOVAX-v2** is a lightweight, Korean-focused LLM developed for emotion- and animal-style speech transfer.
It is fine-tuned for an SNS-style transformation system that naturally rewrites user text in a variety of emotion and animal (cat/dog) speech styles.
> 🧪 This model was trained with Supervised Fine-tuning (SFT) on top of `naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B`.
> nick_name: haebo/Meow-HyperCLOVAX-1.5B_SFT-FFT_fp32_0629cfe
---
## 🧠 Model Details
| Item | Description |
|------|------|
| **Base Model** | HyperCLOVAX-SEED-Text-Instruct-1.5B |
| **Fine-tuning Method** | Supervised Finetuning (SFT) |
| **Model Type** | Decoder-only |
| **Language** | Korean (primary) |
| **Parameters** | 1.5B |
| **Precision** | fp16 / fp32 |
| **Version** | v2 |
| **Framework** | Transformers |
| **License** | hyperclovax-seed |
---
## 📦 Training Details
- **Dataset**: a style-transfer dataset collected and synthesized per emotion and animal speech style (private)
  - Each sample is a JSONL record with the fields `content`, `emotion`, `post_type`, `transformed_content`
- **Task**: Instruct-style fine-tuning (prompt → transformed response)
- **Prompt structure**:
  - system: "너는 동물 유형과 감정에 맞게 문장을 자연스럽게 변환하는 전문가야." (EN: "You are an expert who naturally transforms sentences to match the animal type and emotion.")
  - user: "다음 문장을 [감정] [동물] 말투로 바꿔줘.\nInput: ...\nOutput:" (EN: "Change the following sentence into the [emotion] [animal] speech style.")
  - assistant: transformed sentence + EOS
- **Epochs**: 3
- **Evaluation**: BLEU, KoBERTScore, Perplexity, Quality Score, Type Score, and manual review
- **Training Infrastructure**: Google Colab Pro+ (A100)
- **Inference Infrastructure**: Google Colab Pro+ (T4) / GCP T4
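The prompt structure above can be expressed as a small formatting helper. This is a minimal sketch: `build_prompt` is a hypothetical name, and the `<|system|>` / `<|user|>` / `<|assistant|>` tags follow the usage example later in this card.

```python
# Sketch of the SFT prompt template described above.
# `build_prompt` is a hypothetical helper, not part of the released code.
SYSTEM = "너는 동물 유형과 감정에 맞게 문장을 자연스럽게 변환하는 전문가야."

def build_prompt(content: str, emotion: str, post_type: str) -> str:
    """Format one JSONL sample into the instruction prompt (answer omitted)."""
    return (
        "<|system|>\n"
        f"{SYSTEM}\n"
        "<|user|>\n"
        f"다음 문장을 {emotion} {post_type} 말투로 바꿔줘.\n"
        f"Input: {content}\n"
        "Output:\n"
        "<|assistant|>\n"
    )

# One training-style sample from this card's dataset schema.
sample = {"content": "오늘은 정말 좋은 하루였어!", "emotion": "happy", "post_type": "cat"}
prompt = build_prompt(sample["content"], sample["emotion"], sample["post_type"])
```

During SFT, the assistant turn (transformed sentence + EOS) would be appended after this prompt as the training target.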
---
## 🎯 Intended Use
- Emotion-aware speech-style transfer services (e.g., cat style + angry emotion → an annoyed, cat-style rewrite of the input)
- Transforms text while preserving the meaning of the user's writing as much as possible
- Applicable to SNS persona styling, automatic comment replies, emotion-aware chatbots, etc.
- Can also be used for prompt-driven style changes, tone adjustment, and similar tasks
---
## ⚠️ Limitations
- Focuses on speech-style transfer rather than factually grounded generation
- May produce inaccurate or illogical sentences
- Does not perform actual analysis of the user's emotional state
- May produce awkward output for unexpected inputs (ungrammatical text, links, emoji)
---
## 🛠️ How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "haebo/meow-clovax-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat-style prompt matching the SFT template:
# system: "You are an expert who naturally transforms sentences
#          to match the animal type and emotion."
# user:   "Change the following sentence into the happy cat speech style."
prompt = (
    "<|system|>\n"
    "너는 동물 유형과 감정에 맞게 문장을 자연스럽게 변환하는 전문가야.\n"
    "<|user|>\n"
    "다음 문장을 happy cat 말투로 바꿔줘.\n"
    "Input: 오늘은 정말 좋은 하루였어!\n"
    "Output:\n"
    "<|assistant|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
## 🧪 Evaluation Criteria and Automation
- **BLEU Score**: n-gram surface similarity (0–1; higher means more similar)
- **KoBERTScore**: semantic similarity via BERT embeddings (≥ 0.8 considered semantically similar)
- **Perplexity**: language-model fluency (scores 1.0 in the 60–180 range)
- **Quality Score**: service-quality checks for banned words, repetition, allowed characters, emoji, etc.
- **Type Score**: match with the target animal speech pattern (1.0: perfect, 0.2: mixed, 0.1: opposite, 0: none)
- **Data cleaning**: only Korean, English, digits, common punctuation, and emoji are kept; URLs, banned characters, multiple spaces, and excessive repetition are removed
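As a minimal sketch of how the Perplexity band and Type Score rules above could be automated — the function names and the "냥" / "멍" sentence-ending markers are illustrative assumptions, not part of the card's actual pipeline:

```python
def perplexity_score(ppl: float) -> float:
    """1.0 when perplexity falls in the 60-180 'natural' band, else 0.0.
    (The card only specifies the 1.0 band; the fallback is assumed.)"""
    return 1.0 if 60 <= ppl <= 180 else 0.0

def type_score(text: str, target: str) -> float:
    """Rough sketch of the Type Score rule: 1.0 target pattern only,
    0.2 mixed, 0.1 opposite pattern only, 0.0 no pattern.
    The cat/dog ending markers here are assumptions for illustration."""
    markers = {"cat": "냥", "dog": "멍"}
    other = "dog" if target == "cat" else "cat"
    has_target = markers[target] in text
    has_other = markers[other] in text
    if has_target and not has_other:
        return 1.0
    if has_target and has_other:
        return 0.2
    if has_other:
        return 0.1
    return 0.0
```

In an automated evaluation loop these per-sample scores would be averaged over the test set alongside BLEU and KoBERTScore.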
---
## 🗃️ Dataset Description
> The dataset for this project is a Korean JSONL format designed for emotion- and animal-style speech transfer. Each sample consists of the following four fields.
| Field | Description | Example |
|----------------------|--------------------------------------|----------------|
| `content` | Original sentence (everyday Korean) | 오늘은 정말 좋은 하루였어. |
| `emotion` | Emotion label (English) | happy |
| `post_type` | Animal type (English) | cat |
| `transformed_content`| Sentence rewritten in the emotion/animal speech style | 오늘은 정말 좋은 하루였다냥! 😸 |
- **Datasets used**
> The fine-tuning of this model used a **final dataset** built by merging and cleaning datasets from several stages.
> It combines real user, comment, and synthetic data, optimized for transfer across diverse emotions and animal speech styles.
- **Total samples**: 15,698 (source texts: 2,038)
- **Included data** (per-source counts before merging and cleaning):
  - **dataset_0515_made** (342): initial user data
  - **dataset_0527_made** (818): per-emotion / per-animal data based on user posts
  - **dataset_0530_made** (2,986): post-based data augmented per emotion
  - **dataset_0613_made** (681): rule-based transformations of user comment inputs (cat)
  - **dataset_0620_made** (681): rule-based transformations of user comment inputs (dog)
  - **dataset_0622_made** (17,596): synthetic speech-style transfer data generated with Gemini
- **Main composition**: a mix of user data, comment data, and synthetic sentences
- **Preprocessing**:
  - Deduplication
  - Filtering applied
- **Emotion labels**: normal, happy, sad, grumpy, curious, angry (6 types)
- **Animal types**: cat, dog (2 types)
- **Data structure**:
  - Each sample is a JSONL record with the fields `content`, `emotion`, `post_type`, `transformed_content`
- **Characteristics**:
  - Sentences of various lengths and types
  - Four fields: content / emotion / post_type / transformed_content
  - JSONL format
- **Example**
```json
{
  "content": "오늘은 정말 좋은 하루였어.",
  "emotion": "happy",
  "post_type": "cat",
  "transformed_content": "오늘은 정말 좋은 하루였다냥! 😸"
}
```
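Given the schema above, a line-level validator for the JSONL data might look like the following sketch. `validate_sample` is a hypothetical helper; the emotion and animal label sets are taken from this card.

```python
import json

# Label sets and required fields as described in this card.
EMOTIONS = {"normal", "happy", "sad", "grumpy", "curious", "angry"}
POST_TYPES = {"cat", "dog"}
FIELDS = ("content", "emotion", "post_type", "transformed_content")

def validate_sample(line: str) -> bool:
    """Check one JSONL line against the dataset schema described above."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return (
        all(isinstance(obj.get(f), str) and obj.get(f) for f in FIELDS)
        and obj["emotion"] in EMOTIONS
        and obj["post_type"] in POST_TYPES
    )

good = ('{"content": "오늘은 정말 좋은 하루였어.", "emotion": "happy", '
        '"post_type": "cat", "transformed_content": "오늘은 정말 좋은 하루였다냥!"}')
bad = ('{"content": "...", "emotion": "joy", '
       '"post_type": "cat", "transformed_content": "..."}')
```

A check like this could run as a preprocessing gate before training, alongside the deduplication and filtering steps listed above.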