---
language:
- ko
base_model:
- naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
pipeline_tag: text-generation
library_name: transformers
license: other
new_version: haebo/meow-clovax-v3
datasets:
- haebo/meow-v1-dataset
---
# 🐾 meow-clovax-v1
> **meow-clovax-v1 is a Korean LLM that naturally rewrites sentences according to a target emotion (`emotion`) and animal persona (`post_type`).**
- nick_name : haebo/Meow-HyperCLOVAX-1.5B_FullFT_fp32_0615i
- This model was trained with supervised fine-tuning (SFT) on top of `naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B`.
---
## 🧠 Model Details
| Item | Description |
|------|------|
| **Base Model** | HyperCLOVAX-SEED-Text-Instruct-1.5B |
| **Fine-tuning Method** | Supervised Finetuning (SFT) |
| **Model Type** | Decoder-only |
| **Language** | Korean (primary) |
| **Parameters** | 1.5B |
| **Precision** | fp16 / fp32 |
| **Version** | v1 |
| **Framework** | Transformers |
| **License** | hyperclovax-seed |
---
## 📦 Training Details
- **Dataset**: a private style-transfer dataset collected and synthesized to cover emotion and animal-tone variations
  - Each sample is a JSONL record with `content`, `emotion`, `post_type`, and `transformed_content` fields
- **Task**: instruct-style fine-tuning (prompt → transformed response)
- **Prompt structure** (a rendering sketch follows this list):
  - instruction: `"다음 문장을 [동물]의 [감정]한 말투로 바꿔줘.\nInput: ...\nOutput:"` ("Rewrite the following sentence in the [animal]'s [emotion] tone.")
- **Epochs**: 3
- **Training Infrastructure**: Google Colab Pro+ (A100)
- **Inference Infrastructure**: Google Colab Pro+ (T4) / GCP T4
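
For concreteness, here is a minimal sketch of how a dataset record could be rendered into the training prompt described above. The `build_prompt` helper and the example record values are illustrative, not part of the released training code:

```python
# Hypothetical helper illustrating the instruct-style prompt format above.
def build_prompt(record: dict) -> str:
    instruction = (
        f"다음 문장을 {record['post_type']}의 "
        f"{record['emotion']}한 말투로 바꿔줘."
    )
    return (
        f"### Instruction:\n{instruction}\n"
        f"### Input:\n{record['content']}\n"
        f"### Output:\n{record['transformed_content']}"
    )

# Example record shaped like the dataset rows shown in the Dataset section.
example = {
    "content": "오늘 점심 뭐 먹지.",
    "emotion": "normal",
    "post_type": "dog",
    "transformed_content": "오늘 점심 뭐 먹지멍? 🐾",
}
print(build_prompt(example))
```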
---
## 💡 Intended Use
- Emotion- and animal-tone style transfer
- Character chatbots, emotion-expressive chatbots, and similar applications
## ⚠️ Limitations & Bias
- Transformations may sound unnatural for some emotion/animal-type combinations
- Dataset limitations may bias the model toward certain emotions or animal types
- Inappropriate inputs can produce unexpected outputs
- Raw outputs may contain inappropriate elements, so post-processed results should be used (see the post-processing sketch in the usage section below)
## 🚀 How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "haebo/meow-clovax-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sentence to transform, target emotion, and animal persona
content = "나는 아침마다 짜증남"
emotion = "angry"
post_type = "cat"

instruction = f"다음 문장을 {post_type}의 {emotion}한 말투로 바꿔줘."

# Same instruct-style prompt format used during fine-tuning;
# the Output section is left empty for the model to complete.
prompt = (
    f"### Instruction:\n{instruction}\n"
    f"### Input:\n{content}\n"
    f"### Output:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
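
Decoder-only generation returns the prompt plus the completion, so the decoded string above still contains the `### Instruction / ### Input / ### Output` scaffold. Below is a minimal post-processing sketch that isolates the completion; it assumes a CUDA GPU, and the fp16 loading is optional (the model card lists fp16/fp32):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "haebo/meow-clovax-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# fp16 roughly halves memory; fp32 also works per the precision row above.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = (
    "### Instruction:\n다음 문장을 cat의 angry한 말투로 바꿔줘.\n"
    "### Input:\n나는 아침마다 짜증남\n"
    "### Output:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=400)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Keep only the text generated after the output marker.
completion = decoded.split("### Output:")[-1].strip()
print(completion)
```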
## 🗂️ Dataset
> The v1 model was trained on the datasets below. The data was used as-is, without separate preprocessing (cleansing/filtering).
> During fine-tuning, records were reshaped to match the prompt structure described above.
- **Data structure**
  Each sample consists of the following fields:
  - `content`: original sentence (everyday Korean)
  - `emotion`: emotion label (e.g., happy, sad, angry)
  - `post_type`: animal type (e.g., cat, dog)
  - `transformed_content`: sentence rewritten in the target emotion and animal tone
- **Example**
```json
{
  "content": "오늘 점심 뭐 먹지.",
  "emotion": "normal",
  "post_type": "dog",
  "transformed_content": "오늘 점심 뭐 먹지멍? 🐾 맛있는 냄새가 나는 것 같다멍! 주인님, 제 밥 어딨나요! 빨리 밥그릇 채워달라멍! 🦴 ᐡ•´꒳`•ᐡ"
}
```
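
The dataset itself is private, but records of this shape are straightforward to load and validate. A minimal sketch, assuming a hypothetical local file `meow_v1.jsonl`:

```python
import json

REQUIRED_FIELDS = {"content", "emotion", "post_type", "transformed_content"}

def load_records(path: str) -> list[dict]:
    """Read one JSON object per line and check the expected fields."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            records.append(record)
    return records

records = load_records("meow_v1.jsonl")  # hypothetical local path
print(len(records), records[0]["emotion"])
```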
- **Datasets (4,827 samples total)**
  - **dataset_0515_made (342)**: initial curated data
  - **dataset_0527_made (818)**: curated post-based data, per emotion and animal type
  - **dataset_0530_made (2,986)**: post-based data augmented per emotion
  - **dataset_0613_made (681)**: rule-based transformations (cat) of curated comment inputs