---
license: mit
language:
- ko
- en
base_model:
- klue/bert-base
---

# LQ-KBERT-Base: Crypto Market Korean Sentiment & Action Signal Classifier

A **Korean investor-sentiment classification model for financial community posts and news**, released by [LangQuant](https://langquant.com), a data analysis platform for virtual-asset AI agents.

It uses `klue/bert-base` as the backbone and was fine-tuned on **more than 100,000** preprocessed Korean text samples related to virtual assets.

Given a sentence-level input (`≤200 characters`), the model simultaneously predicts **investor sentiment, action, emotions, certainty, relevance, and toxicity**.

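Because the model expects sentence-level inputs of at most 200 characters, longer posts need to be split before inference. The sketch below shows one possible way to do that in plain Python. `split_for_model` and its punctuation-based splitting rule are illustrative assumptions, not part of the released code.

```python
import re

MAX_CHARS = 200  # input limit stated in this model card


def split_for_model(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into sentence-like chunks of at most max_chars characters.

    Hypothetical helper: splits after '.', '!' or '?' followed by whitespace,
    or at newlines, then hard-wraps any piece that is still too long.
    """
    sentences = re.split(r"(?<=[.!?])\s+|\n+", text.strip())
    chunks: list[str] = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        # Hard-wrap sentences that exceed the limit.
        for start in range(0, len(sentence), max_chars):
            chunks.append(sentence[start:start + max_chars])
    return chunks
```

Each returned chunk can then be passed to the tokenizer as a separate input. A real pipeline may want a Korean-aware sentence splitter instead of this simple rule.
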
- [GitHub](https://github.com/LangQuant/LQ-KBERT-Base)

---

### Model output format

```json
{
  "sentiment_strength": "strong_pos | weak_pos | neutral | weak_neg | strong_neg",
  "action_signal": "buy | hold | sell | avoid | info_only | ask_info",
  "emotions": ["greed","fear","confidence","doubt","anger","hope","sarcasm"],
  "certainty": 0.0 ~ 1.0,
  "relevance": 0.0 ~ 1.0,
  "toxicity": 0.0 ~ 1.0
}
```

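For downstream code it can help to hold one prediction in a typed record. The following is a minimal sketch that mirrors the fields of the format above. The `Prediction` class and its `validate` method are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field

SENTIMENTS = ("strong_pos", "weak_pos", "neutral", "weak_neg", "strong_neg")
ACTIONS = ("buy", "hold", "sell", "avoid", "info_only", "ask_info")
EMOTIONS = ("greed", "fear", "confidence", "doubt", "anger", "hope", "sarcasm")


@dataclass
class Prediction:
    """One model output, using the field names from the format above."""

    sentiment_strength: str
    action_signal: str
    emotions: list[str] = field(default_factory=list)
    certainty: float = 0.0
    relevance: float = 0.0
    toxicity: float = 0.0

    def validate(self) -> None:
        """Raise ValueError if a field is outside the documented labels or range."""
        if self.sentiment_strength not in SENTIMENTS:
            raise ValueError(f"unknown sentiment_strength: {self.sentiment_strength}")
        if self.action_signal not in ACTIONS:
            raise ValueError(f"unknown action_signal: {self.action_signal}")
        unknown = [e for e in self.emotions if e not in EMOTIONS]
        if unknown:
            raise ValueError(f"unknown emotions: {unknown}")
        for name in ("certainty", "relevance", "toxicity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
```
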
---

## Labeling Guidelines

### Sentiment Strength

- **strong_pos**: Conviction of a sharp rally, e.g. `"가즈아"` ("let's go!"), `"무조건 간다"` ("it's going up no matter what").
- **weak_pos**: Cautious optimism, e.g. `"반등 가능"` ("a rebound is possible"), `"괜찮을 듯"` ("seems okay").
- **neutral**: Plain information, announcements, or small talk.
- **weak_neg**: Mild negativity, e.g. `"조정 올 듯"` ("a correction seems to be coming"), `"관망"` ("wait and see").
- **strong_neg**: Crash or panic, e.g. `"나락"` ("the abyss"), `"망함"` ("it's ruined"), `"해킹/제재"` ("hack/sanctions").

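### Action Signal

- **buy**: Instruction to buy or enter, e.g. `"지금 산다"` ("buying now"), `"롱"` ("long").
- **hold**: Keep holding or wait and see, e.g. `"존버"` ("HODL"), `"유지"` ("maintain").
- **sell**: Sell or close out, e.g. `"익절"` ("take profit"), `"손절"` ("cut losses"), `"정리"` ("closing out").
- **avoid**: Avoidance or risk warning, e.g. `"가지마"` ("don't go in"), `"스캠"` ("scam"), `"위험"` ("dangerous").
- **info_only**: Plain information delivery (news/announcements).
- **ask_info**: Question or exploration, e.g. `"들어가도 돼?"` ("is it okay to get in?"), `"왜 떨어져?"` ("why is it dropping?").
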
### Emotions (multi-label)

- **greed** (탐욕)
- **fear** (두려움)
- **confidence** (확신)
- **doubt** (의심)
- **anger** (분노)
- **hope** (희망)
- **sarcasm** (풍자)

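Because emotions are multi-label, each one is scored independently rather than competing in a softmax. The sketch below shows that decoding step in plain Python, using the same 0.5 sigmoid threshold as the usage example in this card. `decode_emotions` is a hypothetical helper name.

```python
import math

EMO_LIST = ["greed", "fear", "confidence", "doubt", "anger", "hope", "sarcasm"]


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_emotions(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Return every emotion whose sigmoid probability reaches the threshold."""
    if len(logits) != len(EMO_LIST):
        raise ValueError(f"expected {len(EMO_LIST)} logits, got {len(logits)}")
    return [name for name, x in zip(EMO_LIST, logits) if _sigmoid(x) >= threshold]
```

A sentence can therefore receive zero, one, or several emotion labels. Raising `threshold` trades recall for precision.
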
### Certainty

- **0.2~0.4**: Questions, exploration, memes (low)
- **0.4~0.6**: Hedged opinions (medium)
- **0.6~0.8**: Figures, evidence, official tone (high)
- **0.8~1.0**: Strong assertions or directives (very high)

### Relevance

- **0.7~1.0**: Directly related to investing or the market
- **0.4~0.7**: Indirectly related (industry/people/technology)
- **0.0~0.3**: Unrelated, small talk, memes

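The certainty and relevance ranges above can be turned into labels with a small lookup. This is only a sketch. The function names are hypothetical, and sending a shared boundary value to the higher band is an assumption. Scores in ranges the guidelines leave undefined (certainty below 0.2, relevance between 0.3 and 0.4) return `None`.

```python
from typing import Optional


def _check_unit_interval(score: float) -> None:
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")


def certainty_band(score: float) -> Optional[str]:
    """Map a certainty score to the band named in the labeling guidelines."""
    _check_unit_interval(score)
    if score >= 0.8:
        return "very high"
    if score >= 0.6:
        return "high"
    if score >= 0.4:
        return "medium"
    if score >= 0.2:
        return "low"
    return None  # below 0.2 is not defined by the guidelines


def relevance_band(score: float) -> Optional[str]:
    """Map a relevance score to direct / indirect / unrelated."""
    _check_unit_interval(score)
    if score >= 0.7:
        return "direct"
    if score >= 0.4:
        return "indirect"
    if score <= 0.3:
        return "unrelated"
    return None  # the gap between 0.3 and 0.4 is not defined by the guidelines
```
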
### Toxicity

- Scored from **0 to 1** according to the intensity of profanity, insults, and disparagement.
- Evaluated independently of the investment meaning.

---

## Sentiment Strength vs Action Signal

- **Sentiment Strength**
  - Intensity of investor sentiment (positive ↔ negative).
  - Focuses on the tone of the price outlook.

- **Action Signal**
  - Actual intent or instruction to take an investment action.
  - Buy/sell/hold/avoid/question/information.

---

## How to use the model

```python
import json

import torch
from transformers import AutoTokenizer, AutoModel

repo_or_dir = "LangQuant/LQ-Kbert-base"

# Example inputs: Korean community/news-style sentences.
texts = [
    "비트코인 조정 후 반등, 투자심리 개선",  # Bitcoin rebounds after a correction; sentiment improves
    "환율 급등에 증시 변동성 확대",  # Stock-market volatility widens as the exchange rate spikes
    "비트 그만 좀 내려라 진짜..",  # BTC, stop dropping already, seriously..
    "폭락ㅠㅠㅠㅠㅠ 다 팔아야할까요?",  # It's crashing... should I sell everything?
]

# trust_remote_code=True allows loading the model's custom code from the repository.
tokenizer = AutoTokenizer.from_pretrained(repo_or_dir)
model = AutoModel.from_pretrained(repo_or_dir, trust_remote_code=True)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

# Note: max_length is counted in tokens, not characters.
enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=200).to(device)
with torch.inference_mode():
    out = model(**enc)

# Index-to-label maps for the sentiment, action, and emotion heads.
IDX2SENTI = {0: "strong_pos", 1: "weak_pos", 2: "neutral", 3: "weak_neg", 4: "strong_neg"}
IDX2ACT = {0: "buy", 1: "hold", 2: "sell", 3: "avoid", 4: "info_only", 5: "ask_info"}
EMO_LIST = ["greed", "fear", "confidence", "doubt", "anger", "hope", "sarcasm"]

for i, t in enumerate(texts):
    senti = int(out["logits_senti"][i].argmax().item())
    act = int(out["logits_act"][i].argmax().item())
    emo_p = torch.sigmoid(out["logits_emo"][i]).tolist()  # independent per-emotion probabilities
    reg = torch.clamp(out["pred_reg"][i], 0, 1).tolist()  # [certainty, relevance, toxicity]
    emos = [EMO_LIST[j] for j, p in enumerate(emo_p) if p >= 0.5]

    result = {
        "text": t,
        "pred_sentiment_strength": IDX2SENTI[senti],
        "pred_action_signal": IDX2ACT[act],
        "pred_emotions": emos,
        "pred_certainty": float(reg[0]),
        "pred_relevance": float(reg[1]),
        "pred_toxicity": float(reg[2]),
    }
    print(json.dumps(result, ensure_ascii=False))
```

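The example above keeps only the argmax label of each classification head. If a probability for that label is also wanted, a softmax over the logits is the usual way to get one. The sketch below does this in plain Python. `top_label` is a hypothetical helper, and it assumes the heads were trained with a standard softmax cross-entropy loss, which this card does not state.

```python
import math

IDX2SENTI = {0: "strong_pos", 1: "weak_pos", 2: "neutral", 3: "weak_neg", 4: "strong_neg"}


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: list[float], idx2label: dict[int, str]) -> tuple[str, float]:
    """Return the highest-scoring label together with its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return idx2label[best], probs[best]
```

With the tensors from the example, the call would look like `top_label(out["logits_senti"][i].tolist(), IDX2SENTI)`. The returned probability is not necessarily calibrated.
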
---

### Examples

| Sentence | sentiment_strength | action_signal | Interpretation |
|------|--------------------|---------------|------|
| "๊ฐ๋ก์์ด์ฌ " | strong_pos | buy | Strong conviction of a rise + intent to buy immediately |
| "여기선 관망이 맞다" ("waiting on the sidelines is right here") | weak_neg | hold | Negative, but chooses to keep holding |
| "들어가도 될까?" ("is it okay to get in?") | weak_pos | ask_info | Cautious optimism; a question exploring a buy |
| "해킹 터짐, 비상. 접근 금지" ("hack just hit, emergency; stay away") | strong_neg | avoid | Strong negativity + recommendation to avoid |
| "업데이트 공지 나왔습니다" ("an update notice has been posted") | neutral | info_only | Plain information, no action |

---

### Citation

```
@misc{langquant2025lkbert,
  title  = {LQ-KBERT-Base: Crypto Market Korean Sentiment \& Action Signal Classifier},
  author = {LangQuant},
  year   = {2025},
  url    = {https://huggingface.co/langquant/LQ-Kbert-base}
}
```

---

### Disclaimer

```
This model is provided for academic research and experimental purposes only.
The model's output must not be regarded as financial or investment advice,
and LangQuant accepts no responsibility for any outcomes that result.
```