---
license: mit
language:
- ko
- en
base_model:
- klue/bert-base
---
# LQ-KBERT-Base: Crypto Market Korean Sentiment & Action Signal Classifier
A **Korean investment-sentiment classification model for financial community posts and news**, released by [LangQuant](https://langquant.com), a data analytics platform for crypto-asset AI agents.
It uses `klue/bert-base` as its backbone and was fine-tuned on **more than 100,000** preprocessed Korean-language samples related to crypto assets.
For sentence-level input (`≤200 characters`), the model predicts **investment sentiment, action, emotions, certainty, relevance, and toxicity** simultaneously.
- [GitHub](https://github.com/LangQuant/LQ-KBERT-Base)
---
### The model's output is as follows
```json
{
  "sentiment_strength": "strong_pos | weak_pos | neutral | weak_neg | strong_neg",
  "action_signal": "buy | hold | sell | avoid | info_only | ask_info",
  "emotions": ["greed","fear","confidence","doubt","anger","hope","sarcasm"],
  "certainty": 0.0 ~ 1.0,
  "relevance": 0.0 ~ 1.0,
  "toxicity": 0.0 ~ 1.0
}
```
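The schema above can be mirrored in a small typed container that rejects values outside the documented label sets and ranges. This is a minimal sketch: the `Prediction` class and its validation are hypothetical and not part of the released model code; only the label names and the 0.0 to 1.0 ranges come from the schema.

```python
from dataclasses import dataclass, field
from typing import List

# Label sets copied from the output schema above.
SENTIMENTS = ("strong_pos", "weak_pos", "neutral", "weak_neg", "strong_neg")
ACTIONS = ("buy", "hold", "sell", "avoid", "info_only", "ask_info")
EMOTIONS = ("greed", "fear", "confidence", "doubt", "anger", "hope", "sarcasm")


@dataclass
class Prediction:
    """Hypothetical container for one prediction; not part of the released code."""

    sentiment_strength: str
    action_signal: str
    emotions: List[str] = field(default_factory=list)
    certainty: float = 0.0
    relevance: float = 0.0
    toxicity: float = 0.0

    def __post_init__(self) -> None:
        # Single-label fields must use one of the documented labels.
        if self.sentiment_strength not in SENTIMENTS:
            raise ValueError(f"unknown sentiment_strength: {self.sentiment_strength!r}")
        if self.action_signal not in ACTIONS:
            raise ValueError(f"unknown action_signal: {self.action_signal!r}")
        # Emotions are multi-label: any subset of EMOTIONS is allowed.
        unknown = [e for e in self.emotions if e not in EMOTIONS]
        if unknown:
            raise ValueError(f"unknown emotions: {unknown}")
        # The three regression outputs are documented as 0.0 to 1.0.
        for name in ("certainty", "relevance", "toxicity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


example = Prediction("weak_pos", "ask_info", ["hope", "doubt"], 0.3, 0.9, 0.0)
print(example)
```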
---
## Labeling Guidelines
### Sentiment Strength
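- **strong_pos**: conviction that the price will surge, e.g. `"가즈아"` ("let's go!"), `"무조건 간다"` ("it's going up no matter what").
- **weak_pos**: cautious optimism, e.g. `"반등 가능"` ("a rebound is possible"), `"괜찮을 듯"` ("seems okay").
- **neutral**: plain information, announcements, or chit-chat.
- **weak_neg**: mild negativity, e.g. `"조정 올 듯"` ("a correction seems to be coming"), `"관망"` ("wait and see").
- **strong_neg**: crash or panic, e.g. `"나락"` ("rock bottom"), `"망함"` ("it's ruined"), `"해킹/제재"` ("hack/sanctions").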
### Action Signal
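- **buy**: instruction to buy or enter a position, e.g. `"지금 산다"` ("buying now"), `"롱"` ("long").
- **hold**: keep holding or wait and see, e.g. `"존버"` ("HODL"), `"유지"` ("hold").
- **sell**: sell or close a position, e.g. `"익절"` ("take profit"), `"손절"` ("cut losses"), `"정리"` ("close out").
- **avoid**: avoidance or risk warning, e.g. `"가지마"` ("don't go in"), `"스캠"` ("scam"), `"위험"` ("dangerous").
- **info_only**: plain information delivery (news/announcements).
- **ask_info**: question or exploration, e.g. `"들어가도 돼?"` ("can I get in?"), `"왜 떨어져?"` ("why is it dropping?").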
### Emotions (multi-select)
- **greed** (탐욕)
- **fear** (두려움)
- **confidence** (확신)
- **doubt** (의심)
- **anger** (분노)
- **hope** (희망)
- **sarcasm** (풍자)
### Certainty
- **0.2~0.4**: questions, exploration, memes (low)
- **0.4~0.6**: hedged opinions (medium)
- **0.6~0.8**: figures, evidence, official tone (high)
- **0.8~1.0**: strong assertions or instructions (very high)
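The certainty bands above can be turned into a small lookup for downstream filtering. This is only a sketch: the function name `certainty_band` and the band names are hypothetical, the boundaries are treated as half-open (a score of exactly 0.4 counts as medium), and scores below 0.2, which the guideline does not cover, are assumed to fall into the low band.

```python
def certainty_band(score: float) -> str:
    """Map a certainty score in [0, 1] to a band name (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"certainty must be in [0, 1], got {score}")
    if score < 0.4:
        # Assumption: values below 0.2 are not covered by the guideline
        # and are grouped with the low band here.
        return "low"
    if score < 0.6:
        return "medium"
    if score < 0.8:
        return "high"
    return "very_high"


for s in (0.3, 0.5, 0.7, 0.95):
    print(s, certainty_band(s))
```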
### Relevance
- **0.7~1.0**: directly related to investing or the market
- **0.4~0.7**: indirectly related (industry, people, technology)
- **0.0~0.3**: unrelated, chit-chat, memes
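A relevance score can be bucketed the same way, for example to drop off-topic posts before aggregating signals. The sketch below is an assumption-laden illustration: `relevance_tier` and the tier names are hypothetical, and because the guideline leaves the interval between 0.3 and 0.4 undefined, the helper reports it as `"unspecified"` rather than guessing.

```python
def relevance_tier(score: float) -> str:
    """Map a relevance score in [0, 1] to a tier name (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"relevance must be in [0, 1], got {score}")
    if score >= 0.7:
        return "direct"
    if score >= 0.4:
        return "indirect"
    if score <= 0.3:
        return "unrelated"
    # The guideline does not say how to treat scores strictly between 0.3 and 0.4.
    return "unspecified"


scores = [0.9, 0.5, 0.1, 0.35]
print([relevance_tier(s) for s in scores])
```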
### Toxicity
- Scored **0~1** according to the intensity of profanity, insults, and disparagement.
- Evaluated independently of the investment meaning.
---
## Sentiment Strength vs Action Signal
- **Sentiment Strength**
  - Intensity of investment sentiment (positive ↔ negative).
  - Focuses on the tone of the price outlook.
- **Action Signal**
  - The actual investment action being intended or instructed.
  - Buy / sell / hold / avoid / question / information.
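Because the two heads answer different questions, it can help to aggregate them separately. The sketch below averages tone over a batch and counts actions independently. The numeric `TONE` scale and the `summarize` helper are hypothetical choices made for illustration; the model itself only returns the categorical labels.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical signed scale for the five sentiment labels; not defined by the model.
TONE = {"strong_pos": 2, "weak_pos": 1, "neutral": 0, "weak_neg": -1, "strong_neg": -2}


def summarize(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, Counter]:
    """Return (mean tone, action counts) for (sentiment_strength, action_signal) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one prediction")
    mean_tone = sum(TONE[sentiment] for sentiment, _ in pairs) / len(pairs)
    actions = Counter(action for _, action in pairs)
    return mean_tone, actions


# A negative tone does not imply a sell: "weak_neg" can pair with "hold".
batch = [("strong_pos", "buy"), ("weak_neg", "hold"), ("weak_neg", "sell")]
mean_tone, actions = summarize(batch)
print(mean_tone, dict(actions))
```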
---
## How to use the model
```python
import json

import torch
from transformers import AutoTokenizer, AutoModel

repo_or_dir = "LangQuant/LQ-Kbert-base"

# Example Korean inputs (English glosses in the comments).
texts = [
    "비트코인 조정 후 반등, 투자심리 개선",  # Bitcoin rebounds after a correction; investor sentiment improves
    "환율 급등에 증시 변동성 확대",  # Stock market volatility widens as the exchange rate spikes
    "비트 그만 좀 내려라 진짜..",  # Bitcoin, please stop dropping, seriously..
    "폭락ㅠㅠㅜㅠㅜ 다 팔아야할까요?",  # It's crashing (crying)... should I sell everything?
]

tokenizer = AutoTokenizer.from_pretrained(repo_or_dir)
# trust_remote_code=True lets transformers run the model code shipped in the repository.
model = AutoModel.from_pretrained(repo_or_dir, trust_remote_code=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

# Note: max_length is counted in tokens, not characters.
enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=200).to(device)
with torch.inference_mode():
    out = model(**enc)

# Index-to-label mappings for the two classification heads and the emotion head.
IDX2SENTI = {0: "strong_pos", 1: "weak_pos", 2: "neutral", 3: "weak_neg", 4: "strong_neg"}
IDX2ACT = {0: "buy", 1: "hold", 2: "sell", 3: "avoid", 4: "info_only", 5: "ask_info"}
EMO_LIST = ["greed", "fear", "confidence", "doubt", "anger", "hope", "sarcasm"]

for i, t in enumerate(texts):
    senti = int(out["logits_senti"][i].argmax().item())
    act = int(out["logits_act"][i].argmax().item())
    # Emotions are multi-label: apply a sigmoid per emotion and keep those at or above 0.5.
    emo_p = torch.sigmoid(out["logits_emo"][i]).tolist()
    # Regression outputs (certainty, relevance, toxicity) are clamped to [0, 1].
    reg = torch.clamp(out["pred_reg"][i], 0, 1).tolist()
    emos = [EMO_LIST[j] for j, p in enumerate(emo_p) if p >= 0.5]
    result = {
        "text": t,
        "pred_sentiment_strength": IDX2SENTI[senti],
        "pred_action_signal": IDX2ACT[act],
        "pred_emotions": emos,
        "pred_certainty": float(reg[0]),
        "pred_relevance": float(reg[1]),
        "pred_toxicity": float(reg[2]),
    }
    print(json.dumps(result, ensure_ascii=False))
```
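The snippet above keeps every emotion whose sigmoid probability is at least 0.5. If a stricter or looser cut-off is wanted, the same multi-label decoding can be written with an adjustable threshold. This is a dependency-free sketch that works on a plain list of logits; `decode_emotions` and its `threshold` parameter are hypothetical helpers, not part of the released code.

```python
import math
from typing import List, Sequence

EMO_LIST = ["greed", "fear", "confidence", "doubt", "anger", "hope", "sarcasm"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_emotions(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Return the emotions whose sigmoid probability is at or above the threshold."""
    if len(logits) != len(EMO_LIST):
        raise ValueError(f"expected {len(EMO_LIST)} logits, got {len(logits)}")
    return [name for name, logit in zip(EMO_LIST, logits) if sigmoid(logit) >= threshold]


# Made-up logits for illustration only.
logits = [2.0, -1.0, 0.0, -3.0, -3.0, 1.0, -0.1]
print(decode_emotions(logits))        # default 0.5 cut-off
print(decode_emotions(logits, 0.8))   # stricter cut-off
```

With real model output, `out["logits_emo"][i].tolist()` from the snippet above could be passed as `logits`.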
---
### Examples
| Sentence | sentiment_strength | action_signal | Interpretation |
|------|--------------------|---------------|------|
| "개떡상이여" ("it's mooning like crazy") | strong_pos | buy | Strong conviction of a rise + intent to buy immediately |
| "여기선 관망이 맞다" ("waiting and watching is the right call here") | weak_neg | hold | Negative, but chooses to keep holding |
| "들어가도 될까?" ("is it okay to get in?") | weak_pos | ask_info | Cautious optimism; a question exploring a buy |
| "해킹 터짐, 비상. 접근 금지" ("a hack just happened, emergency. Stay away") | strong_neg | avoid | Strong negativity + recommendation to avoid |
| "업데이트 공지 나왔습니다" ("the update notice has been posted") | neutral | info_only | Plain information, no action |
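The rows above can double as a tiny sanity check for a custom inference wrapper. The sketch below stores the table as fixtures and reports which sentences a given predictor labels differently. `mismatches` and the stand-in predictor `always_neutral` are hypothetical; a real check would pass a function that wraps the model and returns `(sentiment_strength, action_signal)`.

```python
from typing import Callable, List, Tuple

# (sentence, expected sentiment_strength, expected action_signal), copied from the table above.
EXAMPLES = [
    ("개떡상이여", "strong_pos", "buy"),
    ("여기선 관망이 맞다", "weak_neg", "hold"),
    ("들어가도 될까?", "weak_pos", "ask_info"),
    ("해킹 터짐, 비상. 접근 금지", "strong_neg", "avoid"),
    ("업데이트 공지 나왔습니다", "neutral", "info_only"),
]


def mismatches(predict: Callable[[str], Tuple[str, str]]) -> List[str]:
    """Return the sentences whose predicted labels differ from the table."""
    return [s for s, senti, act in EXAMPLES if predict(s) != (senti, act)]


def always_neutral(_: str) -> Tuple[str, str]:
    # Stand-in predictor for demonstration only; replace with a wrapper around the real model.
    return ("neutral", "info_only")


print(mismatches(always_neutral))
```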
---
### Citation
```
@misc{langquant2025lkbert,
title = {LQ-KBERT-Base: Crypto Market Korean Sentiment & Action Signal Classifier},
author = {LangQuant},
year = {2025},
url = {https://huggingface.co/langquant/LQ-Kbert-base}
}
```
---
### Disclaimer
```
This model is provided for academic research and experimental use only.
Its output must not be regarded as financial or investment advice,
and LangQuant accepts no responsibility for any outcomes that result from its use.
```