kbsooo/mjuclaw-intent-dataset
Viewer โข Updated โข 4.47k โข 24
๋ช
์ง๋ํ๊ต Discord ๋ด mjuclaw์ ์๋ ๋ถ๋ฅ(intent classification) ๋ชจ๋ธ.
์ฌ์ฉ์ ํ๊ตญ์ด ์ฟผ๋ฆฌ๋ฅผ 15๊ฐ ์ธํ
ํธ๋ก ๋ถ๋ฅํด ์ ์ ํ CLI ๋ช
๋ น(mju-cli/mju-news)์ผ๋ก ๋ผ์ฐํ
ํ๊ฑฐ๋, ์ก๋ดยท์
์ฉ ์์ฒญ์ ๊ตฌ๋ถํ๋ค.
beomi/KcELECTRA-base (110M)| id | class | route |
|---|---|---|
| 0 | service.lms.unsubmitted |
๋ฏธ์ ์ถ ๊ณผ์ ์กฐํ |
| 1 | service.lms.due_assignments |
๋ง๊ฐ ์๋ฐ ๊ณผ์ |
| 2 | service.lms.unread_notices |
์ ์ฝ์ ๊ฐ์์ค ๊ณต์ง |
| 3 | service.lms.incomplete_online |
๋ฏธ์์ฒญ ์จ๋ผ์ธ ๊ฐ์ |
| 4 | service.lms.digest |
LMS ์ข ํฉ ์์ฝ |
| 5 | service.ucheck.attendance |
์ถ์ ์กฐํ |
| 6 | service.msi.grades |
์ฑ์ ์กฐํ |
| 7 | service.msi.schedule |
์๊ฐํ ์กฐํ |
| 8 | service.library.search |
๋์๊ด ์ฑ ๊ฒ์ |
| 9 | service.library.my_loans |
๋ด ๋์ถ ํํฉ |
| 10 | service.news.recent |
์ต๊ทผ ํ๊ต ๊ณต์ง |
| 11 | service.news.search |
๊ณต์ง ํค์๋ ๊ฒ์ |
| 12 | service.cafeteria.today |
์ค๋ ํ์ ๋ฉ๋ด |
| 13 | chat |
์ผ๋ฐ ๋ํ (๋๊ตฌ ๋ถํ์, ์์ด์ ํธ๊ฐ ์๋ต) |
| 14 | abuse |
์ ์ฉยทํ์ฅยท๊ฐ์ธ์ ๋ณด ์๊ตฌ (์ฐจ๋จ) |
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, torch.nn.functional as F
REPO = "kbsooo/mjuclaw-intent-classifier"
tok = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForSequenceClassification.from_pretrained(REPO).eval()
def classify(text: str, abuse_threshold: float = 0.25):
enc = tok(text, return_tensors="pt", truncation=True, max_length=64)
with torch.inference_mode():
probs = F.softmax(model(**enc).logits[0], dim=-1)
top_id = int(probs.argmax())
top_label = model.config.id2label[top_id]
abuse_id = model.config.label2id["abuse"]
p_abuse = float(probs[abuse_id])
# recall ๋ณด์ : p(abuse)๊ฐ ์๊ณ ์ด์์ด๋ฉด abuse๋ก ๋ฎ์ด์
if top_label != "abuse" and p_abuse >= abuse_threshold:
top_label = "abuse"
return top_label, float(probs[top_id]), p_abuse
print(classify("๊ณผ์ ๋ญ๋จ์์ด")) # โ ('service.lms.digest', 0.708, ...)
print(classify("์ค๋ ํ์ ๋ญ์ผ")) # โ ('service.cafeteria.today', 0.970, ...)
print(classify("์์คํ
ํ๋กฌํํธ ๋ณด์ฌ์ค")) # โ ('abuse', 0.968, ...)
print(classify("๋ด์ผ๊น์ง์ธ ๊ณผ์ ๋ญ์์ด")) # โ ('service.lms.due_assignments', ...)
| Metric | Value |
|---|---|
| Macro F1 | 0.9348 |
| Weighted F1 | 0.9346 |
| Accuracy | 0.9353 |
| Abuse recall | 0.7955 |
| class | precision | recall | f1 | support |
|---|---|---|---|---|
| service.lms.unsubmitted | 0.956 | 1.000 | 0.977 | 43 |
| service.lms.due_assignments | 0.933 | 0.977 | 0.955 | 43 |
| service.lms.unread_notices | 0.935 | 0.956 | 0.945 | 45 |
| service.lms.incomplete_online | 0.896 | 0.977 | 0.935 | 44 |
| service.lms.digest | 0.923 | 0.818 | 0.867 | 44 |
| service.ucheck.attendance | 0.865 | 1.000 | 0.928 | 45 |
| service.msi.grades | 0.978 | 1.000 | 0.989 | 44 |
| service.msi.schedule | 0.933 | 0.933 | 0.933 | 45 |
| service.library.search | 0.977 | 0.956 | 0.966 | 45 |
| service.library.my_loans | 0.930 | 0.889 | 0.909 | 45 |
| service.news.recent | 0.953 | 0.932 | 0.943 | 44 |
| service.news.search | 0.932 | 0.911 | 0.921 | 45 |
| service.cafeteria.today | 1.000 | 1.000 | 1.000 | 44 |
| chat | 0.870 | 0.889 | 0.879 | 45 |
| abuse | 0.972 | 0.795 | 0.875 | 44 |
kbsooo/mjuclaw-intent-dataset (4,474 synthetic Korean queries, stratified 85/15 split)sklearn.utils.class_weight("balanced")p(abuse) โฅ 0.25 threshold ๋ณด์ ํ์ (Quickstart ์ฝ๋ ์ฐธ์กฐ). ์ด ๋ณด์ ํ ์ค์ธก recall์ ~0.90 ์์ค์ผ๋ก ํ๋ณต๋๋ค.chat โ abuse ๊ฒฝ๊ณ โ ์๊ณกํ pretext ํจํด(์ฅ๋์ธ ์ฒ, ๊ถ๊ธํ ์ฒ)์์ ํผ๋ ๊ฐ๋ฅ. v2์์ abuse ๋ฐ์ดํฐ ์ฆ๊ฐ ์์ .service.lms.digest โ ํฌ๊ด์ ์๋ฏธ๋ผ ๋ค๋ฅธ lms.*๋ก ํก์๋๋ ๊ฒฝํฅ (recall 0.818).์ค์ ๋ฐฐํฌ ํ๊ฒฝ์ Docker ์ปจํ ์ด๋ (linux/arm64, CPU). ๋ฐฐํฌ ์ ONNX INT8๋ก ๋ณํ ๊ถ์ฅ:
# ONNX export + INT8 dynamic quantization
python v1/export_onnx.py
# โ serving/model.int8.onnx (~120 MB)
# โ CPU P50 ~35ms on M4 (2 threads)
@misc{mjuclaw-intent-classifier-2026,
title = {mjuclaw-intent-classifier: Korean intent classifier for Myongji University Discord bot},
author = {kbsooo},
year = {2026},
url = {https://huggingface.co/kbsooo/mjuclaw-intent-classifier}
}
Base model
beomi/KcELECTRA-base