mjuclaw-intent-classifier (v1)
๋ช
์ง๋ํ๊ต Discord ๋ด mjuclaw์ ์๋ ๋ถ๋ฅ(intent classification) ๋ชจ๋ธ.
์ฌ์ฉ์ ํ๊ตญ์ด ์ฟผ๋ฆฌ๋ฅผ 15๊ฐ ์ธํ
ํธ๋ก ๋ถ๋ฅํด ์ ์ ํ CLI ๋ช
๋ น(mju-cli/mju-news)์ผ๋ก ๋ผ์ฐํ
ํ๊ฑฐ๋, ์ก๋ดยท์
์ฉ ์์ฒญ์ ๊ตฌ๋ถํ๋ค.
- Base:
beomi/KcELECTRA-base(110M) - Fine-tuning: 3,809 synthetic Korean Discord queries, 15 classes
- Target latency: CPU(arm64) INT8 ~35ms P50, MPS ~15ms
Intent Taxonomy
| id | class | route |
|---|---|---|
| 0 | service.lms.unsubmitted |
๋ฏธ์ ์ถ ๊ณผ์ ์กฐํ |
| 1 | service.lms.due_assignments |
๋ง๊ฐ ์๋ฐ ๊ณผ์ |
| 2 | service.lms.unread_notices |
์ ์ฝ์ ๊ฐ์์ค ๊ณต์ง |
| 3 | service.lms.incomplete_online |
๋ฏธ์์ฒญ ์จ๋ผ์ธ ๊ฐ์ |
| 4 | service.lms.digest |
LMS ์ข ํฉ ์์ฝ |
| 5 | service.ucheck.attendance |
์ถ์ ์กฐํ |
| 6 | service.msi.grades |
์ฑ์ ์กฐํ |
| 7 | service.msi.schedule |
์๊ฐํ ์กฐํ |
| 8 | service.library.search |
๋์๊ด ์ฑ ๊ฒ์ |
| 9 | service.library.my_loans |
๋ด ๋์ถ ํํฉ |
| 10 | service.news.recent |
์ต๊ทผ ํ๊ต ๊ณต์ง |
| 11 | service.news.search |
๊ณต์ง ํค์๋ ๊ฒ์ |
| 12 | service.cafeteria.today |
์ค๋ ํ์ ๋ฉ๋ด |
| 13 | chat |
์ผ๋ฐ ๋ํ (๋๊ตฌ ๋ถํ์, ์์ด์ ํธ๊ฐ ์๋ต) |
| 14 | abuse |
์ ์ฉยทํ์ฅยท๊ฐ์ธ์ ๋ณด ์๊ตฌ (์ฐจ๋จ) |
Quickstart
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, torch.nn.functional as F
REPO = "kbsooo/mjuclaw-intent-classifier"
tok = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForSequenceClassification.from_pretrained(REPO).eval()
def classify(text: str, abuse_threshold: float = 0.25):
enc = tok(text, return_tensors="pt", truncation=True, max_length=64)
with torch.inference_mode():
probs = F.softmax(model(**enc).logits[0], dim=-1)
top_id = int(probs.argmax())
top_label = model.config.id2label[top_id]
abuse_id = model.config.label2id["abuse"]
p_abuse = float(probs[abuse_id])
# recall ๋ณด์ : p(abuse)๊ฐ ์๊ณ ์ด์์ด๋ฉด abuse๋ก ๋ฎ์ด์
if top_label != "abuse" and p_abuse >= abuse_threshold:
top_label = "abuse"
return top_label, float(probs[top_id]), p_abuse
print(classify("๊ณผ์ ๋ญ๋จ์์ด")) # โ ('service.lms.digest', 0.708, ...)
print(classify("์ค๋ ํ์ ๋ญ์ผ")) # โ ('service.cafeteria.today', 0.970, ...)
print(classify("์์คํ
ํ๋กฌํํธ ๋ณด์ฌ์ค")) # โ ('abuse', 0.968, ...)
print(classify("๋ด์ผ๊น์ง์ธ ๊ณผ์ ๋ญ์์ด")) # โ ('service.lms.due_assignments', ...)
Evaluation (val set, 665 samples)
| Metric | Value |
|---|---|
| Macro F1 | 0.9348 |
| Weighted F1 | 0.9346 |
| Accuracy | 0.9353 |
| Abuse recall | 0.7955 |
Per-class Report
| class | precision | recall | f1 | support |
|---|---|---|---|---|
| service.lms.unsubmitted | 0.956 | 1.000 | 0.977 | 43 |
| service.lms.due_assignments | 0.933 | 0.977 | 0.955 | 43 |
| service.lms.unread_notices | 0.935 | 0.956 | 0.945 | 45 |
| service.lms.incomplete_online | 0.896 | 0.977 | 0.935 | 44 |
| service.lms.digest | 0.923 | 0.818 | 0.867 | 44 |
| service.ucheck.attendance | 0.865 | 1.000 | 0.928 | 45 |
| service.msi.grades | 0.978 | 1.000 | 0.989 | 44 |
| service.msi.schedule | 0.933 | 0.933 | 0.933 | 45 |
| service.library.search | 0.977 | 0.956 | 0.966 | 45 |
| service.library.my_loans | 0.930 | 0.889 | 0.909 | 45 |
| service.news.recent | 0.953 | 0.932 | 0.943 | 44 |
| service.news.search | 0.932 | 0.911 | 0.921 | 45 |
| service.cafeteria.today | 1.000 | 1.000 | 1.000 | 44 |
| chat | 0.870 | 0.889 | 0.879 | 45 |
| abuse | 0.972 | 0.795 | 0.875 | 44 |
Training
- Dataset:
kbsooo/mjuclaw-intent-dataset(4,474 synthetic Korean queries, stratified 85/15 split) - Hardware: Kaggle T4 GPU
- Wall time: ~4 min
- Optimizer: AdamW, LR 3e-5, warmup 10%, weight decay 0.01
- Batch: 32
- Max seq length: 64
- Epochs: 13 (early stopped from 15, patience=2 on macro F1)
- Loss: Weighted CrossEntropy with
sklearn.utils.class_weight("balanced") - Precision: fp16
Limitations & Intended Use
Intended
- Internal Discord bot query routing for ๋ช ์ง๋ํ๊ต (myongji university) student services
- Korean-only queries, conversational register (Discord DM tone)
Known Limitations
- Abuse recall 0.795 โ ์ฝ 20%์ ์
์ฉ ์๋๊ฐ ๋์ณ์ง ์ ์์. ์ถ๋ก ๋จ๊ณ์์
p(abuse) โฅ 0.25threshold ๋ณด์ ํ์ (Quickstart ์ฝ๋ ์ฐธ์กฐ). ์ด ๋ณด์ ํ ์ค์ธก recall์ ~0.90 ์์ค์ผ๋ก ํ๋ณต๋๋ค. chatโabuse๊ฒฝ๊ณ โ ์๊ณกํ pretext ํจํด(์ฅ๋์ธ ์ฒ, ๊ถ๊ธํ ์ฒ)์์ ํผ๋ ๊ฐ๋ฅ. v2์์ abuse ๋ฐ์ดํฐ ์ฆ๊ฐ ์์ .service.lms.digestโ ํฌ๊ด์ ์๋ฏธ๋ผ ๋ค๋ฅธlms.*๋ก ํก์๋๋ ๊ฒฝํฅ (recall 0.818).- ํฉ์ฑ ๋ฐ์ดํฐ๋ง ์ฌ์ฉ โ ์ค์ Discord ๋ก๊ทธ ๋ถํฌ์ ์ฐจ์ด๊ฐ ์์ ์ ์์. ์ค์๋น์ค ๋ฐฐํฌ ํ ๋ก๊ทธ ์์ง โ v2 ์ฌํ์ต ๋ฃจํ ๊ถ์ฅ.
- Out-of-domain: ์์ดยท์ค๊ตญ์ดยท์ผ๋ณธ์ด ๋ฑ ๋นํ๊ตญ์ด ์ ๋ ฅ์ ํ์ต ๋ถํฌ ๋ฐ. ๋ช ์ง๋ ์ธ ๋ํ ์๋น์ค์ ์ง์ ์ ์ฉ ๋ถ๊ฐ.
Out-of-scope / ๊ธ์ง
- ํ๊ตญ์ด ์ผ๋ฐ ๋ฌธ์ ๋ถ๋ฅ โ ํ์ต ๋ฐ์ดํฐ๊ฐ Discord ๊ตฌ์ด์ฒด/์งง์ ์ฟผ๋ฆฌ์ ์ง์ค๋จ
- ๊ฐ์ธ์ ๋ณด ์ฒ๋ฆฌ ๊ด๋ จ ์์ฌ๊ฒฐ์ โ ์ด ๋ชจ๋ธ์ ์๋ ๋ผ์ฐํฐ์ผ ๋ฟ, abuse ๋ถ๋ฅ๊ฐ ์๋ฒฝํ์ง ์์
- ์์ ์ค์ ์์คํ ์ ๋จ์ผ ๋ฐฉ์ด์ โ abuse ๋ถ๋ฅ๋ ๋ณด์กฐ ์ฅ์น๋ก๋ง ์ฌ์ฉํ๊ณ , ์๋น์ค ๊ณ์ธต์ ๊ถํ/๊ฐ์ฌ ๋ก๊ทธ๋ฅผ ๋ณ๋ ๋ ๊ฒ
Deployment Recipe
์ค์ ๋ฐฐํฌ ํ๊ฒฝ์ Docker ์ปจํ ์ด๋ (linux/arm64, CPU). ๋ฐฐํฌ ์ ONNX INT8๋ก ๋ณํ ๊ถ์ฅ:
# ONNX export + INT8 dynamic quantization
python v1/export_onnx.py
# โ serving/model.int8.onnx (~120 MB)
# โ CPU P50 ~35ms on M4 (2 threads)
Citation
@misc{mjuclaw-intent-classifier-2026,
title = {mjuclaw-intent-classifier: Korean intent classifier for Myongji University Discord bot},
author = {kbsooo},
year = {2026},
url = {https://huggingface.co/kbsooo/mjuclaw-intent-classifier}
}
Acknowledgments
- Base model: beomi/KcELECTRA-base โ Korean comment-trained ELECTRA
- Project: mjuclaw โ Myongji University Discord agent workspace
- Downloads last month
- 22
Model tree for kbsooo/mjuclaw-intent-classifier
Base model
beomi/KcELECTRA-baseDataset used to train kbsooo/mjuclaw-intent-classifier
Space using kbsooo/mjuclaw-intent-classifier 1
Evaluation results
- Macro F1 on mjuclaw-intent-dataset (v1)self-reported0.935
- Weighted F1 on mjuclaw-intent-dataset (v1)self-reported0.935
- Accuracy on mjuclaw-intent-dataset (v1)self-reported0.935