Issue Priority Predictor (Korean)
A model supporting Korean and English that automatically predicts the priority of commits and issues.
Model Details
This is a multilingual model that predicts a priority score from GitHub commit text.
It is based on distilbert-base-multilingual-cased and was fine-tuned on commit data written in Korean and English.
For each input text, the model outputs a continuous score in the 0 to 1 range; a higher score means a relatively higher priority. The final priority class (HIGH / MED / LOW) is expected to be decided by a post-processing policy suited to the service environment.
Evaluation Metrics
The metrics below were computed on priority scores scaled to the 0 to 1 range.
Loss: 0.0045
MAE (mean absolute error): 0.0122
RMSE (root mean squared error): 0.0150
Spearman correlation coefficient: 0.8473
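As a rough illustration of how metrics like these can be computed on scaled scores, here is a dependency-free sketch. The function names (`average_ranks`, `pearson`, `regression_metrics`) and the sample values are hypothetical and are not taken from this repository's evaluation code.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and Spearman correlation (Pearson on average ranks)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    spearman = pearson(average_ranks(y_true), average_ranks(y_pred))
    return {"mae": mae, "rmse": rmse, "spearman": spearman}

# Hypothetical labels and predictions, for illustration only
print(regression_metrics([0.10, 0.40, 0.80], [0.12, 0.38, 0.83]))
```

Spearman correlation only looks at the ordering of the scores, which is why it is a natural companion to MAE and RMSE for a model meant to rank items.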
Note: This model does not classify priority directly. It is designed so that post-processing reflecting domain policy (security, payments, outages, documentation changes, and so on) is applied on top of the predicted score.
Quick Start
Model prediction (score only)
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import json
# Load the model
model_name = "your-username/issue-priority-ko"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()
# Predict (score only)
text = "로그인 안됨, 토큰 만료 처리 필요"  # "Login not working, token expiry handling needed"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    score_raw = model(**inputs).logits.item()  # score in the 0-1 range
# Restore the original scale
with open("score_thresholds.json", "r", encoding="utf-8") as f:
    thresholds = json.load(f)
score = score_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]
print(f"Predicted Score: {score:.4f}")
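The scale restoration step is the inverse of min-max scaling. The following is a minimal sketch, assuming `train_min` and `train_max` are the bounds of the training labels as the key names suggest. The helper names `restore_scale` and `scale` and the numeric range are hypothetical.

```python
def restore_scale(score_raw, train_min, train_max):
    """Invert min-max scaling: map a 0-1 score back to the training label range."""
    return score_raw * (train_max - train_min) + train_min

def scale(score, train_min, train_max):
    """Forward min-max scaling, included only to check the round trip."""
    return (score - train_min) / (train_max - train_min)

# Hypothetical range; the real values come from score_thresholds.json
train_min, train_max = 1.0, 5.0
print(restore_scale(0.0, train_min, train_max))  # 1.0
print(restore_scale(1.0, train_min, train_max))  # 5.0
```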
Score → class conversion (post-processing)
# Method 1: use the to_priority function (recommended)
from postprocess.to_priority import to_priority
# Basic conversion (no post-processing rules)
priority = to_priority(score=score, text=text)
print(f"Priority: {priority}")
# With post-processing rules (optional)
priority = to_priority(score=score, text=text, use_rules=True)
print(f"Priority (with rules): {priority}")
# Method 2: convert directly using the thresholds
if score >= thresholds["q_high"]:
    priority = "HIGH"
elif score <= thresholds["q_low"]:
    priority = "LOW"
else:
    priority = "MED"
Model Information
| Item | Details |
|---|---|
| Base model | distilbert-base-multilingual-cased |
| Task type | Regression |
| Input | Commit/issue title + body text |
| Output | Priority score (float) |
| Class conversion | Done in post-processing (to_priority() function) |
| Languages | Korean, English |
| Max length | 256 tokens |
Important: The model outputs a score only. Use the to_priority() function to convert it to a HIGH/MED/LOW class.
Key Features
- Multilingual support: handles both Korean and English commits/issues
- Keyword-based post-processing: customize rules in postprocess/priority_rules.yaml
- Relative ranking within a batch: compare several issues together for more accurate priority prediction
- Lightweight model: fast inference thanks to the DistilBERT base
Folder Structure
issue-priority-ko/
├── README.md                  # This file
├── config.json                # Model configuration
├── model.safetensors          # Model weights
├── tokenizer.json             # Tokenizer
├── tokenizer_config.json
├── vocab.txt
├── score_thresholds.json      # Thresholds for priority conversion
│
├── postprocess/               # Post-processing (optional)
│   ├── to_priority.py         # Score → class conversion function
│   ├── priority_rules.yaml    # Keyword-based rules (optional)
│   └── README.md              # Post-processing documentation
│
├── examples/                  # Usage examples
│   ├── input.json
│   └── output.json
│
└── requirements.txt           # Dependencies
Score → Class Conversion
Using the to_priority() function
from postprocess.to_priority import to_priority
# Basic conversion (threshold-based)
priority = to_priority(score=0.82, text="로그인 에러 발생")  # "Login error occurred"
# With post-processing rules (optional)
priority = to_priority(score=0.82, text="로그인 에러 발생", use_rules=True)
# Batch conversion
from postprocess.to_priority import to_priority_batch
scores = [0.82, 0.75, 0.90]
texts = ["로그인 에러", "README 수정", "서버 다운"]  # "Login error", "README fix", "Server down"
priorities = to_priority_batch(scores, texts, use_rules=True)
Post-processing rules (optional)
Keyword-based rules can be applied using postprocess/priority_rules.yaml.
Rule examples:
- Force LOW: readme, typo, 문서 (docs) → always LOW
- Guarantee at least MED: 장애 (outage), 에러 (error), 로그인 (login), 결제 (payment) → at least MED
- HIGH boost: 데이터 손실 (data loss), 무한 (infinite), critical → HIGH
See postprocess/README.md for details.
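To make the three rule types concrete, here is a small sketch of how keyword rules of this kind could adjust a class. It is not the code in postprocess/to_priority.py. The `RULES` dictionary layout, the `apply_rules` name, and the precedence order (forced LOW first, then HIGH boost, then the MED floor) are assumptions for illustration; the actual YAML schema and precedence may differ.

```python
# Hypothetical in-memory form of the rules; the real YAML schema may differ.
RULES = {
    "force_low": ["readme", "typo", "문서"],
    "min_med": ["장애", "에러", "로그인", "결제"],
    "boost_high": ["데이터 손실", "무한", "critical"],
}

def apply_rules(priority, text, rules=RULES):
    """Adjust a HIGH/MED/LOW class using simple keyword matching."""
    lowered = text.lower()

    def has_any(keywords):
        return any(keyword in lowered for keyword in keywords)

    # Assumed precedence: forced LOW wins, then HIGH boost, then the MED floor.
    if has_any(rules["force_low"]):
        return "LOW"
    if has_any(rules["boost_high"]):
        return "HIGH"
    if has_any(rules["min_med"]) and priority == "LOW":
        return "MED"
    return priority

print(apply_rules("HIGH", "README 수정"))  # LOW
print(apply_rules("LOW", "로그인 에러"))   # MED
```

Substring matching keeps the sketch short, but it also fires on partial words, so a production rule set would likely need tokenization or word-boundary checks.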
Performance Metrics
| Metric | Value |
|---|---|
| MAE | 0.009 (on scaled values) |
| RMSE | 0.015 (on scaled values) |
| Spearman Correlation | 0.85 |
Note: The model is better suited to predicting relative rank. Comparing scores within a batch is recommended over relying on absolute scores.
Usage Tips
1. Single prediction
# Model prediction
text = "로그인 안됨"  # "Login not working"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    score_raw = model(**inputs).logits.item()
# Restore the scale
score = score_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]
# Class conversion
from postprocess.to_priority import to_priority
priority = to_priority(score=score, text=text, use_rules=True)
2. Batch prediction (recommended)
import numpy as np
texts = ["이슈1", "이슈2", "이슈3"]  # "issue 1", "issue 2", "issue 3"
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=256, padding=True)
with torch.no_grad():
    scores_raw = model(**inputs).logits.squeeze(-1).numpy()
# Restore the scale
scores = scores_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]
# Relative ranking within the batch (quantile-based)
from scipy.stats import rankdata
normalized = rankdata(scores, method='average') / len(scores)
# Top 30% = HIGH, bottom 30% = LOW
q_high = np.percentile(normalized, 70)
q_low = np.percentile(normalized, 30)
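The snippet above stops at the quantile cut-offs. One possible way to turn batch ranks into labels is sketched below in plain Python. The function name `rank_to_priority`, the rounding of the 30% fractions, and tie-breaking by input order are assumptions, not the behavior of `to_priority_batch`.

```python
def rank_to_priority(scores, high_fraction=0.3, low_fraction=0.3):
    """Label each score HIGH/MED/LOW by its rank within the batch.

    Assumes high_fraction + low_fraction <= 1. Ties are broken by input order.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    n_low = round(n * low_fraction)
    n_high = round(n * high_fraction)
    labels = ["MED"] * n
    for i in order[:n_low]:
        labels[i] = "LOW"
    for i in order[n - n_high:]:
        labels[i] = "HIGH"
    return labels

print(rank_to_priority([0.82, 0.75, 0.90]))  # ['MED', 'LOW', 'HIGH']
```

Because the labels depend only on order within the batch, the same score can receive different labels in different batches; that is the intended trade-off of relative ranking.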
3. Batch prediction + class conversion
# Batch prediction
texts = ["이슈1", "이슈2", "이슈3"]  # "issue 1", "issue 2", "issue 3"
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=256, padding=True)
with torch.no_grad():
    scores_raw = model(**inputs).logits.squeeze(-1).numpy()
# Restore the scale
scores = scores_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]
# Batch class conversion
from postprocess.to_priority import to_priority_batch
priorities = to_priority_batch(scores, texts, use_rules=True)
for text, score, priority in zip(texts, scores, priorities):
    print(f"{priority}: {score:.4f} - {text}")
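Cautions
- Model output: the model outputs a score only (it is a regression model). Use the to_priority() function for class conversion.
- Scale restoration required: the model output is in the 0 to 1 range. Restore the original scale using score_thresholds.json.
- Relative rank: relative comparison within a batch is more accurate than absolute scores.
- Post-processing rules: priority_rules.yaml is optional. Use it only when needed.
- Domain adaptation: retraining or fine-tuning is recommended for new domains.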
Examples
See the examples/ folder for real usage examples.
- input.json: example input
- output.json: example output
Related Resources
- Conversion function: postprocess/to_priority.py - score → class conversion
- Post-processing rules (optional): postprocess/priority_rules.yaml
- Post-processing documentation: postprocess/README.md
License
- Apache 2.0