---
language:
- ko
- en
tags:
- text-classification
- regression
- commit-priority
- issue-priority
license: apache-2.0
datasets:
- custom
metrics:
- mae
- rmse
- spearman
---

# Issue Priority Predictor (Korean)

**A Korean/English model that automatically predicts the priority of commits and issues**

## Model Details

This multilingual model predicts a priority score from GitHub commit text.

It is based on `distilbert-base-multilingual-cased` and was fine-tuned on commit data written in Korean and English.

The model outputs a continuous score in the range 0 to 1 for the input text; a higher score means a relatively higher priority.
The final priority class (HIGH / MED / LOW) is expected to be decided by a post-processing policy suited to the service environment.

## Evaluation Metrics

The metrics below were computed on priority scores scaled to the range 0 to 1.

- Loss: 0.0045
- MAE (mean absolute error): 0.0122
- RMSE (root mean squared error): 0.0150
- Spearman correlation coefficient: 0.8473

**Note**
The model does not classify priority directly. It is designed so that post-processing reflecting domain policy (security, payments, outages, documentation changes, and so on) is applied on top of the predicted score.

## Quick Start

### Model prediction (score only)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import json

# Load the model
model_name = "your-username/issue-priority-ko"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Predict (score only)
text = "로그인 안됨, 토큰 만료 처리 필요"  # "Cannot log in, token expiry handling needed"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    score_raw = model(**inputs).logits.item()  # score in the 0 to 1 range

# Restore the original scale
with open("score_thresholds.json", "r", encoding="utf-8") as f:
    thresholds = json.load(f)

score = score_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]

print(f"Predicted Score: {score:.4f}")
```

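
The rescaling step in the snippet above is the inverse of min-max normalization. Here is a minimal sketch of that step in isolation, assuming `score_thresholds.json` provides the `train_min` and `train_max` keys used above. The helper name `restore_scale` and the sample bounds are hypothetical and not part of this repository.

```python
def restore_scale(score_raw: float, train_min: float, train_max: float) -> float:
    """Invert min-max scaling: map a 0-to-1 model output back to the training-label range."""
    return score_raw * (train_max - train_min) + train_min


# Illustrative bounds only; the real values come from score_thresholds.json.
example_thresholds = {"train_min": 2.0, "train_max": 10.0}

restored = restore_scale(0.5, example_thresholds["train_min"], example_thresholds["train_max"])
print(restored)  # 6.0
```

A raw output of 0 maps to `train_min` and 1 maps to `train_max`, so thresholds such as `q_high` and `q_low` can be compared against the restored score on the original scale.
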
### Score → class conversion (post-processing)

```python
# Method 1: use the to_priority function (recommended)
from postprocess.to_priority import to_priority

# Basic conversion (no post-processing rules)
priority = to_priority(score=score, text=text)
print(f"Priority: {priority}")

# With post-processing rules (optional)
priority = to_priority(score=score, text=text, use_rules=True)
print(f"Priority (with rules): {priority}")
```

```python
# Method 2: convert directly using the thresholds
if score >= thresholds["q_high"]:
    priority = "HIGH"
elif score <= thresholds["q_low"]:
    priority = "LOW"
else:
    priority = "MED"
```

## Model Information

| Item | Details |
|------|------|
| **Base model** | `distilbert-base-multilingual-cased` |
| **Task type** | Regression |
| **Input** | Commit/issue title + body text |
| **Output** | Priority score (float) |
| **Class conversion** | Done in post-processing (`to_priority()` function) |
| **Languages** | Korean, English |
| **Max length** | 256 tokens |

> **Important**: The model outputs only a score. Use the `to_priority()` function to convert it to a HIGH/MED/LOW class.

## 🎯 Key Features

1. **Multilingual support**: handles both Korean and English commits/issues
2. **Keyword-based post-processing**: rules can be customized via `postprocess/priority_rules.yaml`
3. **Relative ranking within a batch**: compares multiple issues together for more accurate priority prediction
4. **Lightweight model**: fast inference thanks to the DistilBERT base

## Folder Structure

```
issue-priority-ko/
├── README.md                    # This file
├── config.json                  # Model configuration
├── model.safetensors            # Model weights
├── tokenizer.json               # Tokenizer
├── tokenizer_config.json
├── vocab.txt
├── score_thresholds.json        # Thresholds for priority conversion
│
├── postprocess/                 # Post-processing rules (optional)
│   ├── to_priority.py           # Score → class conversion function
│   ├── priority_rules.yaml      # Keyword-based rules (optional)
│   └── README.md                # Post-processing documentation
│
├── examples/                    # Usage examples
│   ├── input.json
│   └── output.json
│
└── requirements.txt             # Dependencies
```

## Score → Class Conversion

### Using the `to_priority()` function

```python
from postprocess.to_priority import to_priority

# Basic conversion (threshold-based)
priority = to_priority(score=0.82, text="로그인 에러 발생")  # "Login error occurred"

# With post-processing rules (optional)
priority = to_priority(score=0.82, text="로그인 에러 발생", use_rules=True)

# Batch conversion
from postprocess.to_priority import to_priority_batch

scores = [0.82, 0.75, 0.90]
texts = ["로그인 에러", "README 수정", "서버 다운"]  # "login error", "README edit", "server down"
priorities = to_priority_batch(scores, texts, use_rules=True)
```

### Post-processing rules (optional)

You can apply keyword-based rules using `postprocess/priority_rules.yaml`.

**Rule examples:**
- **Force LOW**: `readme`, `typo`, `문서` (document) → always LOW
- **Guarantee at least MED**: `장애` (outage), `에러` (error), `로그인` (login), `결제` (payment) → at least MED
- **HIGH boost**: `데이터 손실` (data loss), `무한` (infinite), `critical` → HIGH

See [`postprocess/README.md`](postprocess/README.md) for details.

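
To illustrate how rules of this kind could adjust a threshold-based class, here is a minimal sketch. It is not the code in `postprocess/to_priority.py`: the function name `apply_keyword_rules`, the rule precedence (HIGH boost first, then forced LOW, then the MED floor), and the case-insensitive substring matching are all assumptions made for illustration.

```python
# Keyword lists taken from the rule examples in this section (Korean keywords kept as-is).
FORCE_LOW = ["readme", "typo", "문서"]
MIN_MED = ["장애", "에러", "로그인", "결제"]
BOOST_HIGH = ["데이터 손실", "무한", "critical"]


def apply_keyword_rules(priority: str, text: str) -> str:
    """Adjust a threshold-based priority with keyword rules (hypothetical precedence)."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in BOOST_HIGH):
        return "HIGH"  # assumption: the boost wins over every other rule
    if any(keyword in lowered for keyword in FORCE_LOW):
        return "LOW"
    if priority == "LOW" and any(keyword in lowered for keyword in MIN_MED):
        return "MED"  # raise LOW to the guaranteed minimum
    return priority


print(apply_keyword_rules("HIGH", "README 수정"))           # LOW
print(apply_keyword_rules("LOW", "로그인 에러 발생"))        # MED
print(apply_keyword_rules("MED", "critical: 데이터 손실"))   # HIGH
```

The precedence matters when a text matches more than one rule, so check `postprocess/README.md` for how the actual implementation resolves such conflicts.
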
## Performance Metrics

| Metric | Value |
|------|-----|
| **MAE** | 0.009 (on scaled values) |
| **RMSE** | 0.015 (on scaled values) |
| **Spearman Correlation** | 0.85 |

> **Note**: The model is better suited to predicting relative rank. Comparing scores within a batch is recommended over relying on absolute scores.

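
For reference, the three metrics in the table can be computed as follows. This is a minimal pure-Python sketch on made-up numbers, not the evaluation script used for this model; the function names and sample values are illustrative assumptions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(y_true, y_pred):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rt, rp = average_ranks(y_true), average_ranks(y_pred)
    mt, mp = sum(rt) / len(rt), sum(rp) / len(rp)
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / math.sqrt(var_t * var_p)


# Made-up scaled scores (0 to 1), for illustration only.
y_true = [0.10, 0.40, 0.35, 0.80]
y_pred = [0.12, 0.38, 0.41, 0.79]

print(f"MAE={mae(y_true, y_pred):.4f}")
print(f"RMSE={rmse(y_true, y_pred):.4f}")
print(f"Spearman={spearman(y_true, y_pred):.4f}")
```

MAE and RMSE depend on the scale of the scores, while Spearman correlation depends only on their ordering, which is why it is the more relevant figure when the model is used for relative ranking.
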
## 💡 Usage Tips

### 1. Single prediction
```python
# Assumes tokenizer, model, and thresholds are loaded as in the Quick Start
from postprocess.to_priority import to_priority

# Model prediction
text = "로그인 안됨"  # "Cannot log in"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    score_raw = model(**inputs).logits.item()

# Restore the original scale
score = score_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]

# Convert to a class
priority = to_priority(score=score, text=text, use_rules=True)
```

### 2. Batch prediction (recommended)
```python
import numpy as np
from scipy.stats import rankdata

texts = ["이슈1", "이슈2", "이슈3"]  # "issue 1", "issue 2", "issue 3"
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=256, padding=True)

with torch.no_grad():
    scores_raw = model(**inputs).logits.squeeze(-1).numpy()

# Restore the original scale (bounds come from score_thresholds.json)
scores = scores_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]

# Relative ranking within the batch (quantile-based)
normalized = rankdata(scores, method="average") / len(scores)

# Top 30% = HIGH, bottom 30% = LOW
q_high = np.percentile(normalized, 70)
q_low = np.percentile(normalized, 30)
# One possible way to assign classes from the batch-relative cutoffs
priorities = np.where(normalized >= q_high, "HIGH", np.where(normalized <= q_low, "LOW", "MED"))
```

### 3. Batch prediction + class conversion
```python
from postprocess.to_priority import to_priority_batch

# Batch prediction
texts = ["이슈1", "이슈2", "이슈3"]  # "issue 1", "issue 2", "issue 3"
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=256, padding=True)

with torch.no_grad():
    scores_raw = model(**inputs).logits.squeeze(-1).numpy()

# Restore the original scale
scores = scores_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]

# Batch class conversion
priorities = to_priority_batch(scores, texts, use_rules=True)

for text, score, priority in zip(texts, scores, priorities):
    print(f"{priority}: {score:.4f} - {text}")
```

## ⚠️ Cautions

1. **Model output**: The model outputs only a score (it is a regression model). Use the `to_priority()` function for class conversion.
2. **Scale restoration required**: The model output is in the range 0 to 1. Restore the original scale using `score_thresholds.json`.
3. **Relative rank**: Relative comparison within a batch is more accurate than absolute scores.
4. **Post-processing rules**: `priority_rules.yaml` is optional. Use it only when needed.
5. **Domain adaptation**: Retraining or fine-tuning is recommended for new domains.

## Examples

See the [`examples/`](examples/) folder for real usage examples.

- `input.json`: example input
- `output.json`: example output

## Related Resources

- **Conversion function**: [`postprocess/to_priority.py`](postprocess/to_priority.py) - score → class conversion
- **Post-processing rules (optional)**: [`postprocess/priority_rules.yaml`](postprocess/priority_rules.yaml)
- **Post-processing documentation**: [`postprocess/README.md`](postprocess/README.md)

## License

- Apache 2.0