# Priority post-processing rules
A post-processing system that adjusts priorities by applying keyword-based rules to the model's raw scores.
## Overview
The model is a regression model that predicts a score, but in practice we recommend applying the following post-processing steps:
1. **Keyword-based rules**: force-adjust the priority when specific keywords appear
2. **Relative normalization within a batch**: when several issues are compared together, normalize their scores within the batch
3. **Relative classification**: classify each issue as HIGH/MED/LOW based on upper and lower percentiles within the batch
## Rule types
### 1. Forced-LOW keywords
If the issue text contains any keyword listed in `low_forced_keywords`, it is always classified as LOW, regardless of the model score.
Examples:
- "README 오타 수정" (fix README typo) → LOW
- "문서 업데이트" (documentation update) → LOW
- "typo fix" → LOW
### 2. Minimum-MED keywords
If the issue text contains any keyword listed in `min_med_keywords`, its priority is guaranteed to be at least MED.
Examples:
- "로그인 에러 발생" (login error occurred) → at least MED
- "서버 다운 문제" (server down problem) → at least MED
- "결제 오류" (payment error) → at least MED
### 3. HIGH boost keywords
If the issue text contains any keyword listed in `high_boost_keywords`, its priority is boosted to HIGH.
Examples:
- "데이터 손실 발생" (data loss occurred) → HIGH
- "무한 루프 재발" (infinite loop recurred) → HIGH
- "critical security issue" → HIGH
## Usage
### Python example
```python
import yaml

# Labels in ascending order, so they can be compared by rank rather than alphabetically
PRIORITY_ORDER = {"LOW": 0, "MED": 1, "HIGH": 2}

# Load the rules
with open("postprocess/priority_rules.yaml", "r", encoding="utf-8") as f:
    rules = yaml.safe_load(f)

# Issue text ("Login error occurred, users cannot access")
issue_text = "로그인 에러 발생, 사용자 접근 불가"
# Label derived from the model score (placeholder value; substitute the real prediction)
model_priority = "LOW"

# Keyword check (case-insensitive)
text_lower = issue_text.lower()

# The forced-LOW check comes first
if any(kw.lower() in text_lower for kw in rules["low_forced_keywords"]):
    priority = "LOW"
elif any(kw.lower() in text_lower for kw in rules["high_boost_keywords"]):
    priority = "HIGH"
elif any(kw.lower() in text_lower for kw in rules["min_med_keywords"]):
    # Guarantee at least MED even when the model score is low
    priority = max(model_priority, "MED", key=PRIORITY_ORDER.get)
else:
    priority = model_priority  # use the model prediction as-is
```
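The same precedence can be wrapped in a reusable function, which makes the rule order easy to test. This is a minimal sketch, not the project's own implementation: `resolve_priority`, `contains_any`, and the keyword lists in `EXAMPLE_RULES` are hypothetical names and values chosen for illustration, since the real lists live in `priority_rules.yaml`.

```python
# Labels in ascending order, so "at least MED" can be computed by rank.
PRIORITY_ORDER = {"LOW": 0, "MED": 1, "HIGH": 2}

# Hypothetical keyword lists (assumption); the real ones come from priority_rules.yaml.
EXAMPLE_RULES = {
    "low_forced_keywords": ["typo", "오타", "문서"],
    "min_med_keywords": ["에러", "오류", "다운"],
    "high_boost_keywords": ["critical", "데이터 손실", "무한 루프"],
}


def contains_any(text, keywords):
    """Case-insensitive substring match against a list of keywords."""
    text_lower = text.lower()
    return any(kw.lower() in text_lower for kw in keywords)


def resolve_priority(text, model_priority, rules):
    """Apply the keyword rules in order: forced LOW, HIGH boost, then the MED floor."""
    if contains_any(text, rules["low_forced_keywords"]):
        return "LOW"
    if contains_any(text, rules["high_boost_keywords"]):
        return "HIGH"
    if contains_any(text, rules["min_med_keywords"]):
        # Keep the model label if it is already MED or HIGH.
        return max(model_priority, "MED", key=PRIORITY_ORDER.get)
    return model_priority


print(resolve_priority("로그인 에러 발생", "LOW", EXAMPLE_RULES))        # MED
print(resolve_priority("critical typo in docs", "HIGH", EXAMPLE_RULES))  # LOW
```

Comparing labels through `PRIORITY_ORDER` matters because plain string comparison orders them alphabetically, which would rank "MED" above "HIGH".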
### Batch processing example
```python
import numpy as np
from scipy.stats import rankdata


def apply_postprocessing(issues, scores, rules):
    """
    Apply the post-processing rules within a batch.

    Returns a tuple (priorities, normalized_scores).
    """
    # 1. Apply the keyword-based rules
    # (apply_keyword_rules is assumed to be defined elsewhere; this example does not define it)
    adjusted_scores = apply_keyword_rules(issues, scores, rules)

    # 2. Normalize (quantile): average rank divided by the batch size, giving values in (0, 1]
    if rules["normalize_method"] == "quantile":
        normalized_scores = rankdata(adjusted_scores, method="average") / len(adjusted_scores)
    else:
        normalized_scores = adjusted_scores

    # 3. Relative classification
    q_high = np.percentile(normalized_scores, rules["high_percentile"] * 100)
    q_low = np.percentile(normalized_scores, rules["low_percentile"] * 100)
    priorities = []
    for score in normalized_scores:
        if score >= q_high:
            priorities.append("HIGH")
        elif score <= q_low:
            priorities.append("LOW")
        else:
            priorities.append("MED")
    return priorities, normalized_scores
```
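The batch example calls `apply_keyword_rules` without showing it. One possible shape for that helper is sketched below. The adjustment policy is an assumption made for illustration, not something the source specifies: forced-LOW issues are pinned to the batch minimum, HIGH boost issues to the batch maximum, and minimum-MED issues are raised to at least the batch median. It also assumes that a higher score means a higher priority.

```python
from statistics import median


def _contains_any(text, keywords):
    """Case-insensitive substring match against a list of keywords."""
    text_lower = text.lower()
    return any(kw.lower() in text_lower for kw in keywords)


def apply_keyword_rules(issues, scores, rules):
    """Return a new list of scores with keyword overrides applied (hypothetical policy)."""
    if len(issues) != len(scores):
        raise ValueError("issues and scores must have the same length")
    if not scores:
        return []
    lowest, middle, highest = min(scores), median(scores), max(scores)
    adjusted = []
    for text, score in zip(issues, scores):
        if _contains_any(text, rules["low_forced_keywords"]):
            adjusted.append(lowest)              # pin to the bottom of the batch
        elif _contains_any(text, rules["high_boost_keywords"]):
            adjusted.append(highest)             # pin to the top of the batch
        elif _contains_any(text, rules["min_med_keywords"]):
            adjusted.append(max(score, middle))  # raise to at least the batch median
        else:
            adjusted.append(score)
    return adjusted


# Hypothetical batch and keyword lists, for illustration only.
rules = {
    "low_forced_keywords": ["typo"],
    "min_med_keywords": ["오류"],
    "high_boost_keywords": ["critical"],
}
issues = ["typo fix", "critical security issue", "결제 오류", "refactor utils"]
scores = [0.9, 0.1, 0.25, 0.5]
adjusted = apply_keyword_rules(issues, scores, rules)
print(adjusted)  # [0.1, 0.9, 0.375, 0.5]
```

Expressing the overrides as score adjustments, rather than as final labels, lets the later normalization and percentile steps operate on a single list of numbers. Other policies, such as overriding the final label after classification, would be equally reasonable.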
## Customizing the rules
Edit `priority_rules.yaml` to add or remove keywords so that they fit your project.
Example:
```yaml
# Add project-specific keywords
min_med_keywords:
  - 우리회사특화키워드  # placeholder meaning "our-company-specific keyword"
  - critical-path
  - production-issue
```
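After editing the rules file, a quick sanity check on the loaded dictionary can catch mistakes such as writing a keyword list as a single string. The sketch below checks only the keys that the examples in this document read. Treating all of them as required, and expecting the percentiles as fractions between 0 and 1, are assumptions inferred from how the batch example uses them; `validate_rules` is a hypothetical helper name.

```python
KEYWORD_KEYS = ("low_forced_keywords", "min_med_keywords", "high_boost_keywords")
PERCENTILE_KEYS = ("high_percentile", "low_percentile")


def validate_rules(rules):
    """Return a list of problems found; an empty list means the rules look usable."""
    problems = []
    for key in KEYWORD_KEYS:
        value = rules.get(key)
        if not isinstance(value, list) or not all(isinstance(kw, str) for kw in value):
            problems.append(f"{key} must be a list of strings")
    for key in PERCENTILE_KEYS:
        value = rules.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            problems.append(f"{key} must be a number between 0 and 1")
    # Only compare the two thresholds when both are known to be valid numbers.
    if not problems and rules["low_percentile"] >= rules["high_percentile"]:
        problems.append("low_percentile must be below high_percentile")
    return problems


# Hypothetical values for illustration; in practice pass the dict loaded from the YAML file.
good = {
    "low_forced_keywords": ["typo"],
    "min_med_keywords": ["critical-path", "production-issue"],
    "high_boost_keywords": ["critical"],
    "normalize_method": "quantile",
    "high_percentile": 0.8,
    "low_percentile": 0.2,
}
bad = {"min_med_keywords": "critical-path", "high_percentile": 80, "low_percentile": 0.2}

print(validate_rules(good))  # []
for problem in validate_rules(bad):
    print(problem)
```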
## Notes
- Keyword matching is case-insensitive (text is lowercased before comparison)
- Forced-LOW keywords take precedence over every other rule
- A HIGH boost keyword automatically satisfies the minimum-MED guarantee as well
- Normalization within a batch is only meaningful when several issues are compared together
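The last note can be seen by running the quantile step on a batch of one. The sketch below uses `quantile_normalize`, a hypothetical pure-Python stand-in for the `rankdata(..., method='average') / len(...)` expression in the batch example, so it runs without SciPy.

```python
def quantile_normalize(scores):
    """Average rank divided by batch size; tied scores share the mean of their ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the whole group of scores tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [rank / n for rank in ranks]


print(quantile_normalize([0.03]))                # [1.0]
print(quantile_normalize([0.97]))                # [1.0]
print(quantile_normalize([0.2, 0.8, 0.5, 0.5]))  # [0.25, 1.0, 0.625, 0.625]
```

A single issue always normalizes to 1.0 whatever its raw score, and any percentile of a one-element batch is that same value, so the relative classification in the batch example would label it HIGH. For a lone issue, the keyword rules applied to the model's own prediction are the more informative path.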