Priority Post-processing Rules

A post-processing system that adjusts priorities by applying keyword-based rules to the model's raw scores.

Overview

The model is a regression model that predicts a score, but in practice we recommend applying the following post-processing steps:

  1. Keyword-based rules: force-adjust the priority when specific keywords are present
  2. Relative normalization within a batch: when several issues are compared together, normalize their scores within the batch
  3. Relative classification: classify issues as HIGH/MED/LOW based on the upper and lower percentiles within the batch

Rule Types

1. LOW forced keywords

If an issue contains any keyword listed in low_forced_keywords, it is always classified as LOW priority.

Examples:

  • "README 오타 수정" (fix README typo) → LOW
  • "문서 업데이트" (documentation update) → LOW
  • "typo fix" → LOW
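
As a minimal sketch of the matching step, the check can be written as a case-insensitive substring test. The helper name contains_keyword and the keyword list below are assumptions for illustration; the real list lives in priority_rules.yaml.

```python
def contains_keyword(text, keywords):
    """Return True if any keyword occurs in text (case-insensitive substring match)."""
    text_lower = text.lower()
    return any(str(kw).lower() in text_lower for kw in keywords)


# Hypothetical keyword list, used only for this example.
low_forced_keywords = ["readme", "typo", "오타", "문서"]

print(contains_keyword("README 오타 수정", low_forced_keywords))  # True
print(contains_keyword("Typo Fix", low_forced_keywords))         # True
print(contains_keyword("로그인 에러 발생", low_forced_keywords))   # False
```

Because this is a substring match, a short keyword such as "typo" also matches "typography", so keywords are best kept specific.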

2. Minimum MED keywords

If an issue contains any keyword listed in min_med_keywords, it is guaranteed a priority of at least MED.

Examples:

  • "로그인 에러 발생" (login error occurred) → at least MED
  • "서버 다운 문제" (server down) → at least MED
  • "결제 오류" (payment error) → at least MED
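
One way to express "at least MED" is to compare labels by an explicit rank. The names PRIORITY_RANK and at_least below are hypothetical and only meant as a sketch.

```python
# Explicit ordering of the three labels used in this document.
PRIORITY_RANK = {"LOW": 0, "MED": 1, "HIGH": 2}


def at_least(priority, floor):
    """Return whichever of the two labels ranks higher."""
    return max(priority, floor, key=PRIORITY_RANK.__getitem__)


print(at_least("LOW", "MED"))   # MED
print(at_least("HIGH", "MED"))  # HIGH
```

Comparing the label strings directly would not work here: "HIGH" sorts before "MED" alphabetically, so a plain max("HIGH", "MED") returns "MED".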

3. HIGH boost keywords

If an issue contains any keyword listed in high_boost_keywords, it is boosted to HIGH priority.

Examples:

  • "데이터 손실 발생" (data loss occurred) → HIGH
  • "무한 루프 재발" (infinite loop recurred) → HIGH
  • "critical security issue" → HIGH

Usage

Python example

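import yaml

# Load the rules
with open("postprocess/priority_rules.yaml", "r", encoding="utf-8") as f:
    rules = yaml.safe_load(f)

# Issue text ("login error occurred, users cannot access")
issue_text = "로그인 에러 발생, 사용자 접근 불가"

# Priority label predicted by the model (placeholder value for this example)
model_priority = "LOW"

# Explicit label ordering; plain string comparison would rank "HIGH" below "MED"
PRIORITY_RANK = {"LOW": 0, "MED": 1, "HIGH": 2}

# Keyword check (case-insensitive: both sides are lowercased)
text_lower = issue_text.lower()

# LOW forced keywords are checked first
if any(kw.lower() in text_lower for kw in rules["low_forced_keywords"]):
    priority = "LOW"
elif any(kw.lower() in text_lower for kw in rules["high_boost_keywords"]):
    priority = "HIGH"
elif any(kw.lower() in text_lower for kw in rules["min_med_keywords"]):
    # Guarantee at least MED even when the model score is low
    priority = max(model_priority, "MED", key=PRIORITY_RANK.get)
else:
    priority = model_priority  # use the model prediction as is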

Batch processing example

import numpy as np
from scipy.stats import rankdata

def apply_postprocessing(issues, scores, rules):
    """
    Apply the post-processing rules within a batch.

    Returns (priorities, normalized_scores).
    """
    # 1. Apply keyword-based rules.
    #    apply_keyword_rules is not defined in this document; it is expected
    #    to return one adjusted score per issue.
    adjusted_scores = apply_keyword_rules(issues, scores, rules)

    # 2. Normalization (quantile): average rank divided by the batch size
    if rules["normalize_method"] == "quantile":
        normalized_scores = rankdata(adjusted_scores, method="average") / len(adjusted_scores)
    else:
        normalized_scores = adjusted_scores

    # 3. Relative classification.
    #    The percentile settings are multiplied by 100, so they are expected
    #    to be fractions between 0 and 1.
    q_high = np.percentile(normalized_scores, rules["high_percentile"] * 100)
    q_low = np.percentile(normalized_scores, rules["low_percentile"] * 100)

    priorities = []
    for score in normalized_scores:
        if score >= q_high:
            priorities.append("HIGH")
        elif score <= q_low:
            priorities.append("LOW")
        else:
            priorities.append("MED")

    return priorities, normalized_scores
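
The batch example calls apply_keyword_rules without showing it. The following is one possible shape, not the project's actual implementation: it assumes scores lie roughly in [0, 1], and the parameters low_score, med_floor and high_score are hypothetical constants chosen for illustration.

```python
def _hit(text_lower, keywords):
    """Case-insensitive substring match against a keyword list."""
    return any(str(kw).lower() in text_lower for kw in keywords)


def apply_keyword_rules(issues, scores, rules,
                        low_score=0.0, med_floor=0.5, high_score=1.0):
    """Hypothetical sketch: return a new list of scores adjusted by keyword rules.

    Precedence follows the document: LOW forced, then HIGH boost, then minimum MED.
    The three score constants are assumptions, not values from the source.
    """
    adjusted = []
    for text, score in zip(issues, scores):
        text_lower = text.lower()
        if _hit(text_lower, rules.get("low_forced_keywords", [])):
            score = low_score                # force to the bottom of the range
        elif _hit(text_lower, rules.get("high_boost_keywords", [])):
            score = max(score, high_score)   # push to the top of the range
        elif _hit(text_lower, rules.get("min_med_keywords", [])):
            score = max(score, med_floor)    # never below the MED floor
        adjusted.append(score)
    return adjusted


# Example rules and issues, made up for this sketch.
example_rules = {
    "low_forced_keywords": ["typo"],
    "min_med_keywords": ["error"],
    "high_boost_keywords": ["data loss"],
}
example_issues = ["typo fix", "login error", "data loss in prod", "refactor utils"]
print(apply_keyword_rules(example_issues, [0.9, 0.1, 0.3, 0.4], example_rules))
# [0.0, 0.5, 1.0, 0.4]
```

Adjusting scores rather than labels keeps the later rank-based normalization meaningful, but it also means the final label still depends on the rest of the batch.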

Customizing the Rules

Edit priority_rules.yaml to add or remove keywords to suit your project.

Example:

# Add project-specific keywords
min_med_keywords:
  - 우리회사특화키워드  # placeholder: "our-company-specific keyword"
  - critical-path
  - production-issue
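
After editing the file, a quick sanity check on the loaded dictionary can catch mistakes early. The sketch below is an assumption-laden example: validate_rules is a hypothetical helper, the key names are the ones used in the code examples in this document, and the 0-to-1 range for the percentiles is inferred from the `* 100` in the batch example.

```python
REQUIRED_LIST_KEYS = ("low_forced_keywords", "min_med_keywords", "high_boost_keywords")


def validate_rules(rules):
    """Return a list of problems found in a loaded rules dict (empty if none)."""
    problems = []
    for key in REQUIRED_LIST_KEYS:
        value = rules.get(key)
        if not isinstance(value, list) or not all(isinstance(kw, str) for kw in value):
            problems.append(f"{key} must be a list of strings")

    low = rules.get("low_percentile")
    high = rules.get("high_percentile")
    for name, value in (("low_percentile", low), ("high_percentile", high)):
        if not isinstance(value, (int, float)) or not 0 <= value <= 1:
            problems.append(f"{name} must be a number between 0 and 1")

    # Only compare the two percentiles when both passed the checks above.
    if not problems and low >= high:
        problems.append("low_percentile must be smaller than high_percentile")
    return problems


# Example values, made up for this sketch.
rules = {
    "low_forced_keywords": ["typo"],
    "min_med_keywords": ["critical-path", "production-issue"],
    "high_boost_keywords": ["data loss"],
    "normalize_method": "quantile",
    "high_percentile": 0.8,
    "low_percentile": 0.2,
}
print(validate_rules(rules))  # []
```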

Notes

  • Keyword matching is case-insensitive (text is converted to lowercase before comparison)
  • LOW forced keywords are applied first and take precedence over the other rules
  • An issue with a HIGH boost keyword is automatically guaranteed at least MED as well
  • Normalization within a batch is only meaningful when several issues are compared together
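
To illustrate the last note, here is a small pure-Python sketch of the rank-then-percentile steps from the batch example, written without NumPy or SciPy so it runs on its own. The function names and the default percentiles of 0.8 and 0.2 are assumptions for this example.

```python
def average_ranks(values):
    """1-based rank of each value; tied values share the mean of their ranks."""
    ranks = []
    for v in values:
        less = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        ranks.append(less + (equal + 1) / 2)
    return ranks


def percentile(values, fraction):
    """Linearly interpolated percentile, with fraction in [0, 1]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * fraction
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def classify_batch(scores, high_percentile=0.8, low_percentile=0.2):
    """Rank-normalize scores within the batch, then label by percentile cut-offs."""
    n = len(scores)
    normalized = [r / n for r in average_ranks(scores)]
    q_high = percentile(normalized, high_percentile)
    q_low = percentile(normalized, low_percentile)
    return ["HIGH" if s >= q_high else "LOW" if s <= q_low else "MED"
            for s in normalized]


print(classify_batch([0.1]))                       # ['HIGH']
print(classify_batch([0.99]))                      # ['HIGH']
print(classify_batch([0.1, 0.2, 0.3, 0.4, 0.5]))   # ['LOW', 'MED', 'MED', 'MED', 'HIGH']
```

With a single issue, the normalized score is always 1.0 and the label is HIGH whatever the raw score was, so for one-off predictions the keyword rules and the raw model score are the more useful signals.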