Issue Priority Predictor (Korean)

A Korean/English model that automatically predicts the priority of commits and issues

Model Details

이 λͺ¨λΈμ€ GitHub 컀밋 ν…μŠ€νŠΈλ₯Ό 기반으둜 μš°μ„ μˆœμœ„ 점수(priority score)λ₯Ό μ˜ˆμΈ‘ν•˜λŠ” λ‹€κ΅­μ–΄ λͺ¨λΈμž…λ‹ˆλ‹€.

distilbert-base-multilingual-casedλ₯Ό 기반으둜 ν•˜μ—¬, ν•œκ΅­μ–΄μ™€ μ˜μ–΄λ‘œ μž‘μ„±λœ 컀밋 데이터λ₯Ό μ‚¬μš©ν•΄ νŒŒμΈνŠœλ‹λ˜μ—ˆμŠ΅λ‹ˆλ‹€.

λͺ¨λΈμ€ μž…λ ₯ ν…μŠ€νŠΈμ— λŒ€ν•΄ 0~1 λ²”μœ„μ˜ 연속적인 점수λ₯Ό 좜λ ₯ν•˜λ©°, μ μˆ˜κ°€ λ†’μ„μˆ˜λ‘ μƒλŒ€μ μœΌλ‘œ μš°μ„ μˆœμœ„κ°€ λ†’μŒμ„ μ˜λ―Έν•©λ‹ˆλ‹€. μ΅œμ’…μ μΈ μš°μ„ μˆœμœ„ 클래슀(HIGH / MED / LOW)λŠ” μ„œλΉ„μŠ€ ν™˜κ²½μ— λ§žλŠ” ν›„μ²˜λ¦¬ 정책을 톡해 κ²°μ •ν•˜λŠ” 것을 μ „μ œλ‘œ ν•©λ‹ˆλ‹€.

Evaluation Metrics

μ•„λž˜ ν‰κ°€μ§€ν‘œλŠ” 0~1둜 μŠ€μΌ€μΌλ§λœ μš°μ„ μˆœμœ„ 점수λ₯Ό κΈ°μ€€μœΌλ‘œ μ‚°μΆœλ˜μ—ˆμŠ΅λ‹ˆλ‹€.

Loss: 0.0045

MAE (평균 μ ˆλŒ€ 였차): 0.0122

RMSE (평균 제곱근 였차): 0.0150

Spearman μƒκ΄€κ³„μˆ˜: 0.8473

Note λ³Έ λͺ¨λΈμ€ μš°μ„ μˆœμœ„λ₯Ό 직접 λΆ„λ₯˜(classification)ν•˜μ§€ μ•Šκ³ , λͺ¨λΈμ΄ μ˜ˆμΈ‘ν•œ 점수λ₯Ό 기반으둜 도메인 μ •μ±…(λ³΄μ•ˆ, 결제, μž₯μ• , λ¬Έμ„œ λ³€κ²½ λ“±)을 λ°˜μ˜ν•œ ν›„μ²˜λ¦¬λ₯Ό μ μš©ν•˜λ„λ‘ μ„€κ³„λ˜μ—ˆμŠ΅λ‹ˆλ‹€.

πŸš€ Quick Start

Model prediction (score output only)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import json

# Load the model
model_name = "your-username/issue-priority-ko"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Predict (score output only)
text = "둜그인 μ•ˆλ¨, 토큰 만료 처리 ν•„μš”"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    score_raw = model(**inputs).logits.item()  # score in the 0-1 range

# Restore the original scale
with open("score_thresholds.json", "r", encoding="utf-8") as f:
    thresholds = json.load(f)

score = score_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]

print(f"Predicted Score: {score:.4f}")
```

Score → class conversion (post-processing)

```python
# Method 1: use the to_priority function (recommended)
from postprocess.to_priority import to_priority

# Basic conversion (no post-processing rules)
priority = to_priority(score=score, text=text)
print(f"Priority: {priority}")

# With post-processing rules (optional)
priority = to_priority(score=score, text=text, use_rules=True)
print(f"Priority (with rules): {priority}")

# Method 2: direct conversion
if score >= thresholds["q_high"]:
    priority = "HIGH"
elif score <= thresholds["q_low"]:
    priority = "LOW"
else:
    priority = "MED"
```

πŸ“‹ λͺ¨λΈ 정보

ν•­λͺ© λ‚΄μš©
기반 λͺ¨λΈ distilbert-base-multilingual-cased
μž‘μ—… μœ ν˜• νšŒκ·€ (Regression)
μž…λ ₯ 컀밋/이슈 제λͺ© + λ³Έλ¬Έ ν…μŠ€νŠΈ
좜λ ₯ μš°μ„ μˆœμœ„ 점수 (float)
클래슀 λ³€ν™˜ ν›„μ²˜λ¦¬λ‘œ μˆ˜ν–‰ (to_priority() ν•¨μˆ˜)
μ–Έμ–΄ ν•œκ΅­μ–΄, μ˜μ–΄
μ΅œλŒ€ 길이 256 토큰

μ€‘μš”: λͺ¨λΈμ€ 점수만 좜λ ₯ν•©λ‹ˆλ‹€. HIGH/MED/LOW 클래슀 λ³€ν™˜μ€ to_priority() ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•˜μ„Έμš”.

🎯 Key Features

  1. Multilingual support: handles both Korean and English commits/issues
  2. Keyword-based post-processing: customize rules via postprocess/priority_rules.yaml
  3. Relative ranking within a batch: comparing multiple issues together yields more accurate priority predictions
  4. Lightweight model: fast inference thanks to the DistilBERT backbone

πŸ“ 폴더 ꡬ쑰

issue-priority-ko/
β”œβ”€β”€ README.md                # 이 파일
β”œβ”€β”€ config.json              # λͺ¨λΈ μ„€μ •
β”œβ”€β”€ model.safetensors        # λͺ¨λΈ κ°€μ€‘μΉ˜
β”œβ”€β”€ tokenizer.json           # ν† ν¬λ‚˜μ΄μ €
β”œβ”€β”€ tokenizer_config.json
β”œβ”€β”€ vocab.txt
β”œβ”€β”€ score_thresholds.json    # μš°μ„ μˆœμœ„ λ³€ν™˜ μž„κ³„κ°’
β”‚
β”œβ”€β”€ postprocess/             # ν›„μ²˜λ¦¬ κ·œμΉ™ (μ˜΅μ…˜)
β”‚   β”œβ”€β”€ to_priority.py        # μ μˆ˜β†’ν΄λž˜μŠ€ λ³€ν™˜ ν•¨μˆ˜
β”‚   β”œβ”€β”€ priority_rules.yaml  # ν‚€μ›Œλ“œ 기반 κ·œμΉ™ (μ˜΅μ…˜)
β”‚   └── README.md            # ν›„μ²˜λ¦¬ μ„€λͺ…
β”‚
β”œβ”€β”€ examples/                # μ‚¬μš© 예제
β”‚   β”œβ”€β”€ input.json
β”‚   └── output.json
β”‚
└── requirements.txt         # μ˜μ‘΄μ„± νŒ¨ν‚€μ§€

πŸ”„ Score → Class Conversion

Using the to_priority() function

```python
from postprocess.to_priority import to_priority

# Basic conversion (threshold based)
priority = to_priority(score=0.82, text="둜그인 μ—λŸ¬ λ°œμƒ")

# With post-processing rules (optional)
priority = to_priority(score=0.82, text="둜그인 μ—λŸ¬ λ°œμƒ", use_rules=True)

# Batch conversion
from postprocess.to_priority import to_priority_batch
scores = [0.82, 0.75, 0.90]
texts = ["둜그인 μ—λŸ¬", "README μˆ˜μ •", "μ„œλ²„ λ‹€μš΄"]
priorities = to_priority_batch(scores, texts, use_rules=True)
```

Post-processing rules (optional)

Keyword-based rules can be applied via postprocess/priority_rules.yaml.

Example rules:

  • Force LOW: readme, typo, documentation → always LOW
  • Guarantee at least MED: outage, error, login, payment → at least MED
  • Boost to HIGH: data loss, infinite loop, critical → HIGH

See postprocess/README.md for details.

πŸ“Š Performance Metrics

| Metric | Value |
| --- | --- |
| MAE | 0.009 (on scaled values) |
| RMSE | 0.015 (on scaled values) |
| Spearman correlation | 0.85 |

Note: the model is better suited to predicting relative rank. Comparison within a batch is recommended over relying on absolute scores.

πŸ’‘ Usage Tips

1. Single prediction

```python
# Model prediction
text = "둜그인 μ•ˆλ¨"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    score_raw = model(**inputs).logits.item()

# Restore the original scale
score = score_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]

# Convert to a class
from postprocess.to_priority import to_priority
priority = to_priority(score=score, text=text, use_rules=True)
```

2. Batch prediction (recommended)

```python
import numpy as np
from scipy.stats import rankdata

texts = ["이슈1", "이슈2", "이슈3"]
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=256, padding=True)

with torch.no_grad():
    scores_raw = model(**inputs).logits.squeeze(-1).numpy()

# Restore the original scale
scores = scores_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]

# Relative ranking within the batch (quantile based)
normalized = rankdata(scores, method="average") / len(scores)

# Top 30% = HIGH, bottom 30% = LOW
q_high = np.percentile(normalized, 70)
q_low = np.percentile(normalized, 30)
priorities = ["HIGH" if n >= q_high else "LOW" if n <= q_low else "MED" for n in normalized]
```

3. Batch prediction + class conversion

```python
# Batch prediction
texts = ["이슈1", "이슈2", "이슈3"]
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=256, padding=True)

with torch.no_grad():
    scores_raw = model(**inputs).logits.squeeze(-1).numpy()

# Restore the original scale
scores = scores_raw * (thresholds["train_max"] - thresholds["train_min"]) + thresholds["train_min"]

# Batch class conversion
from postprocess.to_priority import to_priority_batch
priorities = to_priority_batch(scores, texts, use_rules=True)

for text, score, priority in zip(texts, scores, priorities):
    print(f"{priority}: {score:.4f} - {text}")
```

⚠️ μ£Όμ˜μ‚¬ν•­

  1. λͺ¨λΈ 좜λ ₯: λͺ¨λΈμ€ 점수만 좜λ ₯ν•©λ‹ˆλ‹€ (νšŒκ·€ λͺ¨λΈ). 클래슀 λ³€ν™˜μ€ to_priority() ν•¨μˆ˜ μ‚¬μš©
  2. μŠ€μΌ€μΌ 볡원 ν•„μˆ˜: λͺ¨λΈ 좜λ ₯은 0~1 λ²”μœ„μž…λ‹ˆλ‹€. score_thresholds.json으둜 μ›λž˜ μŠ€μΌ€μΌ 볡원 ν•„μš”
  3. μƒλŒ€μ  μˆœμœ„: μ ˆλŒ€ μ μˆ˜λ³΄λ‹€λŠ” 배치 λ‚΄ μƒλŒ€ 비ꡐ가 더 μ •ν™•
  4. ν›„μ²˜λ¦¬ κ·œμΉ™: priority_rules.yaml은 μ˜΅μ…˜μž…λ‹ˆλ‹€. ν•„μš”μ‹œμ—λ§Œ μ‚¬μš©
  5. 도메인 적응: μƒˆλ‘œμš΄ λ„λ©”μΈμ—μ„œλŠ” μž¬ν•™μŠ΅ λ˜λŠ” νŒŒμΈνŠœλ‹ ꢌμž₯

πŸ“š Examples

See the examples/ folder for real usage examples.

  • input.json: example input
  • output.json: example output

πŸ“„ λΌμ΄μ„ΌμŠ€

  • Apache 2.0