spinxxxx / git-issues-priority-ko

Text Classification

commit-priority


feat: add issue priority prediction model (score-based)

902efd1 23 days ago


# Priority Post-processing Rules

A post-processing system that adjusts issue priorities by applying keyword-based rules to the model's raw scores.

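## Overview

The model is a regression model that predicts a priority score. In practice, we recommend applying the following post-processing steps on top of it:

1. Keyword-based rules: force-adjust the priority when specific keywords appear
2. In-batch relative normalization: when several issues are compared together, normalize their scores within the batch
3. Relative classification: assign HIGH/MED/LOW using upper and lower percentile cutoffs within the batch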

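## Rule Types

### 1. LOW-forcing keywords
If the issue text contains any keyword in `low_forced_keywords`, the issue is always classified as LOW, regardless of the model score.

Examples:
- "README 오타 수정" ("README typo fix") → LOW
- "문서 업데이트" ("documentation update") → LOW
- "typo fix" → LOW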
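The LOW rule amounts to a case-insensitive substring test against the issue text. Below is a minimal sketch; `contains_any` is a hypothetical helper name and the keyword list is invented for illustration (the real list lives in `priority_rules.yaml`). Because it is substring matching, a keyword also fires inside longer words that contain it.

```python
def contains_any(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in text, ignoring case."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


# Hypothetical keyword list; the real one is defined in priority_rules.yaml.
low_forced_keywords = ["readme", "오타", "문서", "typo"]

for issue in ["README 오타 수정", "문서 업데이트", "Typo fix", "로그인 에러 발생"]:
    label = "LOW" if contains_any(issue, low_forced_keywords) else "(model decides)"
    print(issue, "->", label)
```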

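### 2. Minimum-MED keywords
If the issue text contains any keyword in `min_med_keywords`, the issue is guaranteed a priority of at least MED.

Examples:
- "로그인 에러 발생" ("login error occurs") → at least MED
- "서버 다운 문제" ("server down issue") → at least MED
- "결제 오류" ("payment error") → at least MED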
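Enforcing a minimum requires an explicit ordering of the labels. Comparing the label strings directly does not work, because alphabetically `"HIGH" < "LOW" < "MED"`. A small sketch, with `PRIORITY_RANK` and `at_least` as hypothetical names:

```python
# Priority labels ordered from least to most urgent.
PRIORITY_RANK = {"LOW": 0, "MED": 1, "HIGH": 2}


def at_least(priority: str, floor: str) -> str:
    """Return whichever of the two labels is more urgent."""
    return max(priority, floor, key=PRIORITY_RANK.__getitem__)


print(at_least("LOW", "MED"))   # MED
print(at_least("HIGH", "MED"))  # HIGH
# Plain string comparison picks the wrong label here: "MED" > "HIGH" alphabetically.
print(max("HIGH", "MED"))       # MED
```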

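### 3. HIGH-boost keywords
If the issue text contains any keyword in `high_boost_keywords`, the issue is boosted to HIGH priority.

Examples:
- "데이터 손실 발생" ("data loss occurred") → HIGH
- "무한 루프 재발" ("infinite loop recurs") → HIGH
- "critical security issue" → HIGH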

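## Usage

### Python example

```python
import yaml  # PyYAML

# Load the rules
with open("postprocess/priority_rules.yaml", "r", encoding="utf-8") as f:
    rules = yaml.safe_load(f)

# Labels ordered by urgency; comparing the strings directly would sort
# them alphabetically (HIGH < LOW < MED), which is wrong here.
PRIORITY_RANK = {"LOW": 0, "MED": 1, "HIGH": 2}

# Issue text ("login error occurs, users cannot access")
issue_text = "로그인 에러 발생, 사용자 접근 불가"

# Priority derived from the model's score (placeholder value for this example)
model_priority = "LOW"

# Lowercase both text and keywords so matching is case-insensitive
text_lower = issue_text.lower()

def has_keyword(key):
    return any(kw.lower() in text_lower for kw in rules[key])

if has_keyword("low_forced_keywords"):
    # LOW-forcing keywords take precedence over every other rule
    priority = "LOW"
elif has_keyword("high_boost_keywords"):
    priority = "HIGH"
elif has_keyword("min_med_keywords"):
    # Guarantee at least MED even if the model score is low
    priority = max(model_priority, "MED", key=PRIORITY_RANK.__getitem__)
else:
    priority = model_priority  # use the model prediction as-is
```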

### Batch processing example

```python
import numpy as np
from scipy.stats import rankdata

def apply_postprocessing(issues, scores, rules):
    """
    Apply the post-processing rules within a batch.

    `apply_keyword_rules` is not defined in this README; it must be
    provided separately and return one adjusted score per issue.
    """
    # 1. Apply the keyword-based rules
    adjusted_scores = np.asarray(apply_keyword_rules(issues, scores, rules), dtype=float)

    # 2. Normalization (quantile): average rank / batch size, so values lie in (0, 1]
    if rules["normalize_method"] == "quantile":
        normalized_scores = rankdata(adjusted_scores, method="average") / len(adjusted_scores)
    else:
        normalized_scores = adjusted_scores

    # 3. Relative classification: high_percentile / low_percentile are fractions in [0, 1]
    q_high = np.percentile(normalized_scores, rules["high_percentile"] * 100)
    q_low = np.percentile(normalized_scores, rules["low_percentile"] * 100)

    priorities = []
    for score in normalized_scores:
        if score >= q_high:
            priorities.append("HIGH")
        elif score <= q_low:
            priorities.append("LOW")
        else:
            priorities.append("MED")

    return priorities, normalized_scores
```
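The batch example calls `apply_keyword_rules`, which this README does not define. One possible score-level version is sketched below. The adjustment strategy is an assumption, not necessarily the author's: LOW-forced issues are pushed below the batch minimum, HIGH-boost issues above the batch maximum, and minimum-MED issues are raised to at least the batch median. Because the later percentile split is relative, these adjustments move issues toward the intended band but do not by themselves guarantee a particular label.

```python
from statistics import median


def apply_keyword_rules(issues, scores, rules):
    """Hypothetical score-level keyword adjustment (higher score = more urgent)."""
    if len(scores) == 0:
        return []
    lowest, highest, mid = min(scores), max(scores), median(scores)

    def matches(text, key):
        lowered = text.lower()
        return any(kw.lower() in lowered for kw in rules[key])

    adjusted = []
    for text, score in zip(issues, scores):
        if matches(text, "low_forced_keywords"):
            adjusted.append(lowest - 1.0)    # below every model score in the batch
        elif matches(text, "high_boost_keywords"):
            adjusted.append(highest + 1.0)   # above every model score in the batch
        elif matches(text, "min_med_keywords"):
            adjusted.append(max(score, mid))  # at least the batch median
        else:
            adjusted.append(score)
    return adjusted


rules = {
    "low_forced_keywords": ["오타"],
    "high_boost_keywords": ["데이터 손실"],
    "min_med_keywords": ["에러"],
}
issues = ["README 오타 수정", "데이터 손실 발생", "로그인 에러 발생", "UI tweak"]
print(apply_keyword_rules(issues, [0.75, 0.25, 0.125, 0.5], rules))
# [-0.875, 1.75, 0.375, 0.5]
```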

## Customizing the rules

You can edit `priority_rules.yaml` to add or remove keywords for your project.

Example:
```yaml
# Add project-specific keywords
min_med_keywords:
  - 우리회사특화키워드  # placeholder for a company-specific keyword
  - critical-path
  - production-issue
```
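After editing the YAML, a quick sanity check can catch missing sections and inconsistent casing. The sketch below assumes the file has already been parsed into a dict (for example with `yaml.safe_load`); `normalize_rules` is a hypothetical helper, not part of this repository. It coerces each keyword to a string because YAML parses bare values such as `404` as numbers, and it treats an empty section (which YAML loads as `None`) as an empty list.

```python
REQUIRED_KEYWORD_LISTS = ("low_forced_keywords", "min_med_keywords", "high_boost_keywords")


def normalize_rules(rules: dict) -> dict:
    """Check the keyword sections exist; return a copy with clean keyword lists."""
    cleaned = dict(rules)  # shallow copy so the caller's dict is left unchanged
    for key in REQUIRED_KEYWORD_LISTS:
        if key not in rules:
            raise KeyError(f"priority_rules.yaml is missing '{key}'")
        unique = []
        for kw in rules[key] or []:
            kw = str(kw).strip().lower()  # numbers -> strings, trim, lowercase
            if kw and kw not in unique:
                unique.append(kw)  # keep first occurrence, preserve order
        cleaned[key] = unique
    return cleaned
```

Lowercasing once here lets the matching code compare against lowercased text without re-lowercasing every keyword.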

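## Notes

- Keyword matching is case-insensitive (comparison is done on lowercased text)
- LOW-forcing keywords take precedence over all other rules
- A HIGH-boost keyword automatically implies the minimum-MED guarantee as well (unless a LOW-forcing keyword is also present)
- In-batch normalization is only meaningful when several issues are compared together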
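To see why in-batch normalization needs several issues, here is a dependency-free sketch of steps 2 and 3 of `apply_postprocessing`. `average_ranks`, `percentile`, and `classify_batch` are hypothetical stand-ins written to mirror `scipy.stats.rankdata(method="average")` and the default linear interpolation of `numpy.percentile`.

```python
def average_ranks(values):
    """1-based ranks with ties averaged (mirrors rankdata(method="average"))."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def classify_batch(scores, high_percentile, low_percentile):
    """Quantile-normalize scores within the batch, then split by percentiles."""
    n = len(scores)
    normalized = [r / n for r in average_ranks(scores)]
    q_high = percentile(normalized, high_percentile * 100)
    q_low = percentile(normalized, low_percentile * 100)
    labels = []
    for s in normalized:
        if s >= q_high:
            labels.append("HIGH")
        elif s <= q_low:
            labels.append("LOW")
        else:
            labels.append("MED")
    return labels


# A single issue always ranks first in its own batch, so it is HIGH whatever its raw score.
print(classify_batch([0.01], 0.8, 0.2))  # ['HIGH']
print(classify_batch([0.1, 0.2, 0.3, 0.4, 0.5], 0.8, 0.2))
# ['LOW', 'MED', 'MED', 'MED', 'HIGH']
```

For a single issue, prefer the keyword rules plus a fixed threshold on the raw model score over relative classification.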