Korean Claim Detection Model for Fact-Checking
Model Description
This model is a binary classifier that automatically detects claims requiring fact-checking in Korean sentences. It identifies verifiable claims in news articles, political debates, and social media posts, automating the first step of the fact-checking workflow.
- Base Model: beomi/KcELECTRA-base-v2022
- Task: Claim Detection (Checkworthy Sentence Classification)
- Language: Korean
- Labels:
  - 0: Sentence that does not need fact-checking (non-checkworthy)
  - 1: Claim that needs fact-checking (checkworthy)
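The label scheme above can be captured in a small helper for downstream code. A minimal sketch; the dict and function names are illustrative, not part of the released model's config:

```python
# Illustrative mapping for the two classes described above.
ID2LABEL = {0: "non_checkworthy", 1: "checkworthy"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

def describe(label_id):
    """Map a predicted class id (0 or 1) to its human-readable name."""
    return ID2LABEL[label_id]

print(describe(1))  # -> "checkworthy"
```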
Model Objective
The model analyzes input Korean sentences to determine:
- Whether they contain verifiable factual claims
- The degree to which fact-checking is needed
Checkworthy Claim Examples
Label 1 (Checkworthy):
- "청년 실업률이 지난 3년간 계속 상승했습니다" (The youth unemployment rate has kept rising over the past three years)
- "우리나라 GDP 성장률은 OECD 평균을 넘어섰습니다" (Korea's GDP growth rate exceeded the OECD average)
- "이 정책으로 일자리가 100만 개 창출될 것입니다" (This policy will create one million jobs)
Label 0 (Non-checkworthy):
- "오늘 토론회는 SBS 상암동 스튜디오에서 진행하고 있고요" (Today's debate is being held at the SBS Sangam-dong studio)
- "국민 여러분께 감사드립니다" (Thank you, fellow citizens)
- "제 생각에는 이 정책이 좋을 것 같습니다" (In my view, this policy seems good)
Dataset
Data Source
- Source: CLEF CheckThat! Lab 2024
- Task: Task 1 - Check-Worthiness Estimation
- Original Dataset: English political debates and speeches
- Translation: Machine-translated to Korean for training
Dataset Size
- Training Set: 22,501 samples
- Validation Set: 1,032 samples
- Test Set: 318 samples
Data Characteristics
- Sentences drawn from political debates, speeches, and news articles
- High-quality data labeled by professional fact-checkers
- Class imbalance: Label 0 (65%) vs. Label 1 (35%)
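The card does not state how the 65/35 class imbalance was handled during training. One common remedy is inverse-frequency class weighting in the loss; the sketch below only illustrates how such weights would be derived from the stated split, and is not the authors' recipe:

```python
# Class fractions from the stated 65/35 split (this is an illustration;
# the card does not say that class weighting was actually used).
class_fractions = {0: 0.65, 1: 0.35}

# Weight each class by 1 / frequency, normalized so the weights average to 1.
raw = {label: 1.0 / frac for label, frac in class_fractions.items()}
mean = sum(raw.values()) / len(raw)
weights = {label: w / mean for label, w in raw.items()}

print(weights)  # the minority class (label 1) gets the larger weight
```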
Training Details
Training Hyperparameters
- Epochs: 5
- Batch Size (Train): 32
- Batch Size (Eval): 64
- Learning Rate: 3e-05
- Weight Decay: 0.01
- Warmup Ratio: 0.1
- Precision: BF16
- Optimizer: adamw_torch_fused
- Max Sequence Length: 128 tokens
- Seed: 42
Training Environment
- GPU: NVIDIA GeForce RTX 4090 (24GB)
- Training Time: 1.87 minutes
- Framework: Hugging Face Transformers
- Early Stopping: Patience 3 (based on F1 score)
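The hyperparameters listed above map directly onto Hugging Face `TrainingArguments`. The sketch below is a reconstruction from the card's numbers, not the authors' actual training script; the output directory is a placeholder, and argument names follow recent `transformers` releases:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Reconstruction of the setup from the hyperparameters listed above.
args = TrainingArguments(
    output_dir="claim-detect-kcelectra",   # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    bf16=True,
    optim="adamw_torch_fused",
    seed=42,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",            # early stopping tracks F1
)

# Patience of 3 evaluations, matching "Early Stopping: Patience 3" above.
early_stop = EarlyStoppingCallback(early_stopping_patience=3)
```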
Performance
Validation Metrics
- Accuracy: 97.58%
- F1 Score: 94.80%
- Precision: 93.83%
- Recall: 95.80%
Test Metrics
- Accuracy: 89.31%
- F1 Score: 82.65%
- Precision: 92.05%
- Recall: 75.00%
Confusion Matrix (Test Set)

             Predicted 0   Predicted 1
Actual 0         203             7      (96.7% of actual 0s correct)
Actual 1          27            81      (75.0% recall)

Performance interpretation:
- High precision (92.05%): of the sentences the model predicts as checkworthy, 92% actually need fact-checking
- Moderate recall (75.00%): the model catches 75% of the truly checkworthy sentences
- Few false positives (7): minimizes unnecessary fact-check requests
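The reported test metrics can be re-derived from the confusion matrix above; a quick consistency check in plain Python:

```python
# Re-derive the test metrics from the confusion matrix above.
tn, fp = 203, 7    # actual 0: predicted 0 / predicted 1
fn, tp = 27, 81    # actual 1: predicted 0 / predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.8931 precision=0.9205 recall=0.7500 f1=0.8265
```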
How to Use
1. Installation
pip install transformers torch
2. Loading the Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load the model
model_name = "jonghhhh/claim_factcheck"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Use GPU if available (optional)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
print(f"Model loaded (device: {device})")
3. Inference Example
Single-Sentence Classification
def predict_claim(text):
    """
    Determine whether the input sentence is a claim that needs fact-checking.

    Args:
        text (str): Korean sentence to analyze

    Returns:
        dict: {
            'text': input sentence,
            'is_checkworthy': True/False,
            'confidence': 0.0-1.0,
            'label': 0 or 1,
            'probabilities': {'non_checkworthy': 0.xx, 'checkworthy': 0.xx}
        }
    """
    # Tokenize
    inputs = tokenizer(
        text,
        truncation=True,
        max_length=128,
        return_tensors="pt"
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Inference
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)

    predicted_label = torch.argmax(probs, dim=-1).item()
    confidence = probs[0][predicted_label].item()

    return {
        'text': text,
        'is_checkworthy': bool(predicted_label),
        'confidence': confidence,
        'label': predicted_label,
        'probabilities': {
            'non_checkworthy': probs[0][0].item(),
            'checkworthy': probs[0][1].item()
        }
    }
# Usage example
examples = [
    "오늘 토론회는 SBS 상암동 스튜디오에서 진행하고 있고요.",  # studio small talk
    "청년 실업률이 최근 3년간 계속 상승하고 있습니다.",        # unemployment claim
    "우리나라 GDP 성장률은 OECD 평균을 넘어섰습니다.",        # GDP claim
    "국민 여러분께 진심으로 감사드립니다."                    # expression of thanks
]

for text in examples:
    result = predict_claim(text)
    print(f"\nInput: {result['text']}")
    print("Fact-check needed" if result['is_checkworthy'] else "No fact-check needed")
    print(f"Confidence: {result['confidence']*100:.1f}%")
    print(f"Probabilities: Non-CW {result['probabilities']['non_checkworthy']*100:.1f}% | CW {result['probabilities']['checkworthy']*100:.1f}%")
Example output:

Input: 청년 실업률이 최근 3년간 계속 상승하고 있습니다.
Fact-check needed
Confidence: 94.3%
Probabilities: Non-CW 5.7% | CW 94.3%

Input: 오늘 토론회는 SBS 상암동 스튜디오에서 진행하고 있고요.
No fact-check needed
Confidence: 98.2%
Probabilities: Non-CW 98.2% | CW 1.8%
Batch Processing
def predict_claims_batch(texts, batch_size=32):
    """
    Classify multiple sentences in batches.

    Args:
        texts (list): list of sentences
        batch_size (int): batch size

    Returns:
        list: one prediction dict per sentence
    """
    results = []
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]

        # Tokenize the batch
        inputs = tokenizer(
            batch_texts,
            truncation=True,
            max_length=128,
            padding=True,
            return_tensors="pt"
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Batch inference
        with torch.no_grad():
            outputs = model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
            predicted_labels = torch.argmax(probs, dim=-1).cpu().numpy()

        # Collect results
        for j, text in enumerate(batch_texts):
            results.append({
                'text': text,
                'is_checkworthy': bool(predicted_labels[j]),
                'confidence': probs[j][predicted_labels[j]].item(),
                'label': int(predicted_labels[j])
            })
    return results
# Batch inference example
texts = [
    "국회의원 정원을 300명으로 확대하겠습니다.",  # pledge to expand the National Assembly to 300 seats
    "감사합니다.",                                # "Thank you."
    "2024년 경제 성장률이 2.1%를 기록했습니다.",   # 2024 growth-rate claim
    # ... more sentences
]
batch_results = predict_claims_batch(texts)
checkworthy_claims = [r for r in batch_results if r['is_checkworthy']]
print(f"Of {len(texts)} sentences, {len(checkworthy_claims)} need fact-checking")
4. Real-world Use Case
# Extract fact-check targets from a news article
def extract_checkworthy_claims(article_text, threshold=0.7):
    """
    Extract the sentences in an article that need fact-checking.

    Args:
        article_text (str): full text of the news article
        threshold (float): cutoff for the checkworthy decision (0.0-1.0)

    Returns:
        list: sentences flagged for fact-checking
    """
    # Split into sentences (simplistic example)
    sentences = [s.strip() for s in article_text.split('.') if s.strip()]

    # Batch prediction
    results = predict_claims_batch(sentences)

    # Keep only checkworthy sentences at or above the threshold
    checkworthy_claims = [
        r for r in results
        if r['is_checkworthy'] and r['confidence'] >= threshold
    ]

    # Sort by confidence, highest first
    checkworthy_claims.sort(key=lambda x: x['confidence'], reverse=True)
    return checkworthy_claims

# Usage example
article = """
정부는 오늘 경제 정책 방향을 발표했습니다.
청년 실업률이 지난해 대비 2.3%p 감소했다고 밝혔습니다.
이는 역대 최대 폭의 하락입니다.
앞으로도 일자리 창출에 힘쓰겠다고 강조했습니다.
"""

claims = extract_checkworthy_claims(article, threshold=0.8)
print(f"Fact-check targets found: {len(claims)}\n")
for i, claim in enumerate(claims, 1):
    print(f"{i}. {claim['text']}")
    print(f"   Confidence: {claim['confidence']*100:.1f}%\n")
Model Architecture
- Model Type: ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately)
- Hidden Size: 768
- Number of Layers: 12
- Number of Attention Heads: 12
- Vocabulary Size: 32,000
- Max Sequence Length: 128 tokens
- Classification Head: Linear layer (768 โ 2)
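For scale, the classification head described above is tiny relative to the encoder; rough parameter counts follow directly from the listed dimensions. A back-of-the-envelope check, not an official figure:

```python
# Back-of-the-envelope parameter counts from the dimensions listed above.
hidden_size, num_labels, vocab_size = 768, 2, 32_000

# Token embedding table: one hidden-size vector per vocabulary entry.
embedding_params = vocab_size * hidden_size

# Classification head: Linear(768 -> 2) weight matrix plus bias.
head_params = hidden_size * num_labels + num_labels

print(embedding_params)  # 24576000
print(head_params)       # 1538
```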
Limitations
- Domain specificity: optimized for political and news text; performance may drop on everyday conversation or technical documents
- Length limit: handles at most 128 tokens (roughly 100-150 words)
- Machine-translated training data: trained on data translated from English, so performance may differ on natural Korean phrasing
- Binary classification: checkworthiness is reported only as 0/1 (no fine-grained score)
- False negatives: can miss up to 25% of real claims (recall 75%)
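When missed claims (false negatives) are costlier than extra review work, the recall limitation can be mitigated at inference time by thresholding the checkworthy probability instead of taking the argmax. A sketch; the threshold and probability values below are illustrative:

```python
# Decide from the checkworthy probability with an adjustable threshold.
# Lowering the threshold below 0.5 trades precision for recall, which can
# recover some claims the default argmax decision would miss.
def is_checkworthy(prob_checkworthy, threshold=0.5):
    return prob_checkworthy >= threshold

borderline = 0.42  # e.g. a claim scored just below the default cutoff
print(is_checkworthy(borderline))                 # False at the default 0.5
print(is_checkworthy(borderline, threshold=0.3))  # True with a lower cutoff
```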
Future Improvements
- Further training on native Korean fact-checking datasets
- Model upgrade to handle longer contexts (max_length 256+)
- Multi-class classification (checkworthiness scores on a 0-5 scale)
- Topic/category classification of claims
License
This model follows the license of its base model, beomi/KcELECTRA-base-v2022.
Citation
If you use this model in research or a project, please cite it as follows:
@misc{korean-claim-factcheck-2025,
author = {Jonghhhh},
title = {Korean Claim Detection Model for Fact-Checking},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/jonghhhh/claim_factcheck}},
note = {Based on KcELECTRA-base-v2022}
}
References
- Base Model: beomi/KcELECTRA-base-v2022
- Dataset: CLEF CheckThat! Lab 2024
- Paper: CheckThat! Lab: Check-Worthiness, Subjectivity, and Persuasion
Contact
If you have questions or feedback, please open an issue on the model repository.
Tags: claim-detection, fact-checking, korean, electra, text-classification, checkworthy, misinformation-detection