# Korean Claim Detection Model for Fact-Checking
## Model Description
This binary classification model **automatically detects claims that require fact-checking** in Korean sentences. It identifies claims that need verification in news articles, political debates, and social media posts, which can automate the first step of a fact-checking workflow.
- **Base Model**: [beomi/KcELECTRA-base-v2022](https://huggingface.co/beomi/KcELECTRA-base-v2022)
- **Task**: Claim Detection (Checkworthy Sentence Classification)
- **Language**: Korean
- **Labels**:
  - `0`: Sentence that does not need fact-checking (Non-checkworthy)
  - `1`: Claim that needs fact-checking (Checkworthy claim)
## Model Objective
This model analyzes Korean sentences to determine:
- Whether they contain **verifiable factual claims**
- The **degree to which fact-checking is needed**
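One way to read the "degree" in the second bullet is as the softmax probability assigned to label `1`. The following is a minimal sketch in plain Python, assuming a two-element logit vector ordered as `[non_checkworthy, checkworthy]`; the logit values are made up for illustration and are not real model outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence:
# index 0 = non-checkworthy, index 1 = checkworthy.
example_logits = [-1.2, 1.6]

probs = softmax(example_logits)
predicted_label = max(range(len(probs)), key=probs.__getitem__)  # argmax
checkworthy_score = probs[1]  # usable as a graded check-worthiness score

print(predicted_label, round(checkworthy_score, 3))
```

The hard label comes from the argmax, while `checkworthy_score` keeps the graded information that a downstream ranking or threshold can use.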
### Checkworthy Claim Examples
✅ **Label 1 (Checkworthy)**:
- "청년 실업률이 지난 3년간 계속 상승했습니다" (The youth unemployment rate has kept rising over the past three years.)
- "우리나라 GDP 성장률은 OECD 평균을 넘어섰습니다" (Our country's GDP growth rate has exceeded the OECD average.)
- "이 정책으로 일자리가 100만 개 창출될 것입니다" (This policy will create one million jobs.)
❌ **Label 0 (Non-checkworthy)**:
- "오늘 토론회는 SBS 상암동 스튜디오에서 진행하고 있고요" (Today's debate is being held at the SBS Sangam-dong studio.)
- "국민 여러분께 감사드립니다" (Thank you, fellow citizens.)
- "제 생각에는 이 정책이 좋은 것 같습니다" (In my opinion, this policy seems good.)
## Dataset
### Data Source
- **Source**: CLEF CheckThat! Lab 2024
- **Task**: Task 1 - Check-Worthiness Estimation
- **Original Dataset**: English political debates and speeches
- **Translation**: Machine-translated to Korean for training
### Dataset Size
- **Training Set**: 22,501 samples
- **Validation Set**: 1,032 samples
- **Test Set**: 318 samples
### Data Characteristics
- Sentences extracted from political debates, speeches, and news articles
- High-quality data labeled by professional fact-checkers
- Class imbalance: Label 0 (65%) vs Label 1 (35%)
## Training Details
### Training Hyperparameters
- **Epochs**: 5
- **Batch Size (Train)**: 32
- **Batch Size (Eval)**: 64
- **Learning Rate**: 3e-05
- **Weight Decay**: 0.01
- **Warmup Ratio**: 0.1
- **Precision**: BF16
- **Optimizer**: adamw_torch_fused
- **Max Sequence Length**: 128 tokens
- **Seed**: 42
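For orientation, the hyperparameters above imply roughly the following step counts, using the training set size listed in the Dataset section. This is a back-of-the-envelope sketch that assumes a single GPU, no gradient accumulation, a final partial batch that is kept, and that all 5 epochs actually run; none of these details are stated in this card.

```python
import math

train_samples = 22_501  # training set size from the Dataset section
batch_size = 32         # train batch size
epochs = 5
warmup_ratio = 0.1

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Rounded to the nearest step; exact rounding depends on the training framework.
warmup_steps = round(total_steps * warmup_ratio)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup steps:    {warmup_steps}")
```

Under these assumptions the learning rate would ramp up over about the first tenth of training before decaying.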
### Training Environment
- **GPU**: NVIDIA GeForce RTX 4090 (24GB)
- **Training Time**: 1.87 minutes
- **Framework**: Hugging Face Transformers
- **Early Stopping**: Patience 3 (based on F1 score)
## Performance
### Validation Metrics
- **Accuracy**: 97.58%
- **F1 Score**: 94.80%
- **Precision**: 93.83%
- **Recall**: 95.80%
### Test Metrics
- **Accuracy**: 89.31%
- **F1 Score**: 82.65%
- **Precision**: 92.05%
- **Recall**: 75.00%
### Confusion Matrix (Test Set)
```
              Predicted
                0     1
Actual   0    203     7   (96.7% of label 0 correct)
         1     27    81   (75.0% recall)
```
**Interpretation**:
- **High precision (92.05%)**: 92% of the sentences the model predicts as "checkworthy" actually need fact-checking
- **Moderate recall (75.00%)**: the model detects 75% of the sentences that are actually checkworthy
- **Few false positives (7)**: unnecessary fact-check requests are kept to a minimum
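The reported test metrics can be re-derived from the confusion matrix counts. The short sketch below does that arithmetic in plain Python; the majority-class baseline (always predicting label `0`) is an added reference point, not a figure reported by the author.

```python
# Confusion-matrix counts from the test set (rows = actual, columns = predicted).
tn, fp = 203, 7   # actual label 0
fn, tp = 27, 81   # actual label 1

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)           # share of label-0 sentences classified correctly
majority_baseline = (tn + fp) / total  # accuracy of always predicting label 0

for name, value in [
    ("accuracy", accuracy),
    ("precision", precision),
    ("recall", recall),
    ("f1", f1),
    ("specificity", specificity),
    ("majority baseline", majority_baseline),
]:
    print(f"{name:<18}{value * 100:6.2f}%")
```

The derived accuracy, precision, recall, and F1 match the Test Metrics listed above, which confirms that the matrix and the summary figures are consistent.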
## How to Use
### 1. Installation
```bash
pip install transformers torch
```
### 2. Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier
model_name = "jonghhhh/claim_factcheck"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Use a GPU if one is available (optional)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
print(f"✅ Model loaded (device: {device})")
```
### 3. Inference Example
#### Single-Sentence Classification
```python
def predict_claim(text):
    """
    Decide whether the input sentence is a claim that needs fact-checking.

    Args:
        text (str): Korean sentence to analyze

    Returns:
        dict: {
            'text': input sentence,
            'is_checkworthy': True/False,
            'confidence': 0.0-1.0 (probability of the predicted label),
            'label': 0 or 1,
            'probabilities': {'non_checkworthy': 0.xx, 'checkworthy': 0.xx}
        }
    """
    # Tokenize
    inputs = tokenizer(
        text,
        truncation=True,
        max_length=128,
        return_tensors="pt"
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Run inference
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)

    predicted_label = torch.argmax(probs, dim=-1).item()
    confidence = probs[0][predicted_label].item()

    return {
        'text': text,
        'is_checkworthy': bool(predicted_label),
        'confidence': confidence,
        'label': predicted_label,
        'probabilities': {
            'non_checkworthy': probs[0][0].item(),
            'checkworthy': probs[0][1].item()
        }
    }


# Usage example
examples = [
    "오늘 토론회는 SBS 상암동 스튜디오에서 진행하고 있고요.",  # Today's debate is being held at the SBS Sangam-dong studio.
    "청년 실업률이 최근 3년간 계속 상승하고 있습니다.",  # The youth unemployment rate has kept rising over the last three years.
    "우리나라 GDP 성장률은 OECD 평균을 넘어섰습니다.",  # Our country's GDP growth rate has exceeded the OECD average.
    "국민 여러분께 진심으로 감사드립니다."  # My sincere thanks to all citizens.
]

for text in examples:
    result = predict_claim(text)
    print(f"\n📝 Input: {result['text']}")
    print(f"{'🔍 Fact-check needed' if result['is_checkworthy'] else '✅ No fact-check needed'}")
    print(f"Confidence: {result['confidence']*100:.1f}%")
    print(f"Probabilities: Non-CW {result['probabilities']['non_checkworthy']*100:.1f}% | CW {result['probabilities']['checkworthy']*100:.1f}%")
```
**Example output**:
```
📝 Input: 오늘 토론회는 SBS 상암동 스튜디오에서 진행하고 있고요.
✅ No fact-check needed
Confidence: 98.2%
Probabilities: Non-CW 98.2% | CW 1.8%

📝 Input: 청년 실업률이 최근 3년간 계속 상승하고 있습니다.
🔍 Fact-check needed
Confidence: 94.3%
Probabilities: Non-CW 5.7% | CW 94.3%
```
#### Batch Processing
```python
def predict_claims_batch(texts, batch_size=32):
    """
    Classify several sentences in batches.

    Args:
        texts (list): list of sentences
        batch_size (int): batch size

    Returns:
        list: one prediction dict per sentence
    """
    results = []
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]

        # Tokenize the batch (padding aligns sequence lengths)
        inputs = tokenizer(
            batch_texts,
            truncation=True,
            max_length=128,
            padding=True,
            return_tensors="pt"
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Run inference on the batch
        with torch.no_grad():
            outputs = model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
        predicted_labels = torch.argmax(probs, dim=-1).tolist()

        # Collect results
        for j, text in enumerate(batch_texts):
            results.append({
                'text': text,
                'is_checkworthy': bool(predicted_labels[j]),
                'confidence': probs[j][predicted_labels[j]].item(),
                'label': predicted_labels[j]
            })
    return results


# Batch inference example
texts = [
    "국회의원 정원을 300명으로 확대하겠습니다.",  # I will expand the National Assembly to 300 seats.
    "감사합니다.",  # Thank you.
    "2024년 경제성장률이 2.1%를 기록했습니다.",  # Economic growth in 2024 was 2.1%.
    # ... more sentences
]
batch_results = predict_claims_batch(texts)
checkworthy_claims = [r for r in batch_results if r['is_checkworthy']]
print(f"✅ {len(checkworthy_claims)} of {len(texts)} sentences need fact-checking")
```
### 4. Real-world Use Case
```python
import re


# Extract fact-check candidates from a news article
def extract_checkworthy_claims(article_text, threshold=0.7):
    """
    Extract the sentences in an article that need fact-checking.

    Args:
        article_text (str): full text of the news article
        threshold (float): minimum checkworthy probability (0.0-1.0)

    Returns:
        list: prediction dicts for the sentences to fact-check
    """
    # Sentence splitting (simple example): split after ., ! or ? followed by
    # whitespace, so decimals such as "2.3%p" are not broken apart
    sentences = [
        s.strip()
        for s in re.split(r'(?<=[.!?])\s+', article_text.strip())
        if s.strip()
    ]

    # Batch prediction
    results = predict_claims_batch(sentences)

    # Keep only checkworthy sentences at or above the threshold
    checkworthy_claims = [
        r for r in results
        if r['is_checkworthy'] and r['confidence'] >= threshold
    ]

    # Sort by confidence, highest first
    checkworthy_claims.sort(key=lambda x: x['confidence'], reverse=True)
    return checkworthy_claims


# Usage example. English gloss of the article:
#   The government announced its economic policy direction today.
#   It said youth unemployment fell 2.3 percentage points from last year.
#   This is the largest drop on record.
#   It stressed that it will keep working on job creation.
article = """
정부는 오늘 경제정책 방향을 발표했습니다.
청년 실업률이 지난해 대비 2.3%p 감소했다고 밝혔습니다.
이는 역대 최대 폭의 하락입니다.
앞으로도 일자리 창출에 힘쓰겠다고 강조했습니다.
"""
claims = extract_checkworthy_claims(article, threshold=0.8)
print(f"🔍 Fact-check candidates found: {len(claims)}\n")
for i, claim in enumerate(claims, 1):
    print(f"{i}. {claim['text']}")
    print(f"   Confidence: {claim['confidence']*100:.1f}%\n")
```
## Model Architecture
- **Model Type**: ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately)
- **Hidden Size**: 768
- **Number of Layers**: 12
- **Number of Attention Heads**: 12
- **Vocabulary Size**: 32,000
- **Max Sequence Length**: 128 tokens
- **Classification Head**: Linear layer (768 → 2)
## Limitations
1. **Domain-specific**: The model is optimized for the political and news domain, so performance may drop on everyday conversation or technical documents
2. **Length limit**: Only the first 128 tokens of an input are processed (an estimated 100-150 words; the actual count depends on subword tokenization)
3. **Machine-translated data**: The model was trained on data translated from English, so performance may differ on natural Korean expressions
4. **Binary classification**: Checkworthiness is classified only as 0 or 1 (no fine-grained score is provided)
5. **False negatives**: The model may miss 25% of actual claims (recall 75%)
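Limitations 4 and 5 can be softened somewhat at inference time by thresholding the probability of label `1` instead of relying on the hard 0/1 label. The sketch below assumes result dictionaries shaped like the output of `predict_claim` (with a `probabilities['checkworthy']` entry). The helper name `flag_claims`, the threshold of 0.3, and the scores are hypothetical, and any real threshold would need to be tuned on validation data.

```python
def flag_claims(results, threshold=0.5):
    """Return results whose checkworthy probability meets the threshold, highest first."""
    flagged = [r for r in results if r["probabilities"]["checkworthy"] >= threshold]
    return sorted(flagged, key=lambda r: r["probabilities"]["checkworthy"], reverse=True)


# Made-up scores for illustration only; these are not real model outputs.
sample_results = [
    {"text": "sentence A", "probabilities": {"non_checkworthy": 0.10, "checkworthy": 0.90}},
    {"text": "sentence B", "probabilities": {"non_checkworthy": 0.60, "checkworthy": 0.40}},
    {"text": "sentence C", "probabilities": {"non_checkworthy": 0.95, "checkworthy": 0.05}},
]

default_flags = flag_claims(sample_results)       # 0.5 mirrors the usual two-class decision
lenient_flags = flag_claims(sample_results, 0.3)  # lower threshold favors recall

print([r["text"] for r in default_flags])
print([r["text"] for r in lenient_flags])
```

Lowering the threshold surfaces more borderline sentences, trading some precision for recall. Whether that trade is worthwhile depends on how costly a missed claim is compared with an unnecessary review.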
## Future Improvements
- [ ] Additional training on a native Korean fact-checking dataset
- [ ] Model upgrade to handle longer context (max_length 256+)
- [ ] Multi-class classification (checkworthy score on a 0-5 scale)
- [ ] Topic category classification for claims
## License
This model follows the license of its base model, [beomi/KcELECTRA-base-v2022](https://huggingface.co/beomi/KcELECTRA-base-v2022).
## Citation
If you use this model in research or a project, please cite it as follows:
```bibtex
@misc{korean-claim-factcheck-2025,
author = {Jonghhhh},
title = {Korean Claim Detection Model for Fact-Checking},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/jonghhhh/claim_factcheck}},
note = {Based on KcELECTRA-base-v2022}
}
```
## References
- **Base Model**: [beomi/KcELECTRA-base-v2022](https://huggingface.co/beomi/KcELECTRA-base-v2022)
- **Dataset**: [CLEF CheckThat! Lab 2024](https://clef2025.clef-initiative.eu/index.php?page=Pages/Labs/CheckThat.html)
- **Paper**: [CheckThat! Lab: Check-Worthiness, Subjectivity, and Persuasion](https://link.springer.com/chapter/10.1007/978-3-031-13643-6_24)
## Contact
If you have questions or feedback, please leave them via Issues.
---
**Tags**: `claim-detection`, `fact-checking`, `korean`, `electra`, `text-classification`, `checkworthy`, `misinformation-detection`