# Korean Claim Detection Model for Fact-Checking
## Model Description
This model is a binary classifier that **automatically detects claims requiring fact-checking** in Korean sentences. It identifies verifiable claims in news articles, political debates, and social media posts, automating the first step of the fact-checking workflow.
- **Base Model**: [beomi/KcELECTRA-base-v2022](https://huggingface.co/beomi/KcELECTRA-base-v2022)
- **Task**: Claim Detection (Check-Worthy Sentence Classification)
- **Language**: Korean
- **Labels**:
  - `0`: Non-checkworthy sentence (no fact-check needed)
  - `1`: Checkworthy claim (fact-check needed)
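In Hugging Face terms, a label scheme like this usually lives in the model config as an `id2label` mapping. A minimal sketch (the exact label strings stored in the published `config.json` may differ):

```python
# Hypothetical id2label/label2id mapping for this binary scheme;
# the actual strings in the model's config.json may differ.
id2label = {0: "non_checkworthy", 1: "checkworthy"}
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[1])              # checkworthy
print(label2id["checkworthy"])  # 1
```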
## Model Objective
Given a Korean input sentence, the model determines:
- Whether it contains a **verifiable factual claim**
- **How strongly it warrants fact-checking** (via the predicted probability)
### Checkworthy Claim Examples
✅ **Label 1 (Checkworthy)**:
- "청년 실업률이 지난 3년간 계속 상승했습니다" (Youth unemployment has risen continuously over the past three years)
- "우리나라 GDP 성장률은 OECD 평균을 넘어섰습니다" (Korea's GDP growth rate exceeded the OECD average)
- "이 정책으로 일자리가 100만 개 창출될 것입니다" (This policy will create one million jobs)

❌ **Label 0 (Non-checkworthy)**:
- "오늘 토론회는 SBS 여의도 스튜디오에서 진행하고 있고요" (Today's debate is being held at the SBS Yeouido studio)
- "국민 여러분께 감사드립니다" (Thank you, fellow citizens)
- "제 생각에는 이 정책이 좋은 것 같습니다" (In my opinion, this policy seems good)
## Dataset
### Source
- **Source**: CLEF CheckThat! Lab 2024
- **Task**: Task 1 - Check-Worthiness Estimation
- **Original Dataset**: English political debates and speeches
- **Translation**: Machine-translated into Korean for training
### Size
- **Training Set**: 22,501 samples
- **Validation Set**: 1,032 samples
- **Test Set**: 318 samples
### Characteristics
- Sentences drawn from political debates, speeches, and news articles
- High-quality labels annotated by professional fact-checkers
- Class imbalance: Label 0 (65%) vs. Label 1 (35%)
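One common mitigation for a 65/35 split like this (not necessarily the one used for this model) is inverse-frequency class weighting in the loss. A quick sketch of the arithmetic, using weights of the form `w_c = 1 / (K * p_c)` so that the expected weight over the data stays 1:

```python
# Inverse-frequency class weights for the 65/35 split described above.
# With K = 2 classes and w_c = 1 / (K * p_c), the weighted loss keeps
# the same overall scale: sum_c p_c * w_c == 1.
p = {0: 0.65, 1: 0.35}  # class priors from the training set
K = len(p)
weights = {c: 1.0 / (K * p_c) for c, p_c in p.items()}

print(weights)  # label 0 is down-weighted (~0.77), label 1 up-weighted (~1.43)
```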
## Training Details
### Hyperparameters
- **Epochs**: 5
- **Batch Size (Train)**: 32
- **Batch Size (Eval)**: 64
- **Learning Rate**: 3e-05
- **Weight Decay**: 0.01
- **Warmup Ratio**: 0.1
- **Precision**: BF16
- **Optimizer**: adamw_torch_fused
- **Max Sequence Length**: 128 tokens
- **Seed**: 42
### Training Environment
- **GPU**: NVIDIA GeForce RTX 4090 (24 GB)
- **Training Time**: 1.87 minutes
- **Framework**: Hugging Face Transformers
- **Early Stopping**: patience 3, monitoring validation F1
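With warmup ratio 0.1, batch size 32, 5 epochs, and 22,501 training samples, the learning-rate warmup covers roughly the first 10% of optimizer steps. A back-of-the-envelope check (assuming no gradient accumulation and a dataloader that keeps the final partial batch):

```python
import math

samples, batch_size, epochs, warmup_ratio = 22_501, 32, 5, 0.1

steps_per_epoch = math.ceil(samples / batch_size)  # 704 batches per epoch
total_steps = steps_per_epoch * epochs             # 3520 optimizer steps
warmup_steps = int(total_steps * warmup_ratio)     # 352 warmup steps

print(steps_per_epoch, total_steps, warmup_steps)  # 704 3520 352
```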
## Performance
### Validation Metrics
- **Accuracy**: 97.58%
- **F1 Score**: 94.80%
- **Precision**: 93.83%
- **Recall**: 95.80%
### Test Metrics
- **Accuracy**: 89.31%
- **F1 Score**: 82.65%
- **Precision**: 92.05%
- **Recall**: 75.00%
### Confusion Matrix (Test Set)
```
              Predicted
               0     1
Actual  0    203     7   (96.7% true negative rate)
        1     27    81   (75.0% recall)
```
**Interpreting the results**:
- **High precision (92.05%)**: of the sentences the model flags as checkworthy, 92% genuinely need fact-checking
- **Moderate recall (75.00%)**: the model catches 75% of the truly checkworthy sentences
- **Few false positives (7)**: unnecessary fact-check requests are minimized
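The test metrics above can be re-derived directly from the confusion matrix; a quick sanity check:

```python
# Recompute the test metrics from the confusion matrix:
# TN = 203, FP = 7, FN = 27, TP = 81
tn, fp, fn, tp = 203, 7, 27, 81

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.4f} p={precision:.4f} r={recall:.4f} f1={f1:.4f}")
# acc=0.8931 p=0.9205 r=0.7500 f1=0.8265
```

All four figures match the reported test metrics, so the confusion matrix and the summary table are consistent with each other.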
## How to Use
### 1. Installation
```bash
pip install transformers torch
```
### 2. Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and model
model_name = "jonghhhh/claim_factcheck"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Use a GPU if available (optional)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

print(f"Model loaded (device: {device})")
```
### 3. Inference Example
#### Single-Sentence Classification
```python
def predict_claim(text):
    """
    Decide whether the input sentence is a checkworthy claim.

    Args:
        text (str): Korean sentence to analyze

    Returns:
        dict: {
            'text': input sentence,
            'is_checkworthy': True/False,
            'confidence': 0.0-1.0 (confidence in the predicted label),
            'label': 0 or 1,
            'probabilities': {'non_checkworthy': 0.xx, 'checkworthy': 0.xx}
        }
    """
    # Tokenize
    inputs = tokenizer(
        text,
        truncation=True,
        max_length=128,
        return_tensors="pt"
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Inference
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        predicted_label = torch.argmax(probs, dim=-1).item()
        confidence = probs[0][predicted_label].item()

    return {
        'text': text,
        'is_checkworthy': bool(predicted_label),
        'confidence': confidence,
        'label': predicted_label,
        'probabilities': {
            'non_checkworthy': probs[0][0].item(),
            'checkworthy': probs[0][1].item()
        }
    }

# Example usage
examples = [
    "오늘 토론회는 SBS 여의도 스튜디오에서 진행하고 있고요.",
    "청년 실업률이 최근 3년간 계속 상승하고 있습니다.",
    "우리나라 GDP 성장률은 OECD 평균을 넘어섰습니다.",
    "국민 여러분께 진심으로 감사드립니다."
]

for text in examples:
    result = predict_claim(text)
    print(f"\nInput: {result['text']}")
    print("Checkworthy" if result['is_checkworthy'] else "Not checkworthy")
    print(f"Confidence: {result['confidence']*100:.1f}%")
    print(f"Probabilities: Non-CW {result['probabilities']['non_checkworthy']*100:.1f}% | CW {result['probabilities']['checkworthy']*100:.1f}%")
```
**Example output**:
```
Input: 청년 실업률이 최근 3년간 계속 상승하고 있습니다.
Checkworthy
Confidence: 94.3%
Probabilities: Non-CW 5.7% | CW 94.3%

Input: 오늘 토론회는 SBS 여의도 스튜디오에서 진행하고 있고요.
Not checkworthy
Confidence: 98.2%
Probabilities: Non-CW 98.2% | CW 1.8%
```
#### Batch Processing
```python
def predict_claims_batch(texts, batch_size=32):
    """
    Classify multiple sentences in batches.

    Args:
        texts (list): list of sentences
        batch_size (int): batch size

    Returns:
        list: prediction result dict for each sentence
    """
    results = []
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]

        # Tokenize the batch
        inputs = tokenizer(
            batch_texts,
            truncation=True,
            max_length=128,
            padding=True,
            return_tensors="pt"
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Batch inference
        with torch.no_grad():
            outputs = model(**inputs)
            probs = torch.softmax(outputs.logits, dim=-1)
            predicted_labels = torch.argmax(probs, dim=-1).cpu().numpy()

        # Collect results
        for j, text in enumerate(batch_texts):
            results.append({
                'text': text,
                'is_checkworthy': bool(predicted_labels[j]),
                'confidence': probs[j][predicted_labels[j]].item(),
                'label': int(predicted_labels[j])
            })
    return results

# Batch inference example
texts = [
    "국회의원 정수를 300명으로 확대하겠습니다.",
    "감사합니다.",
    "2024년 경제성장률이 2.1%를 기록했습니다.",
    # ... more sentences
]

batch_results = predict_claims_batch(texts)
checkworthy_claims = [r for r in batch_results if r['is_checkworthy']]
print(f"{len(checkworthy_claims)} of {len(texts)} sentences need fact-checking")
```
### 4. Real-World Use Case
```python
# Extract fact-check targets from a news article
def extract_checkworthy_claims(article_text, threshold=0.7):
    """
    Extract the sentences in an article that need fact-checking.

    Args:
        article_text (str): full article text
        threshold (float): confidence threshold for checkworthy (0.0-1.0)

    Returns:
        list: sentences to fact-check, sorted by confidence
    """
    # Sentence splitting (simplistic example)
    sentences = [s.strip() for s in article_text.split('.') if s.strip()]

    # Batch prediction
    results = predict_claims_batch(sentences)

    # Keep only checkworthy sentences at or above the threshold
    checkworthy_claims = [
        r for r in results
        if r['is_checkworthy'] and r['confidence'] >= threshold
    ]

    # Sort by confidence, highest first
    checkworthy_claims.sort(key=lambda x: x['confidence'], reverse=True)
    return checkworthy_claims

# Example usage
article = """
정부는 오늘 경제 정책 방향을 발표했습니다.
청년 실업률이 지난해 대비 2.3%p 감소했다고 밝혔습니다.
이는 역대 최대 폭의 하락입니다.
앞으로도 일자리 창출에 힘쓰겠다고 강조했습니다.
"""

claims = extract_checkworthy_claims(article, threshold=0.8)
print(f"Fact-check targets found: {len(claims)}\n")
for i, claim in enumerate(claims, 1):
    print(f"{i}. {claim['text']}")
    print(f"   Confidence: {claim['confidence']*100:.1f}%\n")
```
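Note that `split('.')` above is a deliberately crude splitter: it will break numbers such as "2.3%p" apart. For real articles a dedicated Korean sentence splitter (e.g. the `kss` library) is a better choice; as a stdlib-only improvement, splitting on sentence-final punctuation followed by whitespace avoids the decimal-point problem:

```python
import re

def split_sentences(text):
    """Split text on ./!/? followed by whitespace, keeping the punctuation.

    A rough stdlib-only heuristic; a dedicated Korean splitter such as
    `kss` handles abbreviations and quoted speech far better.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in parts if s]

text = "정부는 오늘 경제 정책 방향을 발표했습니다. 실업률이 2.3%p 감소했습니다."
print(split_sentences(text))  # two sentences; "2.3%p" stays intact
```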
## Model Architecture
- **Model Type**: ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately)
- **Hidden Size**: 768
- **Number of Layers**: 12
- **Number of Attention Heads**: 12
- **Vocabulary Size**: 32,000
- **Max Sequence Length**: 128 tokens
- **Classification Head**: Linear layer (768 → 2)
## Limitations
1. **Domain specificity**: optimized for political/news text; performance may degrade on casual conversation or technical documents
2. **Length limit**: processes at most 128 tokens (roughly 100-150 words)
3. **Machine-translated data**: trained on data translated from English, so performance may differ on naturally written Korean
4. **Binary classification**: check-worthiness is only labeled 0/1; no fine-grained score beyond the predicted probability
5. **False negatives**: up to 25% of true claims may be missed (recall 75%)
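Because recall is the weaker metric, one option (at the cost of precision) is to lower the decision threshold on the checkworthy probability instead of taking the argmax. A minimal sketch with hypothetical model probabilities:

```python
def classify(p_checkworthy, threshold=0.5):
    """Flag a sentence as checkworthy when P(label=1) >= threshold."""
    return p_checkworthy >= threshold

# Hypothetical P(checkworthy) outputs for four sentences
probs = [0.92, 0.41, 0.35, 0.07]

default = [classify(p) for p in probs]           # argmax-equivalent (0.5)
high_recall = [classify(p, 0.3) for p in probs]  # looser threshold

print(default)      # [True, False, False, False]
print(high_recall)  # [True, True, True, False]
```

The threshold should be tuned on the validation set, since every borderline sentence admitted this way trades a possible false negative for a possible false positive.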
## Future Improvements
- [ ] Further training on native Korean fact-checking datasets
- [ ] Model upgrade for longer contexts (max_length 256+)
- [ ] Multi-class classification (check-worthiness on a 0-5 scale)
- [ ] Topic/category classification of claims
## License
This model follows the license of its base model, [beomi/KcELECTRA-base-v2022](https://huggingface.co/beomi/KcELECTRA-base-v2022).
## Citation
If you use this model in research or a project, please cite it as follows:
```bibtex
@misc{korean-claim-factcheck-2025,
  author = {Jonghhhh},
  title = {Korean Claim Detection Model for Fact-Checking},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/jonghhhh/claim_factcheck}},
  note = {Based on KcELECTRA-base-v2022}
}
```
## References
- **Base Model**: [beomi/KcELECTRA-base-v2022](https://huggingface.co/beomi/KcELECTRA-base-v2022)
- **Dataset**: [CLEF CheckThat! Lab 2024](https://clef2025.clef-initiative.eu/index.php?page=Pages/Labs/CheckThat.html)
- **Paper**: [CheckThat! Lab: Check-Worthiness, Subjectivity, and Persuasion](https://link.springer.com/chapter/10.1007/978-3-031-13643-6_24)
## Contact
If you have questions or feedback, please open an issue!
---
**Tags**: `claim-detection`, `fact-checking`, `korean`, `electra`, `text-classification`, `checkworthy`, `misinformation-detection`