DeBERTa-v3-Small – Factuality / Misinformation Classifier
Lightweight DeBERTa-v3-Small fine-tuned to detect factual vs. non-factual statements using TruthfulQA and FEVER.
Part of the Army of Safeguards research project.
Model Details
| Property | Value |
|---|---|
| Base model | microsoft/deberta-v3-small |
| Architecture | Encoder-only Transformer (≈ 44 M backbone / ≈ 142 M total params) |
| Task | Binary text classification (0 = factual, 1 = non-factual) |
| Language | English |
| Fine-tuning framework | Hugging Face Transformers v4.44 |
| Trained by | Ajith Bondili |
| Hardware | NVIDIA T4 (Google Colab) |
| Epochs | 3 |
| Batch size | 16 |
| Learning rate | 2e-5 |
| Max sequence len | 256 tokens |
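The exact training script is not published; the following is a minimal sketch of an equivalent Hugging Face `Trainer` setup using the hyperparameters above. The two-example dataset is a toy stand-in for the merged TruthfulQA + FEVER data, and `output_dir` is an arbitrary placeholder.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small", num_labels=2)  # 0 = factual, 1 = non-factual

# Toy stand-in for the merged TruthfulQA + FEVER training set.
train = Dataset.from_dict({
    "text": ["Paris is the capital of France.", "The Moon is made of cheese."],
    "label": [0, 1],
})
train = train.map(lambda b: tok(b["text"], truncation=True, max_length=256),
                  batched=True)

args = TrainingArguments(
    output_dir="deberta-factuality",       # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    report_to="none",
)
Trainer(model=model, args=args, train_dataset=train, tokenizer=tok).train()
```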
Training Data
Merged and balanced from two open-source datasets:
- TruthfulQA (generation split) – question–answer pairs labeled truthful vs. false.
- FEVER v1.0 – Real-world claims labeled Supported, Refuted, or Not Enough Info (mapped to binary 0/1).
≈ 20 000 combined examples after cleaning.
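The exact preprocessing is not published; below is a plausible sketch of the FEVER label mapping. Dropping the "NOT ENOUGH INFO" rows is an assumption (they could instead be mapped to the non-factual class), and the example claims are invented.

```python
# Assumed mapping of FEVER's three-way labels onto the binary scheme.
# Handling of "NOT ENOUGH INFO" rows (dropped here) is an assumption.
FEVER_TO_BINARY = {"SUPPORTS": 0, "REFUTES": 1}

fever_rows = [
    {"claim": "The Eiffel Tower is in Paris.", "label": "SUPPORTS"},
    {"claim": "The Eiffel Tower is in Berlin.", "label": "REFUTES"},
    {"claim": "The Eiffel Tower is beautiful.", "label": "NOT ENOUGH INFO"},
]

binary = [{"text": r["claim"], "label": FEVER_TO_BINARY[r["label"]]}
          for r in fever_rows if r["label"] in FEVER_TO_BINARY]
print(binary)  # two examples survive; the NEI row is dropped
```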
Evaluation Results
| Metric | Base Model (M₀) | Fine-Tuned (M₁) | Δ Change |
|---|---|---|---|
| Accuracy | 0.52 | 0.80 | +0.28 |
| F1 Score | 0.00 | 0.79 | +0.79 |
| Eval Loss | 0.69 | 0.35 | −0.34 |
Confusion Matrix
| | Pred Factual | Pred Non-Factual |
|---|---|---|
| True Factual | 838 | 205 |
| True Non-Factual | 204 | 753 |
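As a sanity check, the headline metrics can be recomputed from this confusion matrix, treating "non-factual" as the positive class:

```python
# Recompute accuracy and F1 from the confusion matrix above,
# with "non-factual" as the positive class.
tn, fp = 838, 205   # true factual:     predicted factual / non-factual
fn, tp = 204, 753   # true non-factual: predicted factual / non-factual

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), round(f1, 2))  # 0.8 0.79 — consistent with the table
```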
Intended Use
Acts as a truth-checking critic for large-language-model outputs.
Input
Free-form English text (e.g., an LLM response or claim)
Output
```json
{
  "label": "non-factual",
  "confidence": 0.81,
  "probs": { "factual": 0.19, "non-factual": 0.81 }
}
```
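A minimal sketch of how this JSON can be assembled from the two softmax probabilities. The `to_output` helper is illustrative, not part of the model API; see the full inference code in the Usage Example below.

```python
import json

def to_output(p_factual: float, p_non_factual: float) -> str:
    """Serialize two class probabilities into the response schema above."""
    probs = {"factual": p_factual, "non-factual": p_non_factual}
    label = max(probs, key=probs.get)
    return json.dumps({"label": label,
                       "confidence": round(probs[label], 2),
                       "probs": probs})

print(to_output(0.19, 0.81))
# {"label": "non-factual", "confidence": 0.81, "probs": {"factual": 0.19, "non-factual": 0.81}}
```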
Out of Scope
- Non-English text
- Numerical facts requiring external databases (e.g., live statistics or financial data)
- Ethical or opinion-based classification tasks
Bias · Risks · Limitations
- Trained only on English corpora; may mis-score culturally specific or multilingual statements.
- Can misclassify sarcasm, humor, or figurative speech as “non-factual.”
- Should be used as one critic in a multi-agent safeguard system, not as a standalone truth detector.
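To illustrate the last point, here is a hypothetical sketch of combining this model's score with other critics by simple score averaging. The companion critic, the stubbed scores, and the 0.5 threshold are all invented for illustration.

```python
# Hypothetical multi-critic gate; scores are stubbed for illustration.
def factuality_critic(text: str) -> float:
    # In practice: run this model as in the usage example below and
    # return the softmax probability of the "non-factual" class.
    return 0.81  # stubbed score

def toxicity_critic(text: str) -> float:
    return 0.10  # stubbed score from a hypothetical toxicity model

def flag(text: str, threshold: float = 0.5) -> bool:
    """Flag `text` if the mean critic score exceeds the threshold."""
    scores = [factuality_critic(text), toxicity_critic(text)]
    return sum(scores) / len(scores) > threshold

print(flag("The Moon is made of cheese."))  # False under these stub scores
```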
Usage Example
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

repo = "ajithbondili/deberta-v3-factuality-small"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

text = "The Moon is made of cheese."
inputs = tok(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

probs = F.softmax(logits, dim=-1)[0]                      # [p_factual, p_non_factual]
label = ["factual", "non-factual"][int(probs.argmax())]  # 0 = factual, 1 = non-factual
print({"label": label, "probs": [round(p, 4) for p in probs.tolist()]})
```
Citation
```bibtex
@software{bondili_2025_factuality,
  author = {Ajith Bondili},
  title  = {DeBERTa-v3-Small Factuality / Misinformation Classifier},
  year   = {2025},
  url    = {https://huggingface.co/ajith-bondili/deberta-v3-factuality-small}
}
```