# Hallucination Detector (DeBERTa-v3-base)
A fine-tuned DeBERTa-v3-base model for detecting hallucinations in LLM-generated text.
## Model Description
This model classifies whether an LLM-generated response is factual or hallucinated given a knowledge context. It was fine-tuned on the HaluEval benchmark.
- Base model: microsoft/deberta-v3-base (184M parameters)
- Task: Binary classification (Factual vs Hallucinated)
- Training data: HaluEval (21,000 samples across QA, Dialogue, Summarization)
## Performance
| Metric | Score |
|---|---|
| Accuracy | 0.9127 |
| Precision | 0.8819 |
| Recall | 0.9505 |
| F1 Score | 0.9149 |
| AUROC | 0.9771 |
### Performance by Task
| Task | F1 Score |
|---|---|
| QA | 0.97 |
| Summarization | 0.96 |
| Dialogue | 0.82 |
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("varunteja99/hallucination-detector-deberta")
model = AutoModelForSequenceClassification.from_pretrained("varunteja99/hallucination-detector-deberta")
model.eval()

# Prepare input in the Knowledge/Question/Answer format
text = """Knowledge: The Eiffel Tower is located in Paris, France.
Question: Where is the Eiffel Tower?
Answer: The Eiffel Tower is in London."""
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()
print("Hallucinated" if prediction == 1 else "Factual")
# Output: Hallucinated
```
## Input Format

The model expects input in the following format:

```
Knowledge: [relevant context/facts]
Question: [the query or prompt]
Answer: [the LLM-generated response to verify]
```
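The three fields can be assembled with a small helper before tokenization. This function is an illustrative convenience, not part of the model's API; the newline-separated layout matches the template above:

```python
def format_input(knowledge: str, question: str, answer: str) -> str:
    """Build the Knowledge/Question/Answer prompt the detector expects."""
    return f"Knowledge: {knowledge}\nQuestion: {question}\nAnswer: {answer}"

text = format_input(
    "The Eiffel Tower is located in Paris, France.",
    "Where is the Eiffel Tower?",
    "The Eiffel Tower is in London.",
)
print(text)
```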
## Labels
| ID | Label | Description |
|---|---|---|
| 0 | Factual | Response is supported by knowledge |
| 1 | Hallucinated | Response contradicts or is unsupported |
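Beyond the hard label from `argmax`, a softmax over the two logits yields a confidence score. The sketch below uses plain Python and made-up logit values for illustration (`torch.softmax` on `outputs.logits` does the same in practice):

```python
import math

# Raw logits for one example, in [Factual, Hallucinated] order;
# these values are made up for illustration.
logits = [-1.2, 2.3]

# Softmax turns logits into probabilities.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

labels = ["Factual", "Hallucinated"]
pred = max(range(len(probs)), key=probs.__getitem__)
print(f"{labels[pred]} (p={probs[pred]:.3f})")
```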
## Training Details
- Epochs: 3
- Learning rate: 2e-5
- Batch size: 8
- Warmup steps: 788
- Precision: float32 (for training stability)
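These hyperparameters map onto Hugging Face `TrainingArguments` roughly as follows. This is a hypothetical reconstruction, not the authors' actual training script; `output_dir` is an illustrative assumption:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="hallucination-detector-deberta",  # illustrative path
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    warmup_steps=788,
    fp16=False,  # float32 for training stability
)
```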
## Citation

```bibtex
@misc{chundru2026hallucination,
  author    = {Chundru, Varun and Biswas, Debasmita},
  title     = {Domain-Specific Hallucination Detection in Large Language Models},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/varunteja99/hallucination-detection-nlp}
}
```
## Acknowledgments
- Course: CS 593 NLP, Purdue University, Spring 2026
- Dataset: HaluEval (Li et al., EMNLP 2023)