---
language:
- en
- es
- fr
- hi
- it
- bn
- gu
- ml
- te
license: mit
datasets:
- hallucination_dataset_100k
- SHROOM-CAP
- LibreEval
- FactCHD
pipeline_tag: text-classification
base_model:
- FacebookAI/xlm-roberta-large
---

# SVNIT-AGI at SHROOM-CAP 2025: Multilingual Hallucination Detection Model

## Model Description

This model is XLM-RoBERTa-Large fine-tuned with the Hugging Face Transformers library to detect hallucinations in scientific text across 9 languages.

- **Developed by:** Harsh Rathva, Pruthwik Mishra, Shrikant Malviya
- **Funded by:** Sardar Vallabhbhai National Institute of Technology, Surat
- **License:** MIT
- **Finetuned from model:** `xlm-roberta-large`
- **Competition:** SHROOM-CAP 2025 Shared Task (2nd place in Gujarati)

## Model Sources

- **Repository:** [https://github.com/ezylopx5/SHROOM-CAP2025](https://github.com/ezylopx5/SHROOM-CAP2025)
- **Paper:** https://arxiv.org/abs/2511.18301

## Uses

The model can be used directly to detect hallucinations in scientific text across 9 languages:

- **Training Languages:** English (en), Spanish (es), French (fr), Hindi (hi), Italian (it)
- **Zero-shot Languages:** Bengali (bn), Gujarati (gu), Malayalam (ml), Telugu (te)

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Haxxsh/XLMRHallucinationDetectorSHROOMCAP"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def detect_hallucination(text):
    """Detect if text contains scientific hallucinations."""
    # Truncate to the 256-token sequence length used in training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

    # Inference only: no gradients needed
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Index 1 is HALLUCINATED, index 0 is CORRECT
    label = "HALLUCINATED" if predictions[0][1] > 0.5 else "CORRECT"
    confidence = predictions[0][1].item() if label == "HALLUCINATED" else predictions[0][0].item()

    return {"label": label, "confidence": confidence}

# Example usage
test_texts = [
    "The protein folding mechanism involves quantum tunneling effects at room temperature.",
    "Water boils at 100°C at standard atmospheric pressure.",
    "Einstein discovered the theory of relativity in 1905 with his paper on special relativity."
]

for text in test_texts:
    result = detect_hallucination(text)
    print(f"Text: {text}")
    print(f"Prediction: {result['label']} (confidence: {result['confidence']:.4f})\n")
```

## Label Mapping

- `0`: CORRECT (factually accurate scientific text)
- `1`: HALLUCINATED (contains factual errors or fabrications)
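
The mapping above can be applied to raw classifier logits without loading the model. The following is a minimal sketch in plain Python; `ID2LABEL`, `softmax`, `decode`, and the `threshold` parameter are illustrative names introduced here, not part of the released code.

```python
import math

# Label mapping as stated in this model card.
ID2LABEL = {0: "CORRECT", 1: "HALLUCINATED"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, threshold=0.5):
    """Map two raw logits to a label and the probability of that label.

    `threshold` is a hypothetical tuning knob applied to P(HALLUCINATED);
    0.5 matches the cut-off used in the getting-started snippet.
    """
    probs = softmax(logits)
    label_id = 1 if probs[1] > threshold else 0
    return {"label": ID2LABEL[label_id], "confidence": probs[label_id]}

if __name__ == "__main__":
    print(decode([0.2, 1.7]))
```

Raising `threshold` trades recall for precision on the HALLUCINATED class; whether that helps depends on the application and is not something this card evaluates.
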

## Downstream Use

Can be integrated into:

- Scientific writing assistants
- LLM output verification systems
- Academic paper review tools
- Multilingual fact-checking pipelines

## Out-of-Scope Use

- The model is specifically trained for scientific domain text
- May not perform well on general domain hallucinations
- Limited to the 9 languages mentioned above
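
A pipeline that integrates the model may want to check the input language before calling it. The sketch below is one possible guard, using the language codes listed in this card; the names `TRAINING_LANGUAGES`, `ZERO_SHOT_LANGUAGES`, and `language_support` are hypothetical, and detecting the language of a text is assumed to happen elsewhere.

```python
# Language codes taken from the "Uses" section of this card.
TRAINING_LANGUAGES = {"en", "es", "fr", "hi", "it"}
ZERO_SHOT_LANGUAGES = {"bn", "gu", "ml", "te"}

def language_support(lang_code):
    """Classify an ISO 639-1 code as 'trained', 'zero-shot', or 'unsupported'."""
    code = lang_code.strip().lower()
    if code in TRAINING_LANGUAGES:
        return "trained"
    if code in ZERO_SHOT_LANGUAGES:
        return "zero-shot"
    return "unsupported"

if __name__ == "__main__":
    for code in ("en", "gu", "de"):
        print(code, language_support(code))
```
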

## Limitations

- Performance varies across languages (best in Gujarati, competitive in others)
- Trained primarily on scientific text, so it may not generalize to other domains
- Requires domain adaptation for highly specialized scientific fields

## Training Details

### Training Data

- **Total Samples:** 124,821 balanced samples (50% correct, 50% hallucinated)
- **Sources:** Unified dataset from SHROOM-CAP, hallucination_dataset_100k, LibreEval, FactCHD
- **Languages:** 9 languages with cross-lingual transfer

### Training Procedure

- **Base Model:** XLM-RoBERTa-Large (560M parameters)
- **Training Regime:** Full fine-tuning (not LoRA/PEFT)
- **Training Batch Size:** 32 effective (16 per device × 2 gradient accumulation steps)
- **Learning Rate:** 2e-5
- **Weight Decay:** 0.01
- **Epochs:** 3
- **Sequence Length:** 256 tokens

### Training Hyperparameters

```json
{
  "per_device_train_batch_size": 16,
  "gradient_accumulation_steps": 2,
  "learning_rate": 2e-5,
  "num_train_epochs": 3,
  "max_seq_length": 256,
  "warmup_ratio": 0.1,
  "weight_decay": 0.01
}
```
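
To make the relationship between these values concrete, the sketch below derives the effective batch size and an approximate step schedule from them. It assumes a single device (consistent with the single GPU listed under Compute Infrastructure) and, as an upper bound, that all 124,821 samples are used for training; the card does not state the train/validation split, so the step counts are illustrative only. `schedule` is a hypothetical helper, and a training framework's own rounding may differ slightly.

```python
import math

def schedule(num_train_samples, per_device_batch=16, grad_accum=2,
             num_devices=1, epochs=3, warmup_ratio=0.1):
    """Derive effective batch size and approximate optimizer-step counts."""
    # One optimizer step consumes this many samples.
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_train_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    # Assumption: warmup length is rounded up to a whole step.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

if __name__ == "__main__":
    # Upper bound: treats the full unified dataset as the training split.
    print(schedule(124_821))
```
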

## Evaluation

### Competition Results (SHROOM-CAP 2025)

| Language | Rank | Factuality F1 | Fluency F1 |
|----------|------|---------------|------------|
| Gujarati (gu) | 🥈 2nd | 0.5107 | 0.1579 |
| Bengali (bn) | 4th | 0.4449 | 0.2542 |
| Hindi (hi) | 4th | 0.4906 | 0.4353 |
| Spanish (es) | 5th | 0.4938 | 0.4607 |
| French (fr) | 5th | 0.4771 | 0.2899 |
| Telugu (te) | 5th | 0.4738 | 0.1474 |
| Malayalam (ml) | 5th | 0.4704 | 0.3593 |
| English (en) | 6th | 0.4246 | 0.4495 |
| Italian (it) | 5th | 0.3149 | 0.4582 |

### Metrics

- **Primary:** Macro F1 Score
- **Validation Performance:** 0.8510 F1
- **Competition Performance:** ~0.40-0.51 F1 (due to distribution shift)
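
For readers unfamiliar with the primary metric, macro F1 is the unweighted mean of the per-class F1 scores, so both classes count equally regardless of frequency. The sketch below is a plain-Python illustration under that standard definition; `macro_f1` is a name introduced here, and the shared task's official scorer may differ in details such as handling of empty classes.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores.

    A class with no true or predicted instances contributes 0.0
    (an assumption; other scorers may skip or warn instead).
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

if __name__ == "__main__":
    gold = [0, 0, 1, 1]   # 0 = CORRECT, 1 = HALLUCINATED
    pred = [0, 1, 1, 1]
    print(f"macro F1: {macro_f1(gold, pred):.4f}")
```
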

## Compute Infrastructure

- **Hardware:** NVIDIA H200 GPU (141GB VRAM)
- **Training Time:** 1 hour 14 minutes
- **Framework:** PyTorch, HuggingFace Transformers

## Model Size

- **Parameters:** 560M
- **File format:** SafeTensors
- **Tensor type:** F32

## Acknowledgements

- SHROOM-CAP 2025 Organizers for the shared task
- Lightning AI for H200 GPU infrastructure
- HuggingFace for the XLM-RoBERTa-Large model
- All dataset contributors

---