Factuality-Driven DPR Question Encoder
This model is the fine-tuned question encoder used in the thesis:
“Factuality-Driven Hybrid Re-Ranking for Retrieval-Augmented Generation”
(National Institute of Technology Hamirpur, 2025)
It is trained on HotpotQA with positive (question, supporting passage) supervision to improve semantic retrieval, particularly for factual grounding in Retrieval-Augmented Generation (RAG).
Training Details
- Base Model: facebook/dpr-question_encoder-single-nq-base
- Dataset: HotpotQA (subset of train: 3,000 samples)
- Training Objective: Contrastive loss (positive passage alignment)
- Epochs: 2
- Effective Batch Size: 128 (reached via gradient accumulation)
- Learning Rate: 2e-5
- Hardware: Kaggle T4 GPU
- Reproducibility: Fully deterministic (SEED = 42)
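The card names the objective only as a contrastive loss over positive pairs. A minimal sketch of one common form follows, assuming in-batch negatives (every other passage in the batch acts as a negative for a given question) and dot-product scoring. Both are assumptions, and the function name `in_batch_contrastive_loss` is hypothetical. It is written in plain Python for readability rather than as the training code used for this model.

```python
import math


def in_batch_contrastive_loss(q_vecs, p_vecs):
    """Mean negative log-likelihood of each question's own passage.

    q_vecs[i] and p_vecs[i] form a positive pair. Treating the other
    passages in the batch as negatives is an assumption, not something
    the model card states.
    """
    losses = []
    for i, q in enumerate(q_vecs):
        # Dot-product similarity between question i and every passage.
        scores = [sum(a * b for a, b in zip(q, p)) for p in p_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy with the matching passage as the target.
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy batch: each question points in the same direction as its passage.
questions = [[10.0, 0.0], [0.0, 10.0]]
passages = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(questions, passages)
swapped = in_batch_contrastive_loss(questions, passages[::-1])
```

With the toy vectors above, `aligned` is close to zero and `swapped` is large, which is the behaviour the objective rewards: a question should score its own supporting passage above the others.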
Performance (DPR Recall@K on 1,000 HotpotQA samples)
| K | Recall@K |
|---|---|
| 1 | 0.5720 |
| 5 | 0.7420 |
| 10 | 0.7840 |
| 20 | 0.8090 |
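For readers who want to reproduce a table like this one, Recall@K can be sketched as below. The card does not say whether a hit requires any supporting passage or all of them in the top K (HotpotQA questions have more than one), so this sketch assumes "at least one". The function name `recall_at_k` and the toy IDs are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, ks=(1, 5, 10, 20)):
    """Fraction of questions with at least one gold passage in the top K.

    ranked_ids: one list of retrieved passage IDs per question, best first.
    gold_ids:   one collection of supporting passage IDs per question.
    The "at least one gold passage" criterion is an assumption.
    """
    n = len(ranked_ids)
    results = {}
    for k in ks:
        hits = sum(
            1
            for ranked, gold in zip(ranked_ids, gold_ids)
            if set(ranked[:k]) & set(gold)
        )
        results[k] = hits / n
    return results


# Toy example: two questions, three retrieved passages each.
toy_ranked = [["a", "b", "c"], ["x", "y", "z"]]
toy_gold = [{"b"}, {"q"}]
toy_recall = recall_at_k(toy_ranked, toy_gold, ks=(1, 2))
```

In the toy example the first question finds its gold passage at rank 2 and the second never does, so recall rises from K=1 to K=2 and then plateaus.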
Usage Example
import torch
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
# Load the fine-tuned tokenizer and question encoder from the Hub
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("Anshul017/factuality-dpr-question")
model = DPRQuestionEncoder.from_pretrained("Anshul017/factuality-dpr-question")
query = "Where was Marie Curie born?"
enc = tokenizer(query, return_tensors="pt")
with torch.no_grad():  # gradients are not needed for inference
    vec = model(**enc).pooler_output  # shape [1, 768]
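Once a question vector is available, retrieval reduces to scoring it against precomputed passage vectors. The sketch below assumes dot-product scoring, as is usual for DPR, and assumes the passage vectors come from a compatible DPR context encoder. The card does not say which passage encoder was paired with this model. The helper `rank_passages` and the toy vectors are hypothetical; a real query vector could be passed as `vec[0].tolist()`.

```python
def rank_passages(query_vec, passage_vecs, top_k=5):
    """Return (passage_index, score) pairs for the top_k passages.

    Scores are plain dot products; higher means more relevant.
    """
    scores = [
        sum(q * p for q, p in zip(query_vec, pv)) for pv in passage_vecs
    ]
    # Sort passage indices by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]


# Toy 2-dimensional vectors standing in for 768-dimensional embeddings.
toy_query = [1.0, 0.0]
toy_passages = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
top = rank_passages(toy_query, toy_passages, top_k=2)
```

For a large corpus, an approximate nearest-neighbour index would normally replace this exhaustive loop. The loop is kept here only to make the scoring rule explicit.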
Author
Anshul Thakur, M.Tech CSE, NIT Hamirpur