Factuality-Driven DPR Question Encoder

This model is the fine-tuned question encoder used in the thesis “Factuality-Driven Hybrid Re-Ranking for Retrieval-Augmented Generation” (National Institute of Technology Hamirpur, 2025).

It was fine-tuned on HotpotQA using positive (question, supporting passage) pairs to improve semantic retrieval, particularly for factual grounding in Retrieval-Augmented Generation (RAG).


Training Details

  • Base Model: facebook/dpr-question_encoder-single-nq-base
  • Dataset: HotpotQA (subset of train: 3,000 samples)
  • Training Objective: Contrastive loss (positive passage alignment)
  • Epochs: 2
  • Effective Batch Size: 128 (gradient accumulation)
  • Learning Rate: 2e-5
  • Hardware: Kaggle T4 GPU
  • Reproducibility: Fully deterministic (SEED = 42)
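The contrastive objective listed above can be illustrated with a minimal sketch. The card does not say how negatives were chosen, so this example assumes in-batch negatives (each question's positive passage serves as a negative for the other questions in the batch), as in the original DPR setup. The function name and toy vectors are hypothetical, and plain Python is used only for readability; actual training would use batched tensors.

```python
import math

def in_batch_contrastive_loss(question_vecs, passage_vecs):
    """Mean negative log-likelihood of each question's own passage.

    Assumption: passage_vecs[i] is the positive for question_vecs[i], and
    the other passages in the batch act as negatives (in-batch negatives).
    """
    losses = []
    for i, q in enumerate(question_vecs):
        # Dot-product similarity between this question and every passage.
        scores = [sum(a * b for a, b in zip(q, p)) for p in passage_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-softmax evaluated at the positive's index.
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)

# Toy 2-d embeddings (made up for illustration).
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[2.0, 0.0], [0.0, 2.0]]   # positives match their questions
swapped = [[0.0, 2.0], [2.0, 0.0]]   # positives are mismatched

loss_aligned = in_batch_contrastive_loss(questions, aligned)
loss_swapped = in_batch_contrastive_loss(questions, swapped)
```

The loss is lower when each question scores highest against its own passage, which is the alignment the objective rewards.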

Performance (DPR Recall@K on 1,000 HotpotQA samples)

K     Recall@K
1     0.5720
5     0.7420
10    0.7840
20    0.8090
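For reference, a minimal sketch of how a Recall@K figure of this kind can be computed. The card does not state the exact definition used, so this example assumes a question counts as a hit when at least one of its supporting passages appears in the top K results. The function name, passage ids, and rankings are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of questions with at least one gold passage in the top k.

    ranked_ids: per-question list of passage ids, best first.
    gold_ids:   per-question set of supporting passage ids.
    Assumption: a single gold passage in the top k counts as a hit.
    """
    hits = sum(
        1 for ranked, gold in zip(ranked_ids, gold_ids)
        if any(pid in gold for pid in ranked[:k])
    )
    return hits / len(ranked_ids)

# Toy rankings for three questions (made-up ids).
ranked = [["p3", "p1", "p7"], ["p2", "p9", "p4"], ["p5", "p6", "p8"]]
gold = [{"p1"}, {"p2"}, {"p0"}]

r1 = recall_at_k(ranked, gold, 1)
r2 = recall_at_k(ranked, gold, 2)
```

Under this definition Recall@K can only stay the same or grow as K increases, which matches the trend in the table above.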

Usage Example

import torch
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

# Load the fine-tuned tokenizer and question encoder from the Hub.
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("Anshul017/factuality-dpr-question")
model = DPRQuestionEncoder.from_pretrained("Anshul017/factuality-dpr-question")

query = "Where was Marie Curie born?"
enc = tokenizer(query, return_tensors="pt")
with torch.no_grad():  # gradients are not needed for encoding
    vec = model(**enc).pooler_output  # question embedding, shape [1, 768]
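The question vector is typically compared against passage embeddings by inner product. The sketch below illustrates that ranking step only: the card does not specify which passage encoder or index was used, so short toy vectors stand in for real 768-dimensional embeddings, and the function name is hypothetical.

```python
def rank_passages(question_vec, passage_vecs, top_k=2):
    """Return (index, score) pairs for the top_k passages by dot product."""
    scores = [
        (i, sum(q * p for q, p in zip(question_vec, vec)))
        for i, vec in enumerate(passage_vecs)
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]

# Toy 3-d vectors (made up); real embeddings would come from the encoders.
question_vec = [0.5, 1.0, 0.0]
passage_vecs = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 2.0, 0.0]]

top = rank_passages(question_vec, passage_vecs)
```

With the real model, the question vector could be obtained as vec[0].tolist(), and the passage vectors would need to come from a compatible DPR context encoder.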

Author

Anshul Thakur, M.Tech CSE, NIT Hamirpur
