Evidential Transformation Network: Turning Pretrained Models into Evidential Models for Post-hoc Uncertainty Estimation Paper • 2604.08627 • Published Apr 9 • 2
Article Training and Finetuning Reranker Models with Sentence Transformers tomaarsen • Mar 26, 2025 • 195
Revise: A Framework for Revising OCRed text in Practical Information Systems with Data Contamination Strategy Paper • 2604.08115 • Published Apr 9 • 1
MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol Paper • 2605.07249 • Published May 8 • 3
SemBridge: Language Transfer in Sparse Encoders via Multilingual Semantic Bridges Paper • 2605.26002 • Published May 25 • 4
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published Apr 11 • 82
Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval Paper • 2604.04734 • Published Apr 6 • 14
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment Paper • 2604.05684 • Published Apr 7 • 9
ConTEB evaluation datasets Collection Evaluation datasets of the ConTEB benchmark. Use "test" split where available, otherwise "validation", otherwise "train". • 8 items • Updated Jun 2, 2025 • 3
Article Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries sionic-ai • Dec 22, 2025 • 11
🦢SWIM-IR Dataset [NAACL'24] Collection 29 million Synthetic Wikipedia-based Multilingual Retrieval Training Pairs. • 4 items • Updated Mar 31, 2025 • 8
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 173