On the Challenges and Opportunities of Learned Sparse Retrieval for Code Paper • 2603.22008 • Published Mar 23 • 3
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling Feb 12 • 56
Article How to Ground a Korean AI Agent in Real Demographics with Synthetic Personas 15 days ago • 25
Article DenseOn with the LateOn: Open State-of-the-Art Single and Multi-Vector Models 15 days ago • 36
Article Introducing RTEB: A New Standard for Retrieval Evaluation Oct 1, 2025 • 143
Article Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers 20 days ago • 68
KoViDoRe Benchmark (BEIR) v2 Collection Korean Vision Document Retrieval Benchmark • 4 items • Updated Mar 2 • 6
Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval Paper • 2604.04734 • Published about 1 month ago • 12
Article Multimodal Embedding & Reranker Models with Sentence Transformers 27 days ago • 57
Article How I contributed a new model to the Transformers library using Codex Mar 30 • 50
NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval Paper • 2603.12824 • Published Mar 13 • 5
Article A framework and leaderboard for Retrieval Pipelines evaluation on ViDoRe v3 Feb 27 • 12
ModernVBERT: Towards Smaller Visual Document Retrievers Paper • 2510.01149 • Published Oct 1, 2025 • 33
VisionDocumentRetrieval Datasets Collection Datasets for vision document retrieval (VDR) • 19 items • Updated Mar 17 • 11
Article ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models Oct 18, 2024 • 21
SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents Paper • 2511.04910 • Published Nov 7, 2025 • 1
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated Feb 26 • 96