zELO: ELO-inspired Training Method for Rerankers and Embedding Models Paper • 2509.12541 • Published Sep 16, 2025 • 9
Article **ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models?** • 22 days ago • 18
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math Paper • 2602.06291 • Published Feb 6 • 23
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 164
NanoBEIR datasets Collection These datasets are compatible with the (Sparse)NanoBEIREvaluator in Sentence Transformers v5.2+, and also with the CrossEncoderNanoBEIREvaluator if a bm25 column is present • 16 items • Updated 11 days ago • 14
Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model Paper • 2507.05513 • Published Jul 7, 2025 • 1
KoViDoRe Benchmark (BEIR) v2 Collection Korean Vision Document Retrieval Benchmark • 4 items • Updated 11 days ago • 5
Article Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries • Dec 22, 2025 • 9
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model Paper • 2501.01028 • Published Jan 2, 2025 • 19
Tarka Embed V1 Collection Efficient DFKD embeddings for language understanding • 5 items • Updated Dec 17, 2025 • 6
Black-Box On-Policy Distillation of Large Language Models Paper • 2511.10643 • Published Nov 13, 2025 • 52
Preserving Multilingual Quality While Tuning Query Encoder on English Only Paper • 2407.00923 • Published Jul 1, 2024 • 1
Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks Paper • 2511.07025 • Published Nov 10, 2025 • 14