Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher Paper • 2606.01000 • Published about 1 month ago • 6
ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models Paper • 2510.16928 • Published Oct 19, 2025 • 4
Genomic Next-Token Predictors are In-Context Learners Paper • 2511.12797 • Published Nov 16, 2025 • 8
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains Paper • 2507.07229 • Published Jul 9, 2025 • 11
World-in-World: World Models in a Closed-Loop World Paper • 2510.18135 • Published Oct 20, 2025 • 78
MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification Paper • 2505.18452 • Published May 24, 2025 • 4
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9, 2025 • 41