Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation Paper • 2606.12594 • Published 16 days ago • 17
VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models Paper • 2603.06148 • Published Mar 6 • 2
SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks Paper • 2605.31433 • Published 28 days ago • 28
Learning GUI Grounding with Spatial Reasoning from Visual Feedback Paper • 2509.21552 • Published Sep 25, 2025 • 11
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published Aug 8, 2025 • 42
Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models Paper • 2506.06006 • Published Jun 6, 2025 • 15
Inference-Time Hyper-Scaling with KV Cache Compression Paper • 2506.05345 • Published Jun 5, 2025 • 31
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published Jun 3, 2025 • 17
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards Paper • 2505.24760 • Published May 30, 2025 • 74
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15, 2025 • 56
Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs Paper • 2502.05092 • Published Feb 7, 2025 • 8
🔍 Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 136 items • Updated about 1 month ago • 119
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering Paper • 2410.15999 • Published Oct 21, 2024 • 20
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations Paper • 2410.18860 • Published Oct 24, 2024 • 11
FLARE: Faithful Logic-Aided Reasoning and Exploration Paper • 2410.11900 • Published Oct 14, 2024 • 4
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 310