Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published 9 days ago • 55
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities Paper • 2602.05281 • Published 19 days ago • 15
Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models Paper • 2512.00590 • Published Nov 29, 2025 • 49
Sentence-Anchored Gist Compression for Long-Context LLMs Paper • 2511.08128 • Published Nov 11, 2025 • 1
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization Paper • 2510.25616 • Published Oct 29, 2025 • 105
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published Oct 6, 2025 • 115
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling Paper • 2508.16745 • Published Aug 22, 2025 • 29
HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds Paper • 2508.12782 • Published Aug 18, 2025 • 25
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5, 2025 • 133
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts Paper • 2506.05229 • Published Jun 5, 2025 • 38
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Paper • 2505.21115 • Published May 27, 2025 • 140
AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment Paper • 2506.04089 • Published Jun 4, 2025 • 47
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26, 2025 • 93
Exploring the Latent Capacity of LLMs for One-Step Text Generation Paper • 2505.21189 • Published May 27, 2025 • 61
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8, 2025 • 110
A Comprehensive Survey on Long Context Language Modeling Paper • 2503.17407 • Published Mar 20, 2025 • 49
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12, 2025 • 75