Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 166
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 506
RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic Text Generation • 8B • Updated Dec 12, 2025 • 16k • 9
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 183
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14, 2025 • 306
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published Mar 24, 2025 • 119
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20, 2025 • 174
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 108