Read 2026
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published • 318
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
• 2512.23988
• Published • 19
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
Paper
• 2512.25075
• Published • 15
Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Paper
• 2512.24176
• Published • 8
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper
• 2512.24165
• Published • 52
AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction
Paper
• 2601.00796
• Published • 32
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning
Paper
• 2512.24146
• Published • 14
Paper
• 2601.00417
• Published • 34
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper
• 2601.03233
• Published • 172
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models
Paper
• 2601.03044
• Published • 28
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
• 2601.05242
• Published • 229
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
Paper
• 2601.03425
• Published • 16
RelayLLM: Efficient Reasoning via Collaborative Decoding
Paper
• 2601.05167
• Published • 31
AgentOCR: Reimagining Agent History via Optical Self-Compression
Paper
• 2601.04786
• Published • 30
Over-Searching in Search-Augmented Large Language Models
Paper
• 2601.05503
• Published • 7
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
Paper
• 2601.07226
• Published • 33
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
Paper
• 2601.07351
• Published • 26
Dr. Zero: Self-Evolving Search Agents without Training Data
Paper
• 2601.07055
• Published • 22
User-Oriented Multi-Turn Dialogue Generation with Tool Use at Scale
Paper
• 2601.08225
• Published • 53
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
Paper
• 2601.07264
• Published • 24
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
Paper
• 2601.08303
• Published • 19
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
Paper
• 2601.09708
• Published • 54
Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments
Paper
• 2601.01075
• Published • 6
The AI Hippocampus: How Far are We From Human Memory?
Paper
• 2601.09113
• Published • 6
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper
• 2601.08763
• Published • 149
Alterbute: Editing Intrinsic Attributes of Objects in Images
Paper
• 2601.10714
• Published • 31
Transition Matching Distillation for Fast Video Generation
Paper
• 2601.09881
• Published • 33
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
Paper
• 2601.10355
• Published • 39
Language of Thought Shapes Output Diversity in Large Language Models
Paper
• 2601.11227
• Published • 9
More Images, More Problems? A Controlled Analysis of VLM Failure Modes
Paper
• 2601.07812
• Published • 6
Toward Efficient Agents: Memory, Tool learning, and Planning
Paper
• 2601.14192
• Published • 57
Agentic Reasoning for Large Language Models
Paper
• 2601.12538
• Published • 202
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Paper
• 2601.16208
• Published • 55
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
Paper
• 2601.15224
• Published • 12
360Anything: Geometry-Free Lifting of Images and Videos to 360°
Paper
• 2601.16192
• Published • 9
Agentic Uncertainty Quantification
Paper
• 2601.15703
• Published • 9
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
Paper
• 2601.07251
• Published • 11
Paper
• 2601.17237
• Published • 10
Agentic Very Long Video Understanding
Paper
• 2601.18157
• Published • 19
Shaping capabilities with token-level data filtering
Paper
• 2601.21571
• Published • 27
KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices
Paper
• 2601.21579
• Published • 6
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
Paper
• 2601.22975
• Published • 110
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
Paper
• 2601.21468
• Published • 25
LMK > CLS: Landmark Pooling for Dense Embeddings
Paper
• 2601.21525
• Published • 5
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
Paper
• 2602.03139
• Published • 45
Rethinking the Trust Region in LLM Reinforcement Learning
Paper
• 2602.04879
• Published • 37
Protein Autoregressive Modeling via Multiscale Structure Generation
Paper
• 2602.04883
• Published • 3
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration
Paper
• 2602.01734
• Published • 32
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
Paper
• 2602.08222
• Published • 283
Towards Agentic Intelligence for Materials Science
Paper
• 2602.00169
• Published • 47
Reliable and Responsible Foundation Models: A Comprehensive Survey
Paper
• 2602.08145
• Published • 8
Col-Bandit: Zero-Shot Query-Time Pruning for Late-Interaction Retrieval
Paper
• 2602.02827
• Published • 3
Stable Velocity: A Variance Perspective on Flow Matching
Paper
• 2602.05435
• Published • 3
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
Paper
• 2602.12108
• Published • 13
Free(): Learning to Forget in Malloc-Only Reasoning Models
Paper
• 2602.08030
• Published • 6
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models
Paper
• 2602.10179
• Published • 6
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
Paper
• 2602.12617
• Published • 20
Experiential Reinforcement Learning
Paper
• 2602.13949
• Published • 71
SPILLage: Agentic Oversharing on the Web
Paper
• 2602.13516
• Published
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
Paper
• 2602.14689
• Published • 1
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
Paper
• 2602.15322
• Published • 10
Visual Persuasion: What Influences Decisions of Vision-Language Models?
Paper
• 2602.15278
• Published • 3
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems
Paper
• 2602.15382
• Published • 2
Causal-JEPA: Learning World Models through Object-Level Latent Interventions
Paper
• 2602.11389
• Published • 7
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
Paper
• 2602.14080
• Published • 20
Multi-agent cooperation through in-context co-player inference
Paper
• 2602.16301
• Published • 24
Reinforced Fast Weights with Next-Sequence Prediction
Paper
• 2602.16704
• Published • 13
World Action Models are Zero-shot Policies
Paper
• 2602.15922
• Published • 15
Unified Latents (UL): How to train your latents
Paper
• 2602.17270
• Published • 58
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers
Paper
• 2602.16968
• Published • 12
NeST: Neuron Selective Tuning for LLM Safety
Paper
• 2602.16835
• Published • 1
CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing
Paper
• 2602.15823
• Published • 3
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Paper
• 2602.08354
• Published • 262
Spanning the Visual Analogy Space with a Weight Basis of LoRAs
Paper
• 2602.15727
• Published • 14
VLANeXt: Recipes for Building Strong VLA Models
Paper
• 2602.18532
• Published • 52
On Data Engineering for Scaling LLM Terminal Capabilities
Paper
• 2602.21193
• Published • 99
Test-Time Training with KV Binding Is Secretly Linear Attention
Paper
• 2602.21204
• Published • 30
One-step Language Modeling via Continuous Denoising
Paper
• 2602.16813
• Published • 4
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
Paper
• 2602.21196
• Published • 5
The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum
Paper
• 2602.21185
• Published • 3
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
Paper
• 2602.21548
• Published • 47
Image Generation with a Sphere Encoder
Paper
• 2602.15030
• Published • 15
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors
Paper
• 2602.21778
• Published • 14
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models
Paper
• 2602.18993
• Published • 4
Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting
Paper
• 2602.20933
• Published • 4
VGG-T^3: Offline Feed-Forward 3D Reconstruction at Scale
Paper
• 2602.23361
• Published • 14
Causal Motion Diffusion Models for Autoregressive Motion Generation
Paper
• 2602.22594
• Published • 7
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
Paper
• 2602.23258
• Published • 28
Mode Seeking meets Mean Seeking for Fast Long Video Generation
Paper
• 2602.24289
• Published • 41
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
Paper
• 2602.23881
• Published • 18
How to Take a Memorable Picture? Empowering Users with Actionable Feedback
Paper
• 2602.21877
• Published • 16
Spectral Condition for μP under Width-Depth Scaling
Paper
• 2603.00541
• Published • 15
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Paper
• 2603.03276
• Published • 100
Spilled Energy in Large Language Models
Paper
• 2602.18671
• Published • 11
Helios: Real Real-Time Long Video Generation Model
Paper
• 2603.04379
• Published • 174
RealWonder: Real-Time Physical Action-Conditioned Video Generation
Paper
• 2603.05449
• Published • 12
KARL: Knowledge Agents via Reinforcement Learning
Paper
• 2603.05218
• Published • 6
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
Paper
• 2603.04553
• Published • 3
Progressive Residual Warmup for Language Model Pretraining
Paper
• 2603.05369
• Published • 36
Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey
Paper
• 2603.04445
• Published • 4
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
Paper
• 2603.09206
• Published • 52
Lost in Backpropagation: The LM Head is a Gradient Bottleneck
Paper
• 2603.10145
• Published • 11
Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers
Paper
• 2603.10744
• Published • 7
Hindsight Credit Assignment for Long-Horizon LLM Agents
Paper
• 2603.08754
• Published • 5
HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios
Paper
• 2603.11975
• Published • 11
Grounding World Simulation Models in a Real-World Metropolis
Paper
• 2603.15583
• Published • 145
Paper
• 2603.15031
• Published • 150
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation
Paper
• 2603.15132
• Published • 35
LoST: Level of Semantics Tokenization for 3D Shapes
Paper
• 2603.17995
• Published • 29
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning
Paper
• 2603.14482
• Published • 15
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
Paper
• 2603.18815
• Published • 10
COT-FM: Cluster-wise Optimal Transport Flow Matching
Paper
• 2603.13395
• Published • 1
Matryoshka Gaussian Splatting
Paper
• 2603.19234
• Published • 8