Video
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via
Blender-Oriented GPT Planning
Paper
• 2311.12631
• Published
• 14
DeepSeekMoE: Towards Ultimate Expert Specialization in
Mixture-of-Experts Language Models
Paper
• 2401.06066
• Published
• 59
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in
One Step
Paper
• 2504.01956
• Published
• 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence
with Spatial Reasoning and Understanding
Paper
• 2506.23219
• Published
• 7
CriticLean: Critic-Guided Reinforcement Learning for Mathematical
Formalization
Paper
• 2507.06181
• Published
• 45
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs
More Realistic and Less Risky
Paper
• 2507.03336
• Published
• 7
GTA1: GUI Test-time Scaling Agent
Paper
• 2507.05791
• Published
• 27
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Paper
• 2507.07136
• Published
• 40
Lumos-1: On Autoregressive Video Generation from a Unified Model
Perspective
Paper
• 2507.08801
• Published
• 31
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive
Token-Level Computation
Paper
• 2507.10524
• Published
• 72
SWE-Perf: Can Language Models Optimize Code Performance on Real-World
Repositories?
Paper
• 2507.12415
• Published
• 43
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via
Self-Critique
Paper
• 2507.09075
• Published
• 16
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems
at Once
Paper
• 2507.10541
• Published
• 30
Lizard: An Efficient Linearization Framework for Large Language Models
Paper
• 2507.09025
• Published
• 19
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Paper
• 2507.08616
• Published
• 15
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and
Reasoning Modes
Paper
• 2507.11407
• Published
• 60
The Imitation Game: Turing Machine Imitator is Length Generalizable
Reasoner
Paper
• 2507.13332
• Published
• 49
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA
Optimization
Paper
• 2507.12142
• Published
• 37
FLEXITOKENS: Flexible Tokenization for Evolving Language Models
Paper
• 2507.12720
• Published
• 10
Inverse Reinforcement Learning Meets Large Language Model Post-Training:
Basics, Advances, and Opportunities
Paper
• 2507.13158
• Published
• 24
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated
Diffusion Transformers
Paper
• 2507.08422
• Published
• 36
WebShaper: Agentically Data Synthesizing via Information-Seeking
Formalization
Paper
• 2507.15061
• Published
• 60
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with
Regularized Score Distillation Sampling
Paper
• 2507.11061
• Published
• 37
Gaussian Splatting with Discretized SDF for Relightable Assets
Paper
• 2507.15629
• Published
• 23
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published
• 122
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
Paper
• 2507.17745
• Published
• 36
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published
• 68
ScreenCoder: Advancing Visual-to-Code Generation for Front-End
Automation via Modular Multimodal Agents
Paper
• 2507.22827
• Published
• 100
On the Expressiveness of Softmax Attention: A Recurrent Neural Network
Perspective
Paper
• 2507.23632
• Published
• 6
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language
Models
Paper
• 2508.00819
• Published
• 63
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent
Foundation Models Training
Paper
• 2508.00414
• Published
• 94
Qwen-Image Technical Report
Paper
• 2508.02324
• Published
• 272
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and
Outcome Reward
Paper
• 2508.03686
• Published
• 39
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
• 2508.01191
• Published
• 238
Efficient Agents: Building Effective Agents While Reducing Cost
Paper
• 2508.02694
• Published
• 86
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
• 2508.03680
• Published
• 137
CRINN: Contrastive Reinforcement Learning for Approximate Nearest
Neighbor Search
Paper
• 2508.02091
• Published
• 13
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy
Optimization
Paper
• 2508.05731
• Published
• 27
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving
Clipping Policy Optimization
Paper
• 2508.07629
• Published
• 43
Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Paper
• 2508.05547
• Published
• 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of
Deep-Research Agent
Paper
• 2508.06600
• Published
• 41
Reinforcement Learning in Vision: A Survey
Paper
• 2508.08189
• Published
• 30
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
• 2508.08221
• Published
• 50
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with
Long-Term Memory
Paper
• 2508.09736
• Published
• 58
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings
and Speaks in Tokens
Paper
• 2508.05305
• Published
• 47
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Paper
• 2508.09968
• Published
• 15
A Survey on Diffusion Language Models
Paper
• 2508.10875
• Published
• 34
Quantization Meets dLLMs: A Systematic Study of Post-training
Quantization for Diffusion LLMs
Paper
• 2508.14896
• Published
• 22
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache
Rematerialization
Paper
• 2508.10395
• Published
• 42
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid
Mamba-Transformer Reasoning Model
Paper
• 2508.14444
• Published
• 43
Deep Think with Confidence
Paper
• 2508.15260
• Published
• 90
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From
Sparse Inputs without Per-Scene Optimization
Paper
• 2508.14811
• Published
• 42
UQ: Assessing Language Models on Unsolved Questions
Paper
• 2508.17580
• Published
• 15
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image
Generation
Paper
• 2508.17472
• Published
• 26
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large
Language Models
Paper
• 2508.18773
• Published
• 16
Autoregressive Universal Video Segmentation Model
Paper
• 2508.19242
• Published
• 29
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding
in Vision-Language-Action Policies
Paper
• 2508.20072
• Published
• 32
Self-Rewarding Vision-Language Model via Reasoning Decomposition
Paper
• 2508.19652
• Published
• 84
SpotEdit: Evaluating Visually-Guided Image Editing Methods
Paper
• 2508.18159
• Published
• 3
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks
Paper
• 2508.15804
• Published
• 15
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
Text-to-Image Reinforcement Learning
Paper
• 2508.20751
• Published
• 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and
Inference Efficiency with Heuristic Tree-based Modeling
Paper
• 2508.17445
• Published
• 80
Mixture of Contexts for Long Video Generation
Paper
• 2508.21058
• Published
• 35
VibeVoice Technical Report
Paper
• 2508.19205
• Published
• 143
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer
Use Agent with Decoupled Reinforcement Learning
Paper
• 2508.20096
• Published
• 37
InMind: Evaluating LLMs in Capturing and Applying Individual Human
Reasoning Styles
Paper
• 2508.16072
• Published
• 4
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for
General Robot Control
Paper
• 2508.21112
• Published
• 77
UItron: Foundational GUI Agent with Advanced Perception and Planning
Paper
• 2508.21767
• Published
• 12
Efficient Code Embeddings from Code Generation Models
Paper
• 2508.21290
• Published
• 19
TiKMiX: Take Data Influence into Dynamic Mixture for Language Model
Pre-training
Paper
• 2508.17677
• Published
• 14
CLIPSym: Delving into Symmetry Detection with CLIP
Paper
• 2508.14197
• Published
• 8
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn
Reinforcement Learning
Paper
• 2509.02544
• Published
• 125
Mixture of Global and Local Experts with Diffusion Transformer for
Controllable Face Generation
Paper
• 2509.00428
• Published
• 18
Symbolic Graphics Programming with Large Language Models
Paper
• 2509.05208
• Published
• 47
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion
Transformers via Explicit Correspondence
Paper
• 2509.12203
• Published
• 20
Locality in Image Diffusion Models Emerges from Data Statistics
Paper
• 2509.09672
• Published
• 13
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised
Learning in Open-World Scenarios
Paper
• 2509.09926
• Published
• 14
Single-stream Policy Optimization
Paper
• 2509.13232
• Published
• 34
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Paper
• 2509.10441
• Published
• 31
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon
Agents
Paper
• 2509.13309
• Published
• 67
Towards General Agentic Intelligence via Environment Scaling
Paper
• 2509.13311
• Published
• 72
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video
Generation
Paper
• 2509.10687
• Published
• 7
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Paper
• 2509.15212
• Published
• 22
AToken: A Unified Tokenizer for Vision
Paper
• 2509.14476
• Published
• 36
Qwen3-Omni Technical Report
Paper
• 2509.17765
• Published
• 149
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and
Open Resources
Paper
• 2509.21268
• Published
• 104
MinerU2.5: A Decoupled Vision-Language Model for Efficient
High-Resolution Document Parsing
Paper
• 2509.22186
• Published
• 146
Fine-tuning Done Right in Model Editing
Paper
• 2509.22072
• Published
• 28
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with
Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published
• 146
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive
Exploration for Agentic Reinforcement Learning
Paper
• 2509.22601
• Published
• 30
Attention as a Compass: Efficient Exploration for Process-Supervised RL
in Reasoning Models
Paper
• 2509.26628
• Published
• 17
More Thought, Less Accuracy? On the Dual Nature of Reasoning in
Vision-Language Models
Paper
• 2509.25848
• Published
• 80
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget
Allocation
Paper
• 2509.25849
• Published
• 48
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening,
Speaking, and Viewing
Paper
• 2509.22651
• Published
• 23
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale
Diffusion Transformer
Paper
• 2509.22414
• Published
• 22
LongCodeZip: Compress Long Context for Code Language Models
Paper
• 2510.00446
• Published
• 107
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM
Reinforcement Learning via Entropy-Guided Advantage Shaping
Paper
• 2509.21880
• Published
• 53
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large
Multimodal Models
Paper
• 2510.05034
• Published
• 51
Reactive Transformer (RxT) -- Stateful Real-Time Processing for
Event-Driven Reactive Language Models
Paper
• 2510.03561
• Published
• 25
Free Lunch Alignment of Text-to-Image Diffusion Models without
Preference Image Pairs
Paper
• 2509.25771
• Published
• 11
Why Low-Precision Transformer Training Fails: An Analysis on Flash
Attention
Paper
• 2510.04212
• Published
• 26
Efficient Intent Detection with Dual Sentence Encoders
Paper
• 2003.04807
• Published
• 2
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning
for LLMs
Paper
• 2510.11696
• Published
• 181
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for
MLLMs
Paper
• 2510.09201
• Published
• 50
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning
and Online Reinforcement Learning
Paper
• 2510.12693
• Published
• 28
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
Paper
• 2512.19134
• Published
• 32