Collections
Collections including paper arxiv:2605.08063
-
Self-Distilled RLVR
Paper • 2604.03128 • Published • 171 -
Token Warping Helps MLLMs Look from Nearby Viewpoints
Paper • 2604.02870 • Published • 34 -
A Simple Baseline for Streaming Video Understanding
Paper • 2604.02317 • Published • 73 -
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers
Paper • 2605.06169 • Published • 111
-
CoLLM: A Large Language Model for Composed Image Retrieval
Paper • 2503.19910 • Published • 15 -
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Paper • 2503.21541 • Published • 1 -
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration
Paper • 2504.03536 • Published • 13 -
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
Paper • 2504.04842 • Published • 35
-
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
Paper • 2603.12180 • Published • 65 -
Flow-OPD: On-Policy Distillation for Flow Matching Models
Paper • 2605.08063 • Published • 81 -
Normalizing Trajectory Models
Paper • 2605.08078 • Published • 10 -
STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation
Paper • 2605.08029 • Published • 10
-
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 178 -
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Paper • 2601.07832 • Published • 52 -
Motion Attribution for Video Generation
Paper • 2601.08828 • Published • 71 -
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Paper • 2601.19895 • Published • 27
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 77 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 56 -
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Paper • 2505.16990 • Published • 22 -
D-AR: Diffusion via Autoregressive Models
Paper • 2505.23660 • Published • 34