GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents Paper • 2606.24551 • Published 10 days ago • 28
Semantic Browsing: Controllable Diversity for Image Generation Paper • 2606.23679 • Published 10 days ago • 20
Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models Paper • 2606.25041 • Published 9 days ago • 110
SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer Paper • 2605.30409 • Published May 28 • 42
Representation Forcing for Bottleneck-Free Unified Multimodal Models Paper • 2605.31604 • Published May 29 • 63
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards Paper • 2605.31584 • Published May 29 • 43
Meta-CoT: Enhancing Granularity and Generalization in Image Editing Paper • 2604.24625 • Published Apr 27 • 26
FrankenMotion: Part-level Human Motion Generation and Composition Paper • 2601.10909 • Published Jan 15 • 19
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published Jan 6 • 104
MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization Paper • 2601.01554 • Published Jan 4 • 62
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published Dec 31, 2025 • 155
ProEdit: Inversion-based Editing From Prompts Done Right Paper • 2512.22118 • Published Dec 26, 2025 • 19
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding Paper • 2512.17220 • Published Dec 19, 2025 • 115
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming Paper • 2512.21338 • Published Dec 24, 2025 • 23
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published Dec 23, 2025 • 52
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 97