ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation Paper • 2603.11421 • Published 2 days ago • 15
ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA Paper • 2603.10256 • Published 3 days ago • 14
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 15 days ago • 196
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published Feb 6 • 35
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 28 days ago • 43
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers Paper • 2602.16968 • Published 23 days ago • 12
jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published 24 days ago • 26
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published 29 days ago • 20
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 29 days ago • 55
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 62
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Paper • 2601.09575 • Published Jan 14 • 26