Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression Paper • 2512.05081 • Published 8 days ago • 30
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 8 days ago • 166
UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits Paper • 2512.02790 • Published 11 days ago • 3
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework Paper • 2512.03041 • Published 10 days ago • 62
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published 10 days ago • 48
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published 18 days ago • 30
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published 23 days ago • 92
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published 23 days ago • 222
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30 • 115
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values Paper • 2510.20187 • Published Oct 23 • 18
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10 • 50
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20 • 67
Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset Paper • 2510.16258 • Published Oct 17 • 7