DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question Answering Paper • 2512.00773 • Published Nov 30, 2025 • 1
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model Paper • 2602.21818 • Published 16 days ago • 52
Qwen3.5 Collection Qwen3.5 is Qwen's new model family including Qwen3.5 Small: 0.8B, 2B, 4B, 9B and Qwen3.5 Medium: 35B-A3B, 27B, 122B-A10B and 397B-A17B. • 25 items • Updated 1 day ago • 112
BitDance Collection BitDance: Open-source autoregressive model with binary visual tokens. A research project for building powerful multimodal autoregressive models. • 10 items • Updated 11 days ago • 11
NEST-Ja Collection Japanese speech self-supervised learning model developed by SB Intuitions. • 2 items • Updated about 1 month ago • 1
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning Paper • 2601.03872 • Published Jan 7 • 43
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published Dec 26, 2025 • 60
LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published Dec 10, 2025 • 87
DeContext as Defense: Safe Image Editing in Diffusion Transformers Paper • 2512.16625 • Published Dec 18, 2025 • 25
IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning Paper • 2512.15635 • Published Dec 17, 2025 • 20
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published Dec 4, 2025 • 174