A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5 Paper • 2601.10527 • Published 11 days ago • 23
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published 17 days ago • 79
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 20 days ago • 134
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published 21 days ago • 29
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published Dec 5, 2025 • 38
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 229
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models Paper • 2511.22625 • Published Nov 27, 2025 • 47
iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation Paper • 2511.20635 • Published Nov 25, 2025 • 32
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 102
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing Paper • 2510.25590 • Published Oct 29, 2025 • 28
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction Paper • 2510.22706 • Published Oct 26, 2025 • 41
WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published Oct 16, 2025 • 85
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training Paper • 2510.11712 • Published Oct 13, 2025 • 31
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 126
Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation Paper • 2509.12815 • Published Sep 16, 2025 • 40