PyVision-RL: Forging Open Agentic Vision Models via RL Paper • 2602.20739 • Published 16 days ago • 29
Unimedvl: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis Paper • 2510.15710 • Published Oct 17, 2025 • 7
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark Paper • 2402.02242 • Published Feb 3, 2024
dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models Paper • 2512.19433 • Published Dec 22, 2025 • 3
Lumina-DiMOO Family Collection Open-Sourced Large Diffusion Language Model for Multi-Modal Generation and Understanding • 3 items • Updated 10 days ago • 5
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 244
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield Paper • 2511.22677 • Published Nov 27, 2025 • 35
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision Paper • 2504.04903 • Published Apr 7, 2025
Factuality Matters: When Image Generation and Editing Meet Structured Visuals Paper • 2510.05091 • Published Oct 6, 2025 • 20
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Paper • 2510.06308 • Published Oct 7, 2025 • 55
PICABench: How Far Are We from Physically Realistic Image Editing? Paper • 2510.17681 • Published Oct 20, 2025 • 64