Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 126
Qwen/Qwen2.5-VL-7B-Instruct Image-Text-to-Text • 8B • Updated Apr 6, 2025 • 3.17M • 1.44k
Jodi: Unification of Visual Generation and Understanding via Joint Modeling Paper • 2505.19084 • Published May 25, 2025 • 20