Towards Semantic Equivalence of Tokenization in Multimodal LLM Paper • 2406.05127 • Published Jun 7, 2024
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection Paper • 2505.18660 • Published May 24, 2025 • 2
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models Paper • 2505.24164 • Published May 30, 2025
SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control Paper • 2505.19463 • Published May 26, 2025
MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation Paper • 2510.00647 • Published Oct 1, 2025
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation Paper • 2511.20563 • Published Nov 25, 2025 • 1
Training LLMs with LogicReward for Faithful and Rigorous Reasoning Paper • 2512.18196 • Published Dec 20, 2025
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Paper • 2602.19163 • Published 22 days ago • 14
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought Paper • 2505.15510 • Published May 21, 2025
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web Paper • 2512.23044 • Published Dec 28, 2025 • 10
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published Dec 28, 2025 • 20