Towards Semantic Equivalence of Tokenization in Multimodal LLM Paper • 2406.05127 • Published Jun 7, 2024
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection Paper • 2505.18660 • Published May 24, 2025 • 2
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models Paper • 2505.24164 • Published May 30, 2025
SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control Paper • 2505.19463 • Published May 26, 2025
MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation Paper • 2510.00647 • Published Oct 1, 2025
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation Paper • 2511.20563 • Published Nov 25, 2025 • 1
Training LLMs with LogicReward for Faithful and Rigorous Reasoning Paper • 2512.18196 • Published Dec 20, 2025
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Paper • 2602.19163 • Published 19 days ago • 14
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought Paper • 2505.15510 • Published May 21, 2025
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published Dec 28, 2025 • 20
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 38
Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding Paper • 2509.11866 • Published Sep 15, 2025 • 2
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models Paper • 2508.12081 • Published Aug 16, 2025
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7, 2025 • 82
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published Apr 17, 2025 • 20
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published Mar 30, 2025 • 57
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31, 2025 • 76