Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views Paper • 2606.23557 • Published 6 days ago • 5
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling Paper • 2606.02578 • Published 27 days ago • 6
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 128
MLLM as a UI Judge: Benchmarking Multimodal LLMs for Predicting Human Perception of User Interfaces Paper • 2510.08783 • Published Oct 9, 2025 • 5
Video models are zero-shot learners and reasoners Paper • 2509.20328 • Published Sep 24, 2025 • 101
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8, 2025 • 57
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 15 items • Updated Mar 10 • 674
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models Paper • 2507.08800 • Published Jul 11, 2025 • 81
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27, 2025 • 110
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation Paper • 2407.11394 • Published Jul 16, 2024 • 12