Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation Paper • 2606.26907 • Published 1 day ago • 11
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers Paper • 2601.17367 • Published Jan 24 • 33
AI4Research: A Survey of Artificial Intelligence for Scientific Research Paper • 2507.01903 • Published Jul 2, 2025 • 5
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14, 2025 • 305
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models Paper • 2412.05939 • Published Dec 8, 2024 • 15
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 71
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 136
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 80
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18, 2024 • 56
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 35
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10, 2024 • 42