ViQ: Text-Aligned Visual Quantized Representations at Any Resolution Paper • 2606.27313 • Published 1 day ago • 27
Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack Paper • 2606.14409 • Published 14 days ago • 15
GEM: Generative Supervision Helps Embodied Intelligence Paper • 2605.28548 • Published about 1 month ago • 35
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models Paper • 2605.20873 • Published May 20 • 44
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 184
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning Paper • 2603.26653 • Published Mar 27 • 18
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2603.18118 • Published Mar 18 • 12
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published Nov 19, 2025 • 98
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing Paper • 2509.24900 • Published Sep 29, 2025 • 54
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark Paper • 2509.24897 • Published Sep 29, 2025 • 46
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Paper • 2507.23726 • Published Jul 31, 2025 • 116
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs Paper • 2506.05344 • Published Jun 5, 2025 • 17
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Paper • 2505.04512 • Published May 7, 2025 • 36
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6, 2025 • 29
Oryx Collection Oryx: One Multi-Modal LLM for On-Demand Spatial-Temporal Understanding • 6 items • Updated Dec 11, 2024 • 16
Oryx-1.5 Collection Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution • 4 items • Updated Jan 15, 2025 • 5
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 148
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 97
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 26