LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding Paper • 2404.05225 • Published Apr 8, 2024 • 2
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data Paper • 2407.12358 • Published Jul 17, 2024 • 1
Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback Paper • 2507.20766 • Published Jul 28, 2025 • 1
Interleaving Reasoning for Better Text-to-Image Generation Paper • 2509.06945 • Published Sep 8, 2025 • 14
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video? Paper • 2509.24709 • Published Sep 29, 2025 • 6
RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection Paper • 2509.26048 • Published Sep 30, 2025 • 7
Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks Paper • 2510.08002 • Published Oct 9, 2025 • 23
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models Paper • 2309.16292 • Published Sep 28, 2023 • 1
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving Paper • 2311.05332 • Published Nov 9, 2023 • 13
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds Paper • 2306.06023 • Published Jun 9, 2023
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models Paper • 2307.07162 • Published Jul 14, 2023
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition Paper • 2401.11649 • Published Jan 22, 2024 • 3
OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving Paper • 2402.03830 • Published Feb 6, 2024 • 2
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 12, 2024 • 31
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models Paper • 2406.11633 • Published Jun 17, 2024 • 1
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published Mar 10, 2025 • 61
TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving Paper • 2504.15780 • Published Apr 22, 2025 • 6