AI & ML interests
Feeling and building multimodal intelligence.
Recent Activity
Papers
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5
- mvp-lab/LLaVA-OneVision-1.5-Instruct-Data (Viewer • Updated • 21.9M • 75.7k • 55)
- mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M (Viewer • Updated • 91.5M • 222k • 48)
- lmms-lab/LLaVA-OneVision-1.5-8B-Instruct (Image-Text-to-Text • 9B • Updated • 13.1k • 58)
- lmms-lab/LLaVA-OneVision-1.5-4B-Instruct (Image-Text-to-Text • 5B • Updated • 3.91k • 15)
MMSearch-R1 is a solution designed to train LMMs to perform on-demand multimodal search in real-world environments.
CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/
A collection of sparse autoencoders (SAEs) hooked on LLaVA.
Models focused on video understanding (previously known as LLaVA-NeXT-Video).
Dataset Collection of LMMs-Eval
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
- Paper 2407.07895 (Published • 42)
- lmms-lab/llava-next-interleave-qwen-7b (Text Generation • 8B • Updated • 855 • 27)
- lmms-lab/llava-next-interleave-qwen-7b-dpo (Text Generation • 8B • Updated • 126 • 12)
- lmms-lab/M4-Instruct-Data (Updated • 1.33k • 76)
Making a Lite version of the dataset to accelerate holistic evaluation during model development!
- OpenMMReasoner/OpenMMReasoner-ColdStart (Image-Text-to-Text • 8B • Updated • 643 • 3)
- OpenMMReasoner/OpenMMReasoner-RL (Image-Text-to-Text • 8B • Updated • 682 • 15)
- OpenMMReasoner/OpenMMReasoner-SFT-874K (Viewer • Updated • 874k • 783 • 5)
- OpenMMReasoner/OpenMMReasoner-RL-74K (Viewer • Updated • 74.7k • 947 • 7)
A general evaluator for assessing model performance.
A model good at handling arbitrary types of visual input.
Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/
Some powerful image models.