AI & ML interests
Feeling and building multimodal intelligence.
Recent Activity
Papers
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5
- mvp-lab/LLaVA-OneVision-1.5-Instruct-Data (Viewer • Updated • 21.9M • 75.7k • 55)
- mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M (Viewer • Updated • 91.5M • 222k • 48)
- lmms-lab/LLaVA-OneVision-1.5-8B-Instruct (Image-Text-to-Text • 9B • Updated • 13.1k • 58)
- lmms-lab/LLaVA-OneVision-1.5-4B-Instruct (Image-Text-to-Text • 5B • Updated • 3.91k • 15)
MMSearch-R1 is a solution designed to train LMMs to perform on-demand multimodal search in real-world environments.
CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/
A collection of sparse autoencoders (SAEs) hooked on LLaVA.
Models focused on video understanding (previously known as LLaVA-NeXT-Video).
Dataset Collection of LMMs-Eval
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
- Paper 2407.07895 (Published • 42)
- lmms-lab/llava-next-interleave-qwen-7b (Text Generation • 8B • Updated • 855 • 27)
- lmms-lab/llava-next-interleave-qwen-7b-dpo (Text Generation • 8B • Updated • 126 • 12)
- lmms-lab/M4-Instruct-Data (Updated • 1.33k • 76)
Making a Lite version of the dataset to accelerate holistic evaluation during model development!
- OpenMMReasoner/OpenMMReasoner-ColdStart (Image-Text-to-Text • 8B • Updated • 643 • 3)
- OpenMMReasoner/OpenMMReasoner-RL (Image-Text-to-Text • 8B • Updated • 682 • 15)
- OpenMMReasoner/OpenMMReasoner-SFT-874K (Viewer • Updated • 874k • 783 • 5)
- OpenMMReasoner/OpenMMReasoner-RL-74K (Viewer • Updated • 74.7k • 947 • 7)
A general evaluator for assessing model performance.
A model good at handling arbitrary types of visual input.
Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/
Some powerful image models.