
Improve model card: Add paper, project page, and code links

#3 opened by nielsr (HF Staff)

Files changed (1): README.md (+13 −10)
README.md CHANGED
@@ -1,17 +1,23 @@
 ---
-license: apache-2.0
-datasets:
-- lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M
-- lmms-lab/LLaVA-OneVision-1.5-Insturct-Data
 base_model:
 - Qwen/Qwen3-8B-Base
 - DeepGlint-AI/rice-vit-large-patch14-560
-pipeline_tag: image-text-to-text
+datasets:
+- lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M
+- lmms-lab/LLaVA-OneVision-1.5-Insturct-Data
 library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
 ---
+
 # LLaVA-OneVision-1.5: Fully Open-Source State-of-the-Art VLM Model
 
-**LLaVA-OneVision1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.
+This repository contains the LLaVA-OneVision-1.5 models, as presented in the paper [LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training](https://huggingface.co/papers/2509.23661).
+
+Project Page: [https://huggingface.co/spaces/lmms-lab/LLaVA-OneVision-1.5](https://huggingface.co/spaces/lmms-lab/LLaVA-OneVision-1.5)
+Code: [https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5)
+
+**LLaVA-OneVision1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.
 
 - **Superior Performance**
   A family of fully open-source large multimodal models demonstrating
@@ -59,9 +65,6 @@ Meticulously curated **pre-training and SFT data** with rigorous filtering and q
 | OV-1.5-Mid-Training-85M | [🤗HF/85M](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M) | Uploading… |
 | OV-1.5-Instruct | [🤗HF/Inst](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Insturct-Data) | Uploading… |
 
-## Code
-This model is trained using a fully open-source, end-to-end training framework, with all code available at [EvolvingLMMs-Lab/LLaVA-OneVision-1.5](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5).
-
 
 ## Evaluation Results
 All evaluations were conducted using [lmms_eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).
@@ -102,7 +105,7 @@ Here we show a code snippet to show you how to use the chat model with `transfor
 ```python
 from transformers import AutoTokenizer, AutoProcessor, AutoModelForCausalLM
 from qwen_vl_utils import process_vision_info
-model_path = "lmms-lab/LLaVA-One-Vision-1.5-8B-Instruct"
+model_path = "lmms-lab/LLaVA-OneVision-1.5-8B-Instruct"
 
 # default: Load the model on the available device(s)
 model = AutoModelForCausalLM.from_pretrained(