Tags: Image-Text-to-Text · Transformers · TensorBoard · Safetensors · feature-extraction · conversational · custom_code
xiangan committed · Commit 80719a5 · verified · 1 parent: 75a1758

Update README.md

Files changed (1): README.md (+2 −3)

README.md CHANGED
```diff
@@ -31,7 +31,6 @@ pipeline_tag: image-text-to-text
 
 ## Introduction
 
-Copilot said: LLaVA-OneVision-1.5 is a fully open-source family of
 LLaVA-OneVision-1.5 is a fully open-source family of large multimodal models (LMMs) built to democratize multimodal training. Trained on native‑resolution images, it delivers state‑of‑the‑art performance at substantially lower cost. The project also releases high‑quality pretraining and SFT data, a complete and efficient training framework with recipes and configs, and comprehensive logs to support transparent, reproducible research.
 #### **Superior Performance**
 - The model leads on multiple multimodal benchmarks and generally surpasses Qwen2.5-VL.
@@ -69,8 +68,8 @@ LLaVA-OneVision-1.5 is a fully open-source family of large multimodal models (LM
 
 | Description | Link | Status |
 |--------------------|--------------------------------------------------------------------------------------------------------|-------------|
-| OV-1.5-Mid-Training-85M | [🤗HF/85M](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M) | Uploading… |
-| OV-1.5-Instruct | [🤗HF/Inst](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Insturct-Data) | Uploading… |
+| LLaVA-OneVision-1.5-Mid-Training-85M | [🤗HF / Mid-Training 85M](https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Mid-Training-85M) | Uploading… |
+| LLaVA-OneVision-1.5-Instruct | [🤗HF / Instruct-Data](https://huggingface.co/datasets/mvp-lab/LLaVA-OneVision-1.5-Instruct-Data) | Available |
 
 
 ## Evaluation Results
```