lmms-lab/LLaVA-OneVision-1.5-8B-stage0

Yin-Xie committed on Sep 17 · commit adf1f5e (verified) · 1 parent: f1b90e6

Update README.md

Files changed (1): README.md +37 -3

@@ -1,6 +1,40 @@
 ---
 license: apache-2.0
 base_model:
-- DeepGlint-AI/rice-vit-large-patch14-560
-- Qwen/Qwen3-8B-Base
----
+  - Qwen/Qwen3-8B-Base
+  - DeepGlint-AI/rice-vit-large-patch14-560
+---
+# LLaVA-OneVision-1.5-8B Initialization Model Card
+## 🚀 Overview
+This model provides the initialization (stage-0) checkpoint for training **LLaVA-OneVision-1.5**. It joins a pretrained language model and a pretrained vision encoder through a newly initialized adapter, giving a starting point for multimodal training.
+## 🏗️ Key Components
+- **Vision Encoder:**
+  Uses the pretrained ViT model from [DeepGlint-AI/rice-vit-large-patch14-560](https://huggingface.co/DeepGlint-AI/rice-vit-large-patch14-560) to extract rich visual features.
+- **Adapter:**
+  A randomly initialized adapter that reduces the number of visual tokens by 4× before they reach the language model, keeping the fusion of image and text modalities efficient.
+- **Language Model:**
+  Incorporates the pretrained language model [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base) for robust text understanding and generation.
+## 📝 Usage
+This initialization checkpoint is intended for downstream training and fine-tuning. For usage and training scripts, please refer to the [EvolvingLMMs-Lab/LLaVA-OneVision-1.5 repository](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5).
+## 📚 References
+- [DeepGlint-AI/rice-vit-large-patch14-560](https://huggingface.co/DeepGlint-AI/rice-vit-large-patch14-560)
+- [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)
+- [EvolvingLMMs-Lab/LLaVA-OneVision-1.5](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5)
+## ⚖️ License
+Apache 2.0
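
The card says only that the adapter is randomly initialized and compresses visual tokens 4×. It does not say how. The sketch below is minimal and rests on an assumption: that the compression concatenates each 2×2 block of neighbouring ViT patch features and projects the result to the LLM's embedding width. This is a common design but is not confirmed for this checkpoint. Under that assumption, the data flow looks like this in plain Python. `merge_2x2`, `init_linear`, `linear`, and `visual_token_count` are hypothetical helpers. The single linear layer stands in for whatever adapter the LLaVA-OneVision-1.5 repository actually uses, and all dimensions are toy values. The token count assumes a square 560×560 input, as the `patch14-560` suffix of the encoder name suggests, cut into 14×14-pixel patches.

```python
import random
from typing import List

Vector = List[float]


def merge_2x2(grid: List[List[Vector]]) -> List[Vector]:
    """Concatenate each non-overlapping 2x2 block of patch features (hypothetical helper).

    grid is an H x W array of d-dimensional patch features, with H and W even.
    Returns (H // 2) * (W // 2) tokens of dimension 4 * d, i.e. 4x fewer tokens.
    """
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("patch grid height and width must both be even")
    merged: List[Vector] = []
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            # Block order: top-left, top-right, bottom-left, bottom-right.
            merged.append(grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1])
    return merged


def init_linear(in_dim: int, out_dim: int, seed: int = 0) -> List[Vector]:
    """Random weights (out_dim rows of length in_dim), uniform in [-1/sqrt(in_dim), 1/sqrt(in_dim)]."""
    rng = random.Random(seed)
    bound = in_dim ** -0.5
    return [[rng.uniform(-bound, bound) for _ in range(in_dim)] for _ in range(out_dim)]


def linear(weight: List[Vector], x: Vector) -> Vector:
    """Matrix-vector product that projects a merged token to the LLM embedding width."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def visual_token_count(image_size: int, patch_size: int = 14, merge: int = 2) -> int:
    """Tokens passed to the LLM for a square image after merge x merge compression."""
    side = image_size // patch_size  # patches per side
    return (side // merge) ** 2


if __name__ == "__main__":
    d, llm_dim = 3, 8  # toy sizes, not the real model widths
    # 4 x 4 grid of patches; patch (i, j) holds [i, j, 1.0] so blocks are easy to trace.
    grid = [[[float(i), float(j), 1.0] for j in range(4)] for i in range(4)]
    tokens = merge_2x2(grid)
    print(len(tokens), len(tokens[0]))  # 4 12
    proj = init_linear(4 * d, llm_dim)  # "randomly initialized" adapter weights
    embeds = [linear(proj, t) for t in tokens]
    print(len(embeds), len(embeds[0]))  # 4 8
    print(visual_token_count(560))  # 400
```

Under these assumptions, a 560×560 image yields 40 × 40 = 1,600 patches, and the 2×2 merge reduces them to 400 tokens. This accounts for the 4× figure. Real adapters often use a small MLP rather than a single projection. The only part of this sketch the card itself confirms is the 4× reduction in token count.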