Update README.md
README.md
CHANGED
@@ -29,7 +29,8 @@ tags:
 
 MOSS-VL-Base-0408 is the foundation checkpoint of the MOSS-VL series, part of the OpenMOSS ecosystem dedicated to advancing visual understanding.
 
-Built through four stages of multimodal pretraining only, this checkpoint serves as a high-capacity offline multimodal base model. It provides strong general-purpose visual-linguistic representations across image and video inputs, and is intended primarily as the base model for downstream supervised fine-tuning, alignment, and domain adaptation.
+Built through four stages of multimodal pretraining only, this checkpoint serves as a high-capacity offline multimodal base model. It provides strong general-purpose visual-linguistic representations across image and video inputs, and is intended primarily as the base model for downstream supervised fine-tuning, alignment, and domain adaptation.
+Specifically, the pretraining pipeline is structured into the following four progressive stages:
 
 - Stage 1: Vision-language alignment
 - Stage 2: Large-scale multimodal pretraining