Upload folder using huggingface_hub
README.md
CHANGED
@@ -38,11 +38,6 @@ Built on top of MOSS-VL-Base-0408 through supervised fine-tuning (SFT), this che
 - 🖼️ **Strong General Multimodal Perception** — Robust image understanding, fine-grained object recognition, OCR, and document parsing.
 - 💬 **Reliable Instruction Following** — Substantially improved alignment with user intent through supervised fine-tuning on diverse multimodal instruction data.
 
-### 📝 Note on Variants
-
-> [!IMPORTANT]
-> **This is the offline instruction-tuned checkpoint.** It is not the streaming variant. If you are looking for low-latency, real-time interactive video understanding, please refer to the upcoming **MOSS-VL-RealTime** release.
-
 
 ---
 