OpenMOSS-Team/MOSS-VL-Instruct-0408

Video-Text-to-Text · feature-extraction · Video-Understanding · Image-Understanding · vision-language


findcard12138 committed on Apr 8 · Commit e43e7da · verified · 1 Parent(s): 588a9e7

Upload folder using huggingface_hub

Files changed (1)

README.md +1 -4

README.md CHANGED

@@ -81,9 +81,6 @@ We conducted a comprehensive evaluation of **MOSS-VL-Instruct-0408** across four
 *   **🧠 Robust Multimodal Reasoning**: MOSS-VL demonstrates solid logical inference, staying highly competitive with the latest Qwen series on challenging reasoning suites such as `CVBench` and `VisuLogic`.
 *   **📄 Reliable Document Understanding**: While the model is primarily optimized for general perception and video, MOSS-VL still delivers **83.9** on OCR and document analysis, ensuring dependable extraction of text and structured information.
-<p align="center">
-    <img src="assets/benchmark_table.png" alt="MOSS-VL Benchmark Table" width="100%"/>
-</p>
 <p align="center">
     <img src="assets/MOSS-VL-benchmark.png" alt="MOSS-VL Benchmark Results" width="100%"/>
@@ -92,7 +89,7 @@ We conducted a comprehensive evaluation of **MOSS-VL-Instruct-0408** across four
 ## 🚀 Quickstart
 ### 🛠️ Requirements
-Installation commands:
+Installation:
 ```bash
 conda create -n moss_vl python=3.12 pip -y