Upload folder using huggingface_hub
- README.md +3 -2
- assets/3d-rope.png +2 -2
README.md
CHANGED
@@ -77,9 +77,10 @@ We conducted a comprehensive evaluation of **MOSS-VL-Instruct-0408** across four
 ### Key Highlights
 
 * **Leading Video Intelligence**: MOSS-VL achieves a score of **65.8** in Video Understanding, significantly outperforming Qwen3-VL (+2pts). It shows exceptional temporal consistency and action recognition capabilities across benchmarks like `VideoMME`, `MLVU`, `EgoSchema`, and `VSI-bench` (where it outperforms **Qwen3-VL-8B-Instruct** by **8.3 points**).
-* **Outstanding Multimodal Perception**:
-* **Robust Multimodal Reasoning**:
+* **Outstanding Multimodal Perception**: MOSS-VL delivers excellent general image-text understanding, shining in fine-grained object recognition and spatial reasoning on benchmarks like `BLINK` and `MMBench`.
+* **Robust Multimodal Reasoning**: MOSS-VL demonstrates solid logical inference, staying highly competitive with the latest Qwen series on challenging reasoning suites such as `CVBench` and `VisuLogic`.
 * **Reliable Document Understanding**: While the model is primarily optimized for general perception and video, MOSS-VL still delivers **83.9** on OCR and document analysis, ensuring dependable extraction of text and structured information.
+
 <p align="center">
 <img src="assets/benchmark_table.png" alt="MOSS-VL Benchmark Table" width="100%"/>
 </p>
assets/3d-rope.png
CHANGED
Git LFS Details