findcard12138 committed on
Commit 346f4aa · verified · 1 parent: 7fe9178

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +3 -2
  2. assets/3d-rope.png +2 -2
README.md CHANGED
@@ -77,9 +77,10 @@ We conducted a comprehensive evaluation of **MOSS-VL-Instruct-0408** across four
  ### 🌟 Key Highlights
 
  * **🚀 Leading Video Intelligence**: MOSS-VL achieves a score of **65.8** in Video Understanding, significantly outperforming Qwen3-VL (+2pts). It shows exceptional temporal consistency and action recognition capabilities across benchmarks like `VideoMME`, `MLVU`, `EgoSchema`, and `VSI-bench` (where it outperforms **Qwen3-VL-8B-Instruct** by **8.3 points**).
- * **👁️ Outstanding Multimodal Perception**: With a score of **75.1**, MOSS-VL delivers excellent general image-text understanding, shining in fine-grained object recognition and spatial reasoning on benchmarks like `BLINK` and `MMBench`.
- * **🧠 Robust Multimodal Reasoning**: Achieving **64.3**, MOSS-VL demonstrates solid logical inference, staying highly competitive with the latest Qwen series on challenging reasoning suites such as `CVBench` and `VisuLogic`.
+ * **👁️ Outstanding Multimodal Perception**: MOSS-VL delivers excellent general image-text understanding, shining in fine-grained object recognition and spatial reasoning on benchmarks like `BLINK` and `MMBench`.
+ * **🧠 Robust Multimodal Reasoning**: MOSS-VL demonstrates solid logical inference, staying highly competitive with the latest Qwen series on challenging reasoning suites such as `CVBench` and `VisuLogic`.
  * **📄 Reliable Document Understanding**: While the model is primarily optimized for general perception and video, MOSS-VL still delivers **83.9** on OCR and document analysis, ensuring dependable extraction of text and structured information.
+
  <p align="center">
  <img src="assets/benchmark_table.png" alt="MOSS-VL Benchmark Table" width="100%"/>
  </p>
assets/3d-rope.png CHANGED

Git LFS Details (before)

  • SHA256: aa84af011196536d73dbcc255aa267179aa433ea697e2e95334cbd41481d4575
  • Pointer size: 131 Bytes
  • Size of remote file: 208 kB

Git LFS Details (after)

  • SHA256: 9cb0e69618339b317472202019f4d9b549de443ab7c105ace9220ee90d2c54b8
  • Pointer size: 131 Bytes
  • Size of remote file: 154 kB
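The "Pointer size: 131 Bytes" entries describe the small Git LFS pointer file that Git stores in place of the PNG. The image bytes themselves live in LFS storage, which is why the pointer is tiny compared with the remote file. Below is a minimal Python sketch, assuming the Git LFS v1 pointer format: a `version` line, an `oid sha256:` line, and a `size` line, each newline-terminated. It shows why both pointers come out at exactly 131 bytes. The version and oid lines have fixed lengths (43 and 76 bytes), so any object whose byte size has six digits adds a 12-byte `size` line. Both the 208 kB and 154 kB images fall in that range. The helper names are hypothetical, and the exact byte count passed in is an assumption, since the page only reports rounded sizes.

```python
import hashlib

# Git LFS pointer format, spec v1: three newline-terminated "key value" lines,
# "version" first, then the remaining keys in alphabetical order (oid, size).
LFS_VERSION = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(oid_hex: str, size: int) -> bytes:
    """Hypothetical helper: build a pointer for an object with this SHA-256 and byte size."""
    if len(oid_hex) != 64 or any(c not in "0123456789abcdef" for c in oid_hex):
        raise ValueError("oid must be a 64-character lowercase hex SHA-256 digest")
    return f"version {LFS_VERSION}\noid sha256:{oid_hex}\nsize {size}\n".encode("ascii")


def pointer_for_content(content: bytes) -> bytes:
    """Hypothetical helper: the oid is the SHA-256 of the real file; size is its length."""
    return make_lfs_pointer(hashlib.sha256(content).hexdigest(), len(content))


def parse_lfs_pointer(data: bytes) -> dict:
    """Hypothetical helper: split each 'key value' line into a dict entry."""
    fields = {}
    for line in data.decode("ascii").splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# SHA-256 of the new assets/3d-rope.png, copied from the page above.
new_oid = "9cb0e69618339b317472202019f4d9b549de443ab7c105ace9220ee90d2c54b8"
# Assumed exact byte count: the page only shows "154 kB" (any 6-digit size gives the same length).
pointer = make_lfs_pointer(new_oid, 154_000)
print(len(pointer))  # 131
print(parse_lfs_pointer(pointer)["oid"])
```

With a 5- or 7-digit object size, the same construction would give a 130- or 132-byte pointer. The 131-byte figure on both sides is therefore consistent with the two reported file sizes.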