Upload folder using huggingface_hub

Files changed:
- .gitattributes +1 -0
- README.md +2 -2
- assets/MOSS-VL-benchmark.png +2 -2
- assets/radar.png +3 -0
.gitattributes CHANGED

```diff
@@ -41,3 +41,4 @@ tokenizer.json filter=lfs diff=lfs merge=lfs -text
 assets/MOSS-VL-Benchmark.png filter=lfs diff=lfs merge=lfs -text
 assets/MOSS-VL-benchmark.png filter=lfs diff=lfs merge=lfs -text
 assets/benchmark_table.png filter=lfs diff=lfs merge=lfs -text
+assets/radar.png filter=lfs diff=lfs merge=lfs -text
```
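For context, each such `.gitattributes` line tells Git to route files matching the leading pattern through the LFS filter (`filter=lfs diff=lfs merge=lfs -text`). A minimal sketch of how a path can be checked against these rules — a hypothetical helper using simplified `fnmatch`-style matching, not the full gitattributes pattern semantics:

```python
from fnmatch import fnmatch

def lfs_tracked(path: str, gitattributes: str) -> bool:
    """Return True if `path` matches an LFS rule in the given .gitattributes text.

    Simplified: real gitattributes matching has extra rules (directory-relative
    patterns, attribute unsetting) that this sketch ignores.
    """
    for line in gitattributes.splitlines():
        parts = line.split()
        # parts[0] is the glob pattern; the rest are attributes.
        if len(parts) >= 2 and "filter=lfs" in parts[1:]:
            if fnmatch(path, parts[0]):
                return True
    return False

rules = (
    "assets/radar.png filter=lfs diff=lfs merge=lfs -text\n"
    "*.bin filter=lfs diff=lfs merge=lfs -text\n"
)
print(lfs_tracked("assets/radar.png", rules))  # True
print(lfs_tracked("README.md", rules))         # False
```

In practice these lines are generated by `git lfs track "<pattern>"` rather than written by hand.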
README.md CHANGED

```diff
@@ -73,7 +73,7 @@ We conducted a comprehensive evaluation of **MOSS-VL-Instruct-0408** across four
 
 * **Leading Video Intelligence**: MOSS-VL achieves a score of **65.8** in Video Understanding, significantly outperforming Qwen3-VL (+2pts). It shows exceptional temporal consistency and action recognition capabilities across benchmarks like `VideoMME`, `MLVU`, `EgoSchema`, and `VSI-bench` (where it outperforms **Qwen3-VL-8B-Instruct** by **8.3 points**).
 * **Outstanding Multimodal Perception**: MOSS-VL delivers excellent general image-text understanding, shining in fine-grained object recognition and spatial reasoning on benchmarks like `BLINK` and `MMBench`.
-* **Robust Multimodal Reasoning**: MOSS-VL demonstrates solid logical inference, staying highly competitive with the latest Qwen series on challenging reasoning suites such as `
+* **Robust Multimodal Reasoning**: MOSS-VL demonstrates solid logical inference, staying highly competitive with the latest Qwen series on challenging reasoning suites such as `VisuLogic`.
 * **Reliable Document Understanding**: While the model is primarily optimized for general perception and video, MOSS-VL still delivers **83.9** on OCR and document analysis, ensuring dependable extraction of text and structured information.
 
 
@@ -294,7 +294,7 @@ MOSS-VL-Instruct-0408 represents an early milestone in the MOSS-VL roadmap, and
 ## Citation
 ```bibtex
 @misc{moss_vl_2026,
-  title = {MOSS-VL Technical Report},
+  title = {{MOSS-VL Technical Report}},
   author = {OpenMOSS Team},
   year = {2026},
   howpublished = {\url{https://github.com/OpenMOSS/MOSS-VL}},
```
|
assets/MOSS-VL-benchmark.png CHANGED (binary, stored with Git LFS)

assets/radar.png ADDED (binary, stored with Git LFS)