Commit 8c41997 by findcard12138 · verified · 1 Parent(s): 0931942

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -41,3 +41,4 @@ tokenizer.json filter=lfs diff=lfs merge=lfs -text
  assets/MOSS-VL-Benchmark.png filter=lfs diff=lfs merge=lfs -text
  assets/MOSS-VL-benchmark.png filter=lfs diff=lfs merge=lfs -text
  assets/benchmark_table.png filter=lfs diff=lfs merge=lfs -text
+ assets/radar.png filter=lfs diff=lfs merge=lfs -text
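
Each line in this hunk pairs a path pattern with the attributes that route the file through Git LFS. As a rough illustration of how such a line breaks down, here is a minimal Python sketch. `parse_gitattributes_line` and `is_lfs_tracked` are hypothetical helpers (not part of Git or `huggingface_hub`), and the sketch assumes the pattern contains no quoted whitespace.

```python
def parse_gitattributes_line(line: str) -> tuple[str, dict[str, object]]:
    """Split one .gitattributes line into (pattern, attributes).

    Hypothetical helper for illustration only. Simplifying assumption:
    the pattern has no spaces, so plain whitespace splitting is enough.
    """
    pattern, *tokens = line.split()
    attrs: dict[str, object] = {}
    for token in tokens:
        if "=" in token:             # attr=value, e.g. filter=lfs
            key, value = token.split("=", 1)
            attrs[key] = value
        elif token.startswith("-"):  # -attr means "unset", e.g. -text
            attrs[token[1:]] = False
        elif token.startswith("!"):  # !attr means "unspecified"
            attrs[token[1:]] = None
        else:                        # bare attr means "set"
            attrs[token] = True
    return pattern, attrs


def is_lfs_tracked(attrs: dict[str, object]) -> bool:
    """True when the filter attribute sends the file through Git LFS."""
    return attrs.get("filter") == "lfs"


pattern, attrs = parse_gitattributes_line(
    "assets/radar.png filter=lfs diff=lfs merge=lfs -text"
)
print(pattern, attrs, is_lfs_tracked(attrs))
```

The `-text` entry is what tells Git to treat the image as binary, which is why it maps to `False` here instead of a string value.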
README.md CHANGED
@@ -73,7 +73,7 @@ We conducted a comprehensive evaluation of **MOSS-VL-Instruct-0408** across four

  * **🚀 Leading Video Intelligence**: MOSS-VL achieves a score of **65.8** in Video Understanding, significantly outperforming Qwen3-VL (+2pts). It shows exceptional temporal consistency and action recognition capabilities across benchmarks like `VideoMME`, `MLVU`, `EgoSchema`, and `VSI-bench` (where it outperforms **Qwen3-VL-8B-Instruct** by **8.3 points**).
  * **👁️ Outstanding Multimodal Perception**: MOSS-VL delivers excellent general image-text understanding, shining in fine-grained object recognition and spatial reasoning on benchmarks like `BLINK` and `MMBench`.
- * **🧠 Robust Multimodal Reasoning**: MOSS-VL demonstrates solid logical inference, staying highly competitive with the latest Qwen series on challenging reasoning suites such as `CVBench` and `VisuLogic`.
+ * **🧠 Robust Multimodal Reasoning**: MOSS-VL demonstrates solid logical inference, staying highly competitive with the latest Qwen series on challenging reasoning suites such as `VisuLogic`.
  * **📄 Reliable Document Understanding**: While the model is primarily optimized for general perception and video, MOSS-VL still delivers **83.9** on OCR and document analysis, ensuring dependable extraction of text and structured information.


@@ -294,7 +294,7 @@ MOSS-VL-Instruct-0408 represents an early milestone in the MOSS-VL roadmap, and
  ## 📜 Citation
  ```bibtex
  @misc{moss_vl_2026,
- title = {MOSS-VL Technical Report},
+ title = {{MOSS-VL Technical Report}},
  author = {OpenMOSS Team},
  year = {2026},
  howpublished = {\url{https://github.com/OpenMOSS/MOSS-VL}},
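
The citation hunk wraps the title in a second pair of braces. In BibTeX, text nested inside braces is normally shielded from a bibliography style's case folding, so `MOSS-VL` keeps its capitals. The Python sketch below is a simplified imitation of that behaviour, not BibTeX's actual algorithm; `fold_title_case` is a hypothetical name, and its input is assumed to be the field value after the outer delimiters have been stripped.

```python
def fold_title_case(value: str) -> str:
    """Lowercase every character at brace depth 0 except the first one.

    Simplified stand-in for what many bibliography styles do to titles.
    Characters inside {...} are copied through unchanged.
    """
    out = []
    depth = 0
    seen_first = False
    for ch in value:
        if ch == "{":
            depth += 1
            out.append(ch)
            continue
        if ch == "}":
            depth = max(depth - 1, 0)
            out.append(ch)
            continue
        if depth > 0 or not seen_first:
            out.append(ch)          # protected, or the leading character
        else:
            out.append(ch.lower())  # unprotected text gets folded
        seen_first = True
    return "".join(out)


# Old entry: title = {MOSS-VL Technical Report}  -> value has no inner braces
print(fold_title_case("MOSS-VL Technical Report"))
# New entry: title = {{MOSS-VL Technical Report}} -> value keeps one brace pair
print(fold_title_case("{MOSS-VL Technical Report}"))
```

Under this simplified model the old entry loses the capitals in the model name, while the double-braced entry passes through untouched.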
assets/MOSS-VL-benchmark.png CHANGED

Git LFS Details

  • SHA256: 104a32833ad454cbf7ce91f1ea49a6e73d15a7c14609f05ad76c1567b978e001
  • Pointer size: 131 Bytes
  • Size of remote file: 902 kB

Git LFS Details

  • SHA256: 9031684791315d61bb4c35c6a3028e1de4043f951167a9a00251d69a321a3aa9
  • Pointer size: 131 Bytes
  • Size of remote file: 961 kB
assets/radar.png ADDED

Git LFS Details

  • SHA256: b1a795b298799db6c97a10068181882a9e99df03b97c63e6229975556938c978
  • Pointer size: 132 Bytes
  • Size of remote file: 5.65 MB
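
The pointer sizes listed above (131 and 132 bytes) are consistent with the Git LFS pointer file layout: a version line, an `oid sha256:` line, and a `size` line holding the byte count in decimal. The sketch below rebuilds such pointers in Python to show why a 7-digit byte count costs one byte more than a 6-digit one. The exact byte counts are not shown on this page, so `961_000` and `5_650_000` are placeholder values chosen only to match the rounded sizes, and `build_lfs_pointer` is a hypothetical helper.

```python
def build_lfs_pointer(sha256_hex: str, size_bytes: int) -> bytes:
    """Return the pointer file contents for one LFS object (spec v1 layout)."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    ).encode("utf-8")


# SHA256 values copied from the page; byte counts are assumed placeholders.
benchmark_pointer = build_lfs_pointer(
    "9031684791315d61bb4c35c6a3028e1de4043f951167a9a00251d69a321a3aa9",
    961_000,    # placeholder for "961 kB" (6 digits)
)
radar_pointer = build_lfs_pointer(
    "b1a795b298799db6c97a10068181882a9e99df03b97c63e6229975556938c978",
    5_650_000,  # placeholder for "5.65 MB" (7 digits)
)

print(len(benchmark_pointer))  # 131
print(len(radar_pointer))      # 132
```

Only the pointer lives in the Git history; the image bytes themselves are stored on the LFS remote and fetched by hash.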