Use transformers as the library name

#2
by ariG23498 HF Staff - opened
Files changed (3)
  1. README.md +7 -95
  2. llm/config.json +1 -1
  3. sound_mm_projector/config.json +1 -1
README.md CHANGED
@@ -1,98 +1,10 @@
  ---
+ license: other
  library_name: transformers
- license: apache-2.0
- tags:
- - omni-modal
- - multimodal
- - vision
- - audio
- - video
- - llm
- model-index:
- - name: OmniVinci
- results:
- - task:
- type: image-to-text
- name: Image Understanding
- dataset:
- name: MVBench
- type: mvbench
- metrics:
- - name: MVBench Score
- type: accuracy
- value: 70.6
- source:
- name: OmniVinci Technical Report
- url: https://arxiv.org/abs/2510.15870
- - task:
- type: video-to-text
- name: Video Understanding
- dataset:
- name: Video-MME
- type: video-mme
- metrics:
- - name: Video-MME (w/o sub)
- type: accuracy
- value: 68.2
- source:
- name: OmniVinci Technical Report
- url: https://arxiv.org/abs/2510.15870
- - task:
- type: video-to-text
- name: Cross-Modal Understanding
- dataset:
- name: DailyOmni
- type: dailyomni
- metrics:
- - name: DailyOmni Score
- type: accuracy
- value: 66.5
- source:
- name: OmniVinci Technical Report
- url: https://arxiv.org/abs/2510.15870
- - task:
- type: audio-to-text
- name: Audio Understanding
- dataset:
- name: MMAR
- type: mmar
- metrics:
- - name: MMAR Score
- type: accuracy
- value: 58.4
- source:
- name: OmniVinci Technical Report
- url: https://arxiv.org/abs/2510.15870
- - task:
- type: audio-to-text
- name: Audio-Only Reasoning
- dataset:
- name: MMAU
- type: mmau
- metrics:
- - name: MMAU Score
- type: accuracy
- value: 71.6
- source:
- name: OmniVinci Technical Report
- url: https://arxiv.org/abs/2510.15870
- - task:
- type: video-to-text
- name: Multi-Modal Reasoning
- dataset:
- name: Worldsense
- type: worldsense
- metrics:
- - name: Worldsense Score
- type: accuracy
- value: 48.2
- source:
- name: OmniVinci Technical Report
- url: https://arxiv.org/abs/2510.15870
  ---
  # <span style="background: linear-gradient(45deg, #667eea 0%, #764ba2 25%, #f093fb 50%, #f5576c 75%, #4facfe 100%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; background-clip: text; font-weight: bold; font-size: 1.1em;">**OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM**</span> <br />

- [![Paper](https://img.shields.io/badge/ArXiv-Paper-brown)](https://arxiv.org/abs/2510.15870)
+ [![Paper](https://img.shields.io/badge/ArXiv-Paper-brown)](arxiv.org/abs/2510.15870 )
  [![Code](https://img.shields.io/badge/GitHub-Link-blue)](https://github.com/NVlabs/OmniVinci)
  [![Model](https://img.shields.io/badge/HuggingFace-Model-yellow)](https://huggingface.co/nvidia/omnivinci)
  [![Website](https://img.shields.io/badge/Web-Page-orange)](https://nvlabs.github.io/OmniVinci)
@@ -191,10 +103,10 @@ The model is released under the [NVIDIA OneWay Noncommercial License](asset/NVID
  Please consider to cite our paper and this framework, if they are helpful in your research.

  ```bibtex
- @article{ye2025omnivinci,
- title={OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM},
- author={Ye, Hanrong and Yang, Chao-Han Huck and Goel, Arushi and Huang, Wei and Zhu, Ligeng and Su, Yuanhang and Lin, Sean and Cheng, An-Chieh and Wan, Zhen and Tian, Jinchuan and others},
- journal={arXiv preprint arXiv:2510.15870},
- year={2025}
+ @article{omnivinci2025,
+ title={OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM},
+ author={Hanrong Ye, Chao-Han Huck Yang, Arushi Goel, Wei Huang, Ligeng Zhu, Yuanhang Su, Sean Lin, An-Chieh Cheng, Zhen Wan, Jinchuan Tian, Yuming Lou, Dong Yang, Zhijian Liu, Yukang Chen, Ambrish Dantrey, Ehsan Jahangiri, Sreyan Ghosh, Daguang Xu, Ehsan Hosseini-Asl, Danial Mohseni Taheri, Vidya Murali, Sifei Liu, Jason Lu, Oluwatobi Olabiyi, Frank Wang, Rafael Valle, Bryan Catanzaro, Andrew Tao, Song Han, Jan Kautz, Hongxu Yin, Pavlo Molchanov},
+ journal={arXiv},
+ year={2025},
  }
  ```
llm/config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "",
+ "_name_or_path": "/home/hanrongy/user_path/project/vila/VILA-Internal/../exp_log/nvomni-8b-video-0d1-trope128_omniTwds_ras_audfilter_boost_lr5e6_demoonly_n1_bs128_ga8_mstep-1_j20250923/outputs/model/llm",
  "architectures": [
  "Qwen2ForCausalLM"
  ],
sound_mm_projector/config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "",
+ "_name_or_path": "/lustre/fs12/portfolios/llmservice/projects/llmservice_fm_vision/users/hanrongy/project/vila/VILA-Internal/../exp_log/nvomni-8b-video-0d1-trope128_omniT_ras_n16_bs2048_ga8_mstep-1_j20250718/outputs/model/sound_mm_projector",
  "architectures": [
  "SoundMultimodalProjector"
  ],