Tinker250 committed on
Commit 90b8908 · verified · 1 Parent(s): b89c01d

Update README.md

Files changed (1): README.md (+82 −1)
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
- Qwen/Qwen2-Audio-7B-Instruct
---

<p align="center">
<h1 align="center">
<img src="static/logo.png" alt="Nexus-o" height="40" style="position:relative; top:6px;">
NEXUS-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision</h1>
<p align="center">
<strong>Che Liu</strong>,
<strong>Yingji Zhang</strong>,
<strong>Dong Zhang</strong>,
<strong>Weijie Zhang</strong>,
<strong>Chenggong Gong</strong>,
<strong>Yu Lu</strong>,
<strong>Shilin Zhou</strong>,
<strong>Ziliang Gan</strong>,
<br>
<strong>Ziao Wang</strong>,
<strong>Haipang Wu</strong>,
<strong>Ji Liu</strong>,
<strong>Andre Freitas</strong>,
<strong>Qifan Wang</strong>,
<strong>Zenglin Xu</strong>,
<br>
<strong>Rongjunchen Zhang</strong><sup>♠</sup>,
<strong>Yong Dai</strong><sup>♠</sup>
</p>
<div class="is-size-5 publication-authors" align="center">
<span class="author-block">
<sup>♠</sup>Corresponding authors: daiyongya@outlook.com, zhangrongjunchen@myhexin.com
</span>
</div>
<br>
📖 <a href="https://arxiv.org/pdf/2503.01879">Paper</a> | 🤗 <a href="https://github.com/HiThink-Research/NEXUS-O">Coming Soon</a>

<div align="center"></div>
<p align="center">
NEXUS-O is an industry-scale omni-modal large language model (LLM) that unifies audio, vision, and language understanding in a single modular framework.
Human perception integrates sight, sound, and language; NEXUS-O aims to give intelligent agents the same integrated ability across real-world scenarios such as ASR, speech-to-speech chat, and multimodal reasoning.
</p>
<img src="static/omni.png">
<p align="center">Architecture of NEXUS-O</p>

<img src="static/train_stage.png">
<p align="center">Training Stages</p>

## 📢 News
- 🚀 [08/01/2025] Our paper has been accepted by ACM MM 2025.

## 💡 Highlights
- 🧩 Modular End-to-End Framework. A highly configurable encoder–LLM–decoder architecture that supports flexible modality combinations and rapid iteration for industry applications.
- 💡 Lightweight Alignment Strategy. Efficient audio–language pre-training built on the state-of-the-art Qwen2.5-VL model, eliminating the need for costly vision pre-training while retaining strong tri-modal performance.
- 🎧 Synthetic Audio Data Pipeline. A scalable audio-synthesis system that generates diverse, high-fidelity audio–text pairs from real-world scenes, enabling robust downstream ASR and S2S tasks.

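The encoder–LLM–decoder composition behind the first highlight can be sketched as follows. This is a minimal toy illustration of the modular idea only; every class, method, and embedding here is hypothetical and is not the NEXUS-O implementation or API.

```python
# Toy sketch of a configurable encoder–LLM pipeline.
# All names and the "embeddings" are illustrative stand-ins.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModalityEncoder:
    """Maps raw modality input to a sequence of embedding vectors."""
    name: str
    dim: int = 8

    def encode(self, raw: str) -> List[List[float]]:
        # Toy embedding: one dim-wide vector per whitespace token.
        return [[(ord(tok[0]) % 97) / 97.0] * self.dim for tok in raw.split()]


@dataclass
class ToyLLM:
    """Stands in for the language-model backbone, consuming the
    concatenated embedding sequence from all registered encoders."""
    dim: int = 8

    def forward(self, embeddings: List[List[float]]) -> List[float]:
        # Toy pooling: mean over the sequence dimension.
        n = max(len(embeddings), 1)
        return [sum(vec[i] for vec in embeddings) / n for i in range(self.dim)]


@dataclass
class OmniPipeline:
    """Any subset of modality encoders can be plugged in, mirroring
    the 'flexible modality combinations' idea."""
    llm: ToyLLM = field(default_factory=ToyLLM)
    encoders: Dict[str, ModalityEncoder] = field(default_factory=dict)

    def register(self, encoder: ModalityEncoder) -> None:
        self.encoders[encoder.name] = encoder

    def run(self, inputs: Dict[str, str]) -> List[float]:
        seq: List[List[float]] = []
        for name, raw in inputs.items():
            seq.extend(self.encoders[name].encode(raw))
        return self.llm.forward(seq)


pipeline = OmniPipeline()
pipeline.register(ModalityEncoder("vision"))
pipeline.register(ModalityEncoder("audio"))
hidden = pipeline.run({"vision": "a red cube", "audio": "hello world"})
print(len(hidden))  # → 8, the backbone's hidden width
```

Swapping an encoder in or out changes only the `register` calls, which is the design property the modular framework is meant to provide.
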
## TODO
* [x] Release NEXUS-O full model weights on Hugging Face
* [ ] Release Audio Encoder Training Data
* [ ] Release Audio Decoder Training Data

## ✒️ Citation
```bibtex
@article{liu2025nexus,
  title={Nexus: An Omni-Perceptive And-Interactive Model for Language, Audio, And Vision},
  author={Liu, Che and Zhang, Yingji and Zhang, Dong and Zhang, Weijie and Gong, Chenggong and Li, Haohan and Lu, Yu and Zhou, Shilin and Lu, Yue and Gan, Ziliang and others},
  journal={arXiv preprint arXiv:2503.01879},
  year={2025}
}
```

## 📄 License
![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg) ![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)

**Usage and License Notices**: The data and code are intended and licensed for research use only. The data are licensed under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), and use must also abide by the OpenAI terms of use: https://openai.com/policies/terms-of-use

## 💖 Acknowledgement