Update README.md
README.md
@@ -4,11 +4,7 @@ license: mit
 
 
 # Audio-Reasoner
-<p align="center">
-<img controls src="https://github.com/xzf-thu/Audio-Reasoner/blob/main/assets/title.png" title="v" width="80%"/>
-</p>
 
-## Abstract
 We implemented inference scaling on **Audio-Reasoner**, a large audio language model, enabling **deepthink** and **structured chain-of-thought (COT) reasoning** for multimodal understanding and reasoning. To achieve this, we constructed CoTA, a high-quality dataset with **1.2M reasoning-rich samples** using structured COT techniques. Audio-Reasoner achieves state-of-the-art results on **MMAU-mini(+25.42%)** and **AIR-Bench-Chat(+14.57%)** benchmarks.
 
 <p align="center">
@@ -23,11 +19,6 @@ If you like us, pls give us a star⭐ !
 
 
 ## Main Results
-<p align="center">
-<img src="assets\main_result.png" width="80%"/>
-</p>
-
-
 
 
 ## News and Updates
@@ -155,12 +146,6 @@ Audio - Reasoner can understand various types of audio, including sound, music,
 **2. Why is transformers installed after 'ms-swift' in the environment configuration?**
 The version of transformers has a significant impact on the performance of the model. We have tested that version `transformers==4.49.1` is one of the suitable versions. Installing ms-swift first may ensure a more stable environment for the subsequent installation of transformers to avoid potential version conflicts that could affect the model's performance.
 
-## More Cases
-<p align="center">
-<img src="assets\figure2-samples.png" width="90%"/>
-</p>
-
-
 ## Contact
 
 If you have any questions, please feel free to contact us via `zhifei001@e.ntu.edu.sg`.