zhifeixie/Audio-Reasoner


zhifeixie committed on Mar 5, 2025

Commit 77deed4 · verified · 1 Parent(s): 0debca8

Update README.md

Files changed (1)

README.md +0 -15


@@ -4,11 +4,7 @@ license: mit
 # Audio-Reasoner
-<p align="center">
-    <img controls src="https://github.com/xzf-thu/Audio-Reasoner/blob/main/assets/title.png" title="v" width="80%"/>
-</p>
-## Abstract
 We implemented inference scaling on **Audio-Reasoner**, a large audio language model, enabling **deepthink** and **structured chain-of-thought (COT) reasoning** for multimodal understanding and reasoning. To achieve this, we constructed CoTA, a high-quality dataset with **1.2M reasoning-rich samples** using structured COT techniques. Audio-Reasoner achieves state-of-the-art results on **MMAU-mini(+25.42%)** and **AIR-Bench-Chat(+14.57%)** benchmarks.
 <p align="center">
@@ -23,11 +19,6 @@ If you like us, pls give us a star⭐ !
 ## Main Results
-<p align="center">
-    <img src="assets\main_result.png" width="80%"/>
-</p>
 ## News and Updates
@@ -155,12 +146,6 @@ Audio - Reasoner can understand various types of audio, including sound, music,
 **2. Why is transformers installed after 'ms-swift' in the environment configuration?**
 The version of transformers has a significant impact on the performance of the model. We have tested that version `transformers==4.49.1` is one of the suitable versions. Installing ms-swift first may ensure a more stable environment for the subsequent installation of transformers to avoid potential version conflicts that could affect the model's performance.
-## More Cases
-<p align="center">
-    <img src="assets\figure2-samples.png" width="90%"/>
-</p>
 ##  Contact
 If you have any questions, please feel free to contact us via `zhifei001@e.ntu.edu.sg`.
