KE-Team/Ke-Omni-R

shuaijiang committed on Apr 23, 2025

Commit 40da49a (verified) · 1 parent: 1b94300

Update README.md

Files changed (1)

README.md CHANGED

@@ -7,6 +7,8 @@ base_model:
 ---
 # Ke-Omni-R: Achieving Advanced Audio Reasoning with a Concise 50-Words Think Process
+If you wish to train or perform inference with the model, please visit the GitHub repository: [https://github.com/shuaijiang/Ke-Omni-R/](https://github.com/shuaijiang/Ke-Omni-R/).
+If you find this model helpful, please like this model and star our GitHub.
 Ke-Omni-R is an advanced audio reasoning model built upon [Qwen2.5-Omni-7B](https://github.com/QwenLM/Qwen2.5-Omni). With only 10k post-training samples, Ke-Omni-R has achieved state-of-the-art performance on the MMAU *Test-mini* and *Test* benchmarks. Key insights from its development include:
@@ -15,9 +17,6 @@ Ke-Omni-R is an advanced audio reasoning model built upon [Qwen2.5-Omni-7B](http
 - **KL Divergence**: Slight improvements were observed during GRPO training by leveraging KL divergence.
 - **Domain Ratio vs. Data Volume**: Domain diversity outweighs data volume. We utilized only 10k samples, with 5k randomly selected from AVQA and another 5k from MusicBench.
-If you wish to train or perform inference with the model, please visit the GitHub repository: [https://github.com/shuaijiang/Ke-Omni-R/](https://github.com/shuaijiang/Ke-Omni-R/).
-If you find this model helpful, please like this model and star our GitHub.
 ## Performance: Accuracies (%) on MMAU Test-mini and Test benchmark
 | Model                                 | Method                | Sound (Test-mini) | Sound (Test)  | Music (Test-mini) | Music (Test)  | Speech (Test-mini) | Speech (Test)  | Average (Test-mini) | Average (Test)  |
 |---------------------------------------|-----------------------|-----------|-------|-----------|-------|-----------|------|------------|-------|
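
Reviewer note: the README's "Domain Ratio vs. Data Volume" point (5k samples randomly selected from AVQA plus 5k from MusicBench) could be illustrated with something like the sketch below. This is only a minimal illustration under assumptions: the `build_mix` helper and the placeholder record lists are hypothetical and are not taken from the Ke-Omni-R repository, which may build its training set differently.

```python
import random


def build_mix(domains, per_domain, seed=0):
    """Draw `per_domain` random records from each domain and shuffle the union.

    `domains` maps a domain name to a list of records (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mix = []
    for name, records in domains.items():
        if len(records) < per_domain:
            raise ValueError(f"{name} has fewer than {per_domain} records")
        # Sample without replacement, tagging each record with its domain.
        mix.extend((name, record) for record in rng.sample(records, per_domain))
    rng.shuffle(mix)  # interleave the domains
    return mix


# Placeholder records standing in for the real datasets (assumption).
avqa = [f"avqa_{i}" for i in range(20000)]
musicbench = [f"musicbench_{i}" for i in range(20000)]

mix = build_mix({"AVQA": avqa, "MusicBench": musicbench}, per_domain=5000)
print(len(mix))  # total number of samples in the mix
```

Sampling a fixed count per domain keeps the ratio at 1:1 regardless of how large each source dataset is, which matches the equal split the README describes.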