Update README.md
README.md CHANGED
@@ -7,6 +7,8 @@ base_model:
 ---
 
 # Ke-Omni-R: Achieving Advanced Audio Reasoning with a Concise 50-Words Think Process
+If you wish to train or perform inference with the model, please visit the GitHub repository: [https://github.com/shuaijiang/Ke-Omni-R/](https://github.com/shuaijiang/Ke-Omni-R/).
+If you find this model helpful, please like this model and star our GitHub.
 
 Ke-Omni-R is an advanced audio reasoning model built upon [Qwen2.5-Omni-7B](https://github.com/QwenLM/Qwen2.5-Omni). With only 10k post-training samples, Ke-Omni-R has achieved state-of-the-art performance on the MMAU *Test-mini* and *Test* benchmarks. Key insights from its development include:
 
@@ -15,9 +17,6 @@ Ke-Omni-R is an advanced audio reasoning model built upon [Qwen2.5-Omni-7B](http
 - **KL Divergence**: Slight improvements were observed during GRPO training by leveraging KL divergence.
 - **Domain Ratio vs. Data Volume**: Domain diversity outweighs data volume. We utilized only 10k samples, with 5k randomly selected from AVQA and another 5k from MusicBench.
 
-If you wish to train or perform inference with the model, please visit the GitHub repository: [https://github.com/shuaijiang/Ke-Omni-R/](https://github.com/shuaijiang/Ke-Omni-R/).
-If you find this model helpful, please like this model and star our GitHub.
-
 ## Performance: Accuracies (%) on MMAU Test-mini and Test benchmark
 | Model | Method | Sound (Test-mini) | Sound (Test) | Music (Test-mini) | Music (Test) | Speech (Test-mini) | Speech (Test) | Average (Test-mini) | Average (Test) |
 |---------------------------------------|-----------------------|-----------|-------|-----------|-------|-----------|------|------------|-------|