- **KL Divergence**: Slight improvements were observed during GRPO training by leveraging KL divergence.
- **Domain Ratio vs. Data Volume**: Domain diversity outweighs data volume. We utilized only 10k samples, with 5k randomly selected from AVQA and another 5k from MusicBench.
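The KL term mentioned above enters the GRPO objective as a per-token penalty against a frozen reference policy. The following is a minimal illustrative sketch of that objective, not the actual Ke-Omni-R training code; the function names, the clipping constant `eps`, and the `beta` weight are assumptions for illustration:

```python
import math

def grpo_advantages(rewards):
    # Group-normalized advantages: (r - mean) / std within one sampled group.
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards)) + 1e-8
    return [(r - mean) / std for r in rewards]

def grpo_token_loss(logp, logp_old, logp_ref, advantage, beta=0.04, eps=0.2):
    # Clipped policy-gradient term (PPO-style ratio against the old policy).
    ratio = math.exp(logp - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    pg = min(ratio * advantage, clipped * advantage)
    # k3 estimator of KL(pi || pi_ref); non-negative, subtracted with weight beta.
    r = math.exp(logp_ref - logp)
    kl = r - (logp_ref - logp) - 1.0
    return -(pg - beta * kl)
```

With `beta = 0`, this reduces to plain clipped policy-gradient training; a small positive `beta` keeps the policy close to the reference model, which is the mechanism behind the slight improvements noted above.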

## Performance: Accuracies (%)↑ on the MMAU Test-mini and Test benchmarks

| Model | Method | Sound (Test-mini) | Sound (Test) | Music (Test-mini) | Music (Test) | Speech (Test-mini) | Speech (Test) | Average (Test-mini) | Average (Test) |
|---|---|---|---|---|---|---|---|---|---|
| - | Human\* | 86.31 | - | 78.22 | - | 82.17 | - | 82.23 | - |
| Qwen2.5-Omni-7B | \[4\] | 67.87 | - | 69.16 | - | 59.76 | - | 65.60 | - |
| Ke-Omni-R (Qwen2.5-Omni-7B) | GRPO (ours) | 69.37 | **71.90** | 69.46 | 67.13 | **67.87** | 67.10 | **68.90** | **68.71** |

## Performance: CER/WER (%)↓ on ASR benchmarks

| Model | Method | WenetSpeech test-net | WenetSpeech test-meeting | LibriSpeech test-clean | LibriSpeech test-other |
|---|---|---|---|---|---|
| Qwen2.5-Omni-3B | \[4\] | 6.3 | 8.1 | 2.2 | 4.5 |
| Qwen2.5-Omni-7B | \[4\] | 5.9 | 7.7 | 1.8 | 3.4 |
| Ke-Omni-3B | ours | 11.7 | 16.1 | 1.8 | 3.8 |
| Ke-Omni-7B | ours | 7.5 | 9.8 | **1.6** | **3.1** |
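For reference, the WER reported above is word-level edit distance divided by reference length; CER is the same computation over characters. A minimal sketch of the metric, not the benchmarks' official scoring script:

```python
def wer(reference, hypothesis):
    """Word error rate in %: edit distance between word sequences / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Production evaluations typically apply text normalization (casing, punctuation, numerals) before scoring, which this sketch omits.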

Note:

- \* The data are sourced from the [MMAU leaderboard](https://sakshi113.github.io/mmau_homepage/#leaderboard).