bupalinyu committed on
Commit 83c5223 · verified · 1 parent: ec76f6e

Group ASR benchmark table by model scale

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -58,15 +58,17 @@ The model should be loaded with `trust_remote_code=True`. The official inference
 
 ## Performance
 
-The following results are from the `open-audio-opd` evaluation. Lower CER/WER is better.
+The following results are from the `open-audio-opd` evaluation. Lower CER/WER is better. Bold numbers mark the best result within the 0.6B group.
 
 | Model | aishell-1 (CER) | Wenet-meeting (CER) | Wenet-net (CER) | Libri-clean (WER) | Libri-other (WER) |
 | --- | ---: | ---: | ---: | ---: | ---: |
+| *0.6B models* | | | | | |
 | Ark-Base (0.6B) | 3.48% | 10.22% | 7.74% | 3.75% | 7.17% |
 | Ark-Base+OPD (0.6B) | 3.00% | 7.18% | 6.13% | 2.88% | 5.50% |
 | **Ark-Base+TD+OPD (0.6B)** | **1.95%** | 5.92% | **5.39%** | **2.45%** | **4.56%** |
-| Qwen3-ASR-1.7B | 1.50% | 4.69% | 4.55% | 2.20% | 4.05% |
 | Qwen3-ASR-0.6B | 2.07% | **5.57%** | 5.45% | 2.81% | 5.05% |
+| *Larger reference model* | | | | | |
+| Qwen3-ASR-1.7B | 1.50% | 4.69% | 4.55% | 2.20% | 4.05% |
 
 `Ark-Base` is the 0.6B supervised ASR checkpoint trained on 100k hours of ASR audio. `TD` denotes teacher-data adaptation using 2,000 hours of teacher-generated ASR data. `OPD` denotes on-policy distillation with a Qwen-ASR teacher.
 
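
For readers unfamiliar with the CER/WER columns in the README table: both are edit-distance error rates, computed over characters and words respectively. The sketch below shows the textbook definition only. It is not the `open-audio-opd` scoring code, which may apply text normalization (casing, punctuation, number formatting) not shown here. The helper names `edit_distance`, `wer`, and `cer` are hypothetical, chosen for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(f"WER: {wer('the cat sat on the mat', 'the cat sit on mat'):.2%}")
    # One substituted character over 4 reference characters.
    print(f"CER: {cer('你好世界', '你好视界'):.2%}")
```

Because the rate is normalized by the reference length, a hypothesis with many insertions can score above 100%.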