AutoArk-AI/ARK-ASR-0.6B

@@ -58,15 +58,17 @@ The model should be loaded with `trust_remote_code=True`. The official inference
 ## Performance
-The following results are from the `open-audio-opd` evaluation. Lower CER/WER is better.
 | Model | aishell-1 (CER) | Wenet-meeting (CER) | Wenet-net (CER) | Libri-clean (WER) | Libri-other (WER) |
 | --- | ---: | ---: | ---: | ---: | ---: |
 | Ark-Base (0.6B) | 3.48% | 10.22% | 7.74% | 3.75% | 7.17% |
 | Ark-Base+OPD (0.6B) | 3.00% | 7.18% | 6.13% | 2.88% | 5.50% |
 | **Ark-Base+TD+OPD (0.6B)** | **1.95%** | 5.92% | **5.39%** | **2.45%** | **4.56%** |
-| Qwen3-ASR-1.7B | 1.50% | 4.69% | 4.55% | 2.20% | 4.05% |
 | Qwen3-ASR-0.6B | 2.07% | **5.57%** | 5.45% | 2.81% | 5.05% |
 `Ark-Base` is the 0.6B supervised ASR checkpoint trained on 100k hours of ASR audio. `TD` denotes teacher-data adaptation using 2,000 hours of teacher-generated ASR data. `OPD` denotes on-policy distillation with a Qwen-ASR teacher.

 ## Performance
+The following results are from the `open-audio-opd` evaluation. Lower CER/WER is better. Bold numbers mark the best result within the 0.6B group.
 | Model | aishell-1 (CER) | Wenet-meeting (CER) | Wenet-net (CER) | Libri-clean (WER) | Libri-other (WER) |
 | --- | ---: | ---: | ---: | ---: | ---: |
+| *0.6B models* | | | | | |
 | Ark-Base (0.6B) | 3.48% | 10.22% | 7.74% | 3.75% | 7.17% |
 | Ark-Base+OPD (0.6B) | 3.00% | 7.18% | 6.13% | 2.88% | 5.50% |
 | **Ark-Base+TD+OPD (0.6B)** | **1.95%** | 5.92% | **5.39%** | **2.45%** | **4.56%** |
 | Qwen3-ASR-0.6B | 2.07% | **5.57%** | 5.45% | 2.81% | 5.05% |
+| *Larger reference model* | | | | | |
+| Qwen3-ASR-1.7B | 1.50% | 4.69% | 4.55% | 2.20% | 4.05% |
 `Ark-Base` is the 0.6B supervised ASR checkpoint trained on 100k hours of ASR audio. `TD` denotes teacher-data adaptation using 2,000 hours of teacher-generated ASR data. `OPD` denotes on-policy distillation with a Qwen-ASR teacher.
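
For readers unfamiliar with the metrics in the table, the sketch below shows how CER and WER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length (characters for CER, whitespace-separated words for WER). This assumes the standard definition. The `open-audio-opd` evaluation may apply its own text normalization (punctuation, casing, numerals) before scoring, and the function names `edit_distance`, `wer`, and `cer` are illustrative, not part of this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (item missing from hyp)
                curr[j - 1] + 1,     # insertion (extra item in hyp)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(f"WER: {wer('the cat sat on the mat', 'the cat sat on mat'):.2%}")
    # One substituted character out of six reference characters.
    print(f"CER: {cer('今天天气很好', '今天天汽很好'):.2%}")
```

Both example calls give an error rate of one in six. Because insertions count as errors, either metric can exceed 100% when the hypothesis is much longer than the reference.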