Automatic Speech Recognition
Transformers
Safetensors
PyTorch
arkasr
text-generation
speech
audio
ark-asr
custom_code
Instructions to use AutoArk-AI/ARK-ASR-0.6B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AutoArk-AI/ARK-ASR-0.6B with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model="AutoArk-AI/ARK-ASR-0.6B", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("AutoArk-AI/ARK-ASR-0.6B", trust_remote_code=True, dtype="auto")

- Notebooks
- Google Colab
- Kaggle
Group ASR benchmark table by model scale
README.md
CHANGED
    @@ -58,15 +58,17 @@ The model should be loaded with `trust_remote_code=True`. The official inference
     
     ## Performance
     
    -The following results are from the `open-audio-opd` evaluation. Lower CER/WER is better.
    +The following results are from the `open-audio-opd` evaluation. Lower CER/WER is better. Bold numbers mark the best result within the 0.6B group.
     
     | Model | aishell-1 (CER) | Wenet-meeting (CER) | Wenet-net (CER) | Libri-clean (WER) | Libri-other (WER) |
     | --- | ---: | ---: | ---: | ---: | ---: |
    +| *0.6B models* | | | | | |
     | Ark-Base (0.6B) | 3.48% | 10.22% | 7.74% | 3.75% | 7.17% |
     | Ark-Base+OPD (0.6B) | 3.00% | 7.18% | 6.13% | 2.88% | 5.50% |
     | **Ark-Base+TD+OPD (0.6B)** | **1.95%** | 5.92% | **5.39%** | **2.45%** | **4.56%** |
    -| Qwen3-ASR-1.7B | 1.50% | 4.69% | 4.55% | 2.20% | 4.05% |
     | Qwen3-ASR-0.6B | 2.07% | **5.57%** | 5.45% | 2.81% | 5.05% |
    +| *Larger reference model* | | | | | |
    +| Qwen3-ASR-1.7B | 1.50% | 4.69% | 4.55% | 2.20% | 4.05% |
     
     `Ark-Base` is the 0.6B supervised ASR checkpoint trained on 100k hours of ASR audio. `TD` denotes teacher-data adaptation using 2,000 hours of teacher-generated ASR data. `OPD` denotes on-policy distillation with a Qwen-ASR teacher.
     
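For readers unfamiliar with the metrics in the table, here is a minimal sketch of how CER and WER are conventionally computed: the edit distance between reference and hypothesis, summed over the test set and divided by the total reference length, counted in characters for CER and in words for WER. The function names `edit_distance` and `error_rate` are hypothetical, and the text normalization used by the `open-audio-opd` evaluation is not stated in the README, so this illustrates the standard definition rather than the exact scoring script behind the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate: total edits divided by total reference length.

    unit="word" gives WER (whitespace tokenization, an assumption);
    unit="char" gives CER (spaces dropped, an assumption).
    """
    total_edits = 0
    total_ref_len = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        total_edits += edit_distance(r, h)
        total_ref_len += len(r)
    return total_edits / total_ref_len
```

With this definition, `error_rate(["the cat sat"], ["the cat sit"], unit="word")` counts one substitution against three reference words. Published scores usually also depend on casing, punctuation, and numeral normalization, which this sketch leaves out.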