---
license: cc-by-4.0
language:
- ko
pipeline_tag: automatic-speech-recognition
tags:
- espnet
- audio
- automatic-speech-recognition
---
# KoSP2E ASR Recipe
|
This is the ESPnet2 ASR recipe for the **KoSP2E** (Korean speech-to-English translation) dataset.

---
|
# Overview
|
The **KoSP2E dataset** is a corpus of Korean speech paired with English translations, released for Korean-to-English speech translation.
This recipe uses the Korean speech for ASR and provides a full ESPnet2 pipeline with both Transformer and Conformer architectures.

---
|
# Results
|
### Environment

* Date: Mon Nov 10 20:35:20 UTC 2025
* Python: 3.10.19
* ESPnet: 202509
* PyTorch: 2.9.0+cu128
* Model: Conformer (BPE=2000)
* Decode: Transformer LM (valid.acc.ave)
|
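The `valid.acc.ave` tag follows the ESPnet2 naming convention for a model whose parameters are averaged over the checkpoints with the best validation accuracy. The number of averaged checkpoints is a training option. The following is a minimal sketch of that averaging. It assumes checkpoints are simplified to dictionaries of flat float lists instead of PyTorch state dicts. `average_nbest` is a hypothetical helper, not ESPnet's own implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical simplified checkpoint: parameter name -> flat list of floats.
# Real ESPnet checkpoints are PyTorch state dicts of tensors.
Checkpoint = Dict[str, List[float]]


def average_nbest(
    checkpoints: Sequence[Checkpoint],
    valid_acc: Sequence[float],
    n_best: int,
) -> Checkpoint:
    """Average, element by element, the n_best checkpoints with the highest validation accuracy."""
    if len(checkpoints) != len(valid_acc):
        raise ValueError("expected one validation accuracy per checkpoint")
    if not 1 <= n_best <= len(checkpoints):
        raise ValueError("n_best must be between 1 and the number of checkpoints")
    # Rank checkpoints by validation accuracy (best first) and keep the top n_best.
    ranked = sorted(range(len(checkpoints)), key=lambda i: valid_acc[i], reverse=True)
    selected = [checkpoints[i] for i in ranked[:n_best]]
    averaged: Checkpoint = {}
    for name in selected[0]:
        # zip(*...) groups the k-th element of this parameter across the selected checkpoints.
        columns = zip(*(ckpt[name] for ckpt in selected))
        averaged[name] = [sum(column) / n_best for column in columns]
    return averaged


if __name__ == "__main__":
    epochs = [
        {"w": [0.0, 2.0]},  # epoch 1
        {"w": [1.0, 3.0]},  # epoch 2
        {"w": [3.0, 5.0]},  # epoch 3
    ]
    accuracies = [0.80, 0.90, 0.85]
    # Epochs 2 and 3 have the best accuracy, so only their weights are averaged.
    print(average_nbest(epochs, accuracies, n_best=2))  # {'w': [2.0, 4.0]}
```
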
### WER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
| test | 2320 | 22337 | 77.1 | 20.4 | 2.6 | 4.4 | 27.4 | 76.4 |
|
### CER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
| test | 2320 | 84267 | 92.5 | 5.7 | 1.8 | 1.7 | 9.2 | 76.4 |
|
### TER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
| test | 2320 | 65361 | 89.4 | 8.6 | 2.0 | 2.1 | 12.7 | 76.4 |
|
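All three tables use the sclite-style columns that ESPnet reports:

* `Snt` is the number of test utterances.
* `Wrd` is the number of reference units: words for WER, characters for CER, and BPE tokens for TER.
* `Corr`, `Sub`, `Del` and `Ins` are percentages of `Wrd`.
* `Err` = `Sub` + `Del` + `Ins`. For example, the WER row gives 20.4 + 2.6 + 4.4 = 27.4.
* `S.Err` is the percentage of utterances with at least one error.

The sketch below reproduces this bookkeeping with a plain Levenshtein alignment. The helpers `align_counts`, `score`, `to_words` and `to_chars` are hypothetical. ESPnet itself scores with sclite, and tie-breaking between equally cheap alignments may shift counts between `Sub`, `Del` and `Ins`.

The Korean sentence pair is also hypothetical. It shows why word- and character-level rates diverge: one spacing error costs two word-level errors but only one character-level insertion. This assumes spaces are kept as a `<space>` unit, which we believe matches ESPnet's default character tokenizer. TER would use the same `score` function on sequences split by the recipe's 2000-unit BPE model, which is not included here.

```python
from typing import Dict, List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (correct, substitutions, deletions, insertions) for one utterance."""
    n, m = len(ref), len(hyp)
    # cost[i][j] is the edit distance between ref[:i] and hyp[:j] (unit costs).
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Walk back from the corner, preferring match/substitution, then deletion, then insertion.
    corr = sub = dele = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                corr += 1
            else:
                sub += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return corr, sub, dele, ins


def score(pairs: List[Tuple[List[str], List[str]]]) -> Dict[str, float]:
    """Aggregate sclite-style percentages over (reference, hypothesis) unit lists."""
    corr = sub = dele = ins = n_ref = n_err_sents = 0
    for ref, hyp in pairs:
        c, s, d, i = align_counts(ref, hyp)
        corr, sub, dele, ins = corr + c, sub + s, dele + d, ins + i
        n_ref += len(ref)
        if s + d + i > 0:
            n_err_sents += 1

    def pct(x: int, total: int) -> float:
        return round(100.0 * x / total, 1)

    return {
        "Snt": len(pairs),
        "Wrd": n_ref,
        "Corr": pct(corr, n_ref),
        "Sub": pct(sub, n_ref),
        "Del": pct(dele, n_ref),
        "Ins": pct(ins, n_ref),
        "Err": pct(sub + dele + ins, n_ref),
        "S.Err": pct(n_err_sents, len(pairs)),
    }


def to_words(text: str) -> List[str]:
    return text.split()


def to_chars(text: str) -> List[str]:
    # Assumption: spaces are kept as an explicit "<space>" unit rather than dropped.
    return ["<space>" if ch == " " else ch for ch in text]


if __name__ == "__main__":
    ref, hyp = "나는 학교에 간다", "나는 학교 에 간다"  # hypothetical pair with one spacing error
    wer = score([(to_words(ref), to_words(hyp))])
    cer = score([(to_chars(ref), to_chars(hyp))])
    print(wer["Err"], cer["Err"])  # 66.7 11.1
```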
---
|
# References

* KoSP2E paper: https://arxiv.org/abs/2107.02875

---
|