---
license: cc-by-4.0
language:
- ko
pipeline_tag: automatic-speech-recognition
tags:
- espnet
- audio
- automatic-speech-recognition
---
# KoSP2E ASR Recipe

This is the ESPnet2 recipe for the **KoSP2E (Korean Speech to English Translation Corpus)** dataset.

---

# Overview

The **KoSP2E dataset** is a Korean speech corpus originally released for Korean-speech-to-English translation.
This recipe provides a full ASR pipeline using ESPnet2 with both Transformer and Conformer architectures.

---

# Results

### Environment
* Date: Mon Nov 10 20:35:20 UTC 2025
* Python: 3.10.19
* ESPnet: 202509
* PyTorch: 2.9.0+cu128
* Model: Conformer (BPE=2000)
* Decode: Transformer LM (valid.acc.ave)

### WER
| dataset | Snt | Wrd  | Corr | Sub | Del | Ins | Err | S.Err |
|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
| test   | 2320 | 22337 | 77.1 | 20.4 | 2.6 | 4.4 | 27.4 | 76.4 |

### CER
| dataset | Snt | Wrd  | Corr | Sub | Del | Ins | Err | S.Err |
|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
| test   | 2320 | 84267 | 92.5 | 5.7 | 1.8 | 1.7 | 9.2  | 76.4 |

### TER
| dataset | Snt | Wrd  | Corr | Sub | Del | Ins | Err | S.Err |
|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
| test   | 2320 | 65361 | 89.4 | 8.6 | 2.0 | 2.1 | 12.7 | 76.4 |
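
As a reading aid for the tables above: in sclite-style scoring, the `Err` column is the sum of the substitution, deletion, and insertion rates. The sketch below checks that relation against the reported test rows. The `error_rate` helper and the `RESULTS` dictionary are illustrative names, not part of ESPnet, and because the inputs are already rounded percentages, a 0.1 mismatch could occur on other result sets.

```python
# Rates (in percent) copied from the "test" rows of the WER/CER/TER tables.
RESULTS = {
    "WER": {"sub": 20.4, "del": 2.6, "ins": 4.4, "err": 27.4},
    "CER": {"sub": 5.7, "del": 1.8, "ins": 1.7, "err": 9.2},
    "TER": {"sub": 8.6, "del": 2.0, "ins": 2.1, "err": 12.7},
}


def error_rate(sub: float, dele: float, ins: float) -> float:
    """Return the total error rate as Sub + Del + Ins, rounded to one decimal."""
    return round(sub + dele + ins, 1)


if __name__ == "__main__":
    for metric, row in RESULTS.items():
        computed = error_rate(row["sub"], row["del"], row["ins"])
        print(f"{metric}: computed={computed} reported={row['err']}")
```

Note that `Err` can exceed `100 - Corr` because insertions count as errors but do not reduce the number of correctly recognized reference units.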

---

# References
* KoSP2E paper: https://arxiv.org/abs/2107.02875
---