
	---
	license: cc-by-4.0
	language:
	- ko
	pipeline_tag: automatic-speech-recognition
	tags:
	- espnet
	- audio
	- automatic-speech-recognition
	---
	# KoSP2E ASR Recipe

	This is the ESPnet2 ASR recipe for the KoSP2E (Korean Speech to English Translation Corpus) dataset.

	---

	# Overview

	The KoSP2E dataset is a Korean speech-to-English translation corpus; this recipe uses its Korean speech and transcripts for speech recognition.
	This recipe provides a full ASR pipeline using ESPnet2 with both Transformer and Conformer architectures.
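
A minimal inference sketch, assuming the model can be loaded through ESPnet2's `Speech2Text.from_pretrained` with the repository id `espnet/kosp2e-asr-ko` and that `espnet`, `espnet_model_zoo`, and `soundfile` are installed. The helper names `best_text` and `transcribe` are hypothetical, and the expected input format (mono audio at the training sample rate, assumed here to be 16 kHz) is not stated in this card.

```python
MODEL_TAG = "espnet/kosp2e-asr-ko"  # repository id of this model card


def best_text(nbests):
    """Return the transcript of the top hypothesis from ESPnet2 n-best output.

    ESPnet2's Speech2Text returns a list of tuples whose first element
    is the decoded text; an empty list yields an empty string.
    """
    if not nbests:
        return ""
    text, *_ = nbests[0]
    return text or ""


def transcribe(wav_path, model_tag=MODEL_TAG):
    """Transcribe one audio file (assumption: mono, 16 kHz)."""
    # Imported lazily so the helper above can be used without ESPnet installed.
    import soundfile as sf
    from espnet2.bin.asr_inference import Speech2Text

    speech2text = Speech2Text.from_pretrained(model_tag)
    speech, _rate = sf.read(wav_path)
    return best_text(speech2text(speech))
```

Calling `transcribe("sample.wav")` downloads the model on first use; `sample.wav` is a placeholder path.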

	---

	# Results

	### Environment
	* Date: Mon Nov 10 20:35:20 UTC 2025
	* Python: 3.10.19
	* ESPnet: 202509
	* PyTorch: 2.9.0+cu128
	* Model: Conformer (BPE=2000)
	* Decode: Transformer LM (valid.acc.ave)

	### WER
	| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
	|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
	| test | 2320 | 22337 | 77.1 | 20.4 | 2.6 | 4.4 | 27.4 | 76.4 |

	### CER
	| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
	|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
	| test | 2320 | 84267 | 92.5 | 5.7 | 1.8 | 1.7 | 9.2 | 76.4 |

	### TER
	| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
	|--------|----:|-----:|----:|---:|---:|---:|----:|-----:|
	| test | 2320 | 65361 | 89.4 | 8.6 | 2.0 | 2.1 | 12.7 | 76.4 |
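
In these tables the `Err` column is the sum of the substitution, deletion, and insertion percentages, each taken relative to the number of reference units in `Wrd` (words, characters, or tokens depending on the table). `S.Err` is the percentage of sentences containing at least one error. The sketch below re-derives `Err` from the reported rows as a consistency check; the function names are hypothetical, and the absolute error counts it estimates are approximations because the table values are rounded to one decimal.

```python
# Rows copied from the tables above: (reference units, Sub %, Del %, Ins %, Err %)
ROWS = {
    "WER": (22337, 20.4, 2.6, 4.4, 27.4),
    "CER": (84267, 5.7, 1.8, 1.7, 9.2),
    "TER": (65361, 8.6, 2.0, 2.1, 12.7),
}


def error_rate(sub, dele, ins):
    """Error rate in percent: substitutions + deletions + insertions."""
    return sub + dele + ins


def is_consistent(metric, tol=0.05):
    """Check that the reported Err matches Sub + Del + Ins within rounding."""
    _units, sub, dele, ins, err = ROWS[metric]
    return abs(error_rate(sub, dele, ins) - err) <= tol


def approx_error_count(metric):
    """Estimate the absolute number of errors from the rounded percentage."""
    units, _sub, _dele, _ins, err = ROWS[metric]
    return round(units * err / 100)


if __name__ == "__main__":
    for name in ROWS:
        print(name, is_consistent(name), approx_error_count(name))
```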

	---

	# References
	* KoSP2E paper: https://arxiv.org/abs/2107.02875
	---