---
license: apache-2.0
---
# ManWav
[ManWav: The First Manchu ASR Model](https://arxiv.org/pdf/2406.13502) is a Wav2Vec2-XLSR-53 model fine-tuned on Manchu audio data.
## Data
The Manchu audio data used for fine-tuning is available in the [ManWav GitHub repository](https://github.com/seemdog/ManWav).
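The card does not include a usage snippet, so here is a minimal, hedged sketch of running inference with a Wav2Vec2-CTC checkpoint via `transformers`. The Hub model ID is a placeholder (it is not stated in this card), and the audio is assumed to be a 16 kHz mono waveform.

```python
def transcribe(audio_array, sampling_rate=16_000, model_id="<hub-id-of-this-model>"):
    """Sketch of Wav2Vec2-CTC inference; model_id is a placeholder, not the real Hub ID."""
    # Imports are kept inside the function so the sketch stays importable
    # even where `torch`/`transformers` are not installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    # Convert raw audio to model inputs, run a forward pass, greedy-decode.
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```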
## Citation
```bibtex
@inproceedings{seo-etal-2024-manwav,
title = "{M}an{W}av: The First {M}anchu {ASR} Model",
author = "Seo, Jean and
Kang, Minha and
Byun, SungJoo and
Lee, Sangah",
editor = "Serikov, Oleg and
Voloshina, Ekaterina and
Postnikova, Anna and
Muradoglu, Saliha and
Le Ferrand, Eric and
Klyachko, Elena and
Vylomova, Ekaterina and
Shavrina, Tatiana and
Tyers, Francis",
booktitle = "Proceedings of the 3rd Workshop on NLP Applications to Field Linguistics (Field Matters 2024)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.fieldmatters-1.2",
pages = "6--11",
abstract = "This study addresses the widening gap in Automatic Speech Recognition (ASR) research between high resource and extremely low resource languages, with a particular focus on Manchu, a severely endangered language. Manchu exemplifies the challenges faced by marginalized linguistic communities in accessing state-of-the-art technologies. In a pioneering effort, we introduce the first-ever Manchu ASR model ManWav, leveraging Wav2Vec2-XLSR-53. The results of the first Manchu ASR is promising, especially when trained with our augmented data. Wav2Vec2-XLSR-53 fine-tuned with augmented data demonstrates a 0.02 drop in CER and 0.13 drop in WER compared to the same base model fine-tuned with original data.",
}
```
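The abstract reports improvements of 0.02 in CER and 0.13 in WER. Both are edit-distance metrics; a minimal sketch of how CER is computed (Levenshtein distance over characters, normalized by reference length):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance (insert/delete/substitute).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    # Character error rate: edits per reference character.
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

WER is computed the same way, but over whitespace-separated tokens instead of characters.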