pklumpp
/

Wav2Vec2_CommonPhone

	---
	license: cc0-1.0
	language:
	- en
	- de
	- fr
	- es
	- ru
	- it
	pipeline_tag: automatic-speech-recognition
	tags:
	- Phone Recognition
	- International Phonetic Alphabet
	- CTC
	- multilingual
	---
	# Model Card for Wav2Vec2 Large with Common Phone

	This is a multilingual phone recognition model optimized with the [Common Phone](https://zenodo.org/records/5846137) dataset.
	It was created as part of the PhD thesis of [Philipp Klumpp](https://scholar.google.com/citations?user=IWvgno4AAAAJ) to analyze pathological speech signals.

	The source code for using this model is available on [GitHub](https://github.com/PKlumpp/phd_model).

	## Model Details

	A Wav2Vec2 model with a linear projection onto 102 output classes: the CTC blank token plus 101 phone symbols from the International Phonetic Alphabet (IPA).
	The model expects audio sampled at 16 kHz and predicts the most probable sequence of uttered IPA phones.
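
To illustrate how per-frame CTC outputs turn into a phone sequence, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is an assumption that greedy decoding is what is wanted here; the code in the linked repository may decode differently. The four-symbol `TOY_VOCAB`, the blank index `0`, and the helper `one_hot` are hypothetical and chosen only for the example; the real model has 102 output classes whose order is not given in this card.

```python
from itertools import groupby

# Hypothetical toy vocabulary: index 0 plays the role of the CTC blank.
# The real model has 1 blank + 101 IPA phone symbols.
BLANK_ID = 0
TOY_VOCAB = ["<blank>", "a", "t", "o"]


def greedy_ctc_decode(frame_scores, vocab=TOY_VOCAB, blank_id=BLANK_ID):
    """Decode per-frame class scores into a list of phone symbols."""
    # 1. Pick the highest-scoring class in every frame.
    best_ids = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # 2. Merge each run of identical consecutive predictions into one.
    collapsed = [class_id for class_id, _ in groupby(best_ids)]
    # 3. Drop the blank token; it only separates genuine repetitions.
    return [vocab[class_id] for class_id in collapsed if class_id != blank_id]


def one_hot(class_id, size=len(TOY_VOCAB)):
    """Hypothetical helper: fake a confident score vector for one frame."""
    return [1.0 if j == class_id else 0.0 for j in range(size)]


# Ten frames of made-up predictions. The blank between the two runs of
# class 2 keeps them as two separate "t" symbols after collapsing.
frame_ids = [0, 1, 1, 0, 2, 2, 0, 2, 3, 3]
phones = greedy_ctc_decode([one_hot(i) for i in frame_ids])
print(" ".join(phones))  # a t t o
```

Without the blank token, the two runs of `t` would merge into one, which is why CTC models reserve one extra output class in addition to the phone symbols.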

	### Model Description

	This model was created to analyze pathological speech signals. It was optimized with Common Phone, a multilingual corpus for robust acoustic modelling. The corpus comprises more than 11,000 speakers who were carefully selected from Mozilla's Common Voice dataset.
	Results in terms of phone error rate (PER) in percent:

	| Language | Test PER (%) |
	|:---:|:---:|
	| English | 11.0 |
	| French | 9.9 |
	| German | 9.8 |
	| Italian | 9.1 |
	| Russian | 6.6 |
	| Spanish | 8.8 |
	| Average | 9.2 |
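
For readers unfamiliar with the metric, the sketch below shows how a phone error rate is commonly computed: the edit distance (substitutions, deletions, insertions) between the reference and the predicted phone sequence, divided by the reference length. The function names and the example transcription are made up for illustration, and how the figures in the table were aggregated over utterances is not stated in this card.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two symbol sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_symbol in enumerate(reference, start=1):
        current = [i]
        for j, hyp_symbol in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_symbol != hyp_symbol)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def phone_error_rate(reference, hypothesis):
    """PER in percent: edit distance relative to the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one phone")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Made-up example: ten reference phones, one of them substituted.
reference = ["ð", "ɪ", "s", "ɪ", "z", "ə", "t", "ɛ", "s", "t"]
hypothesis = ["d", "ɪ", "s", "ɪ", "z", "ə", "t", "ɛ", "s", "t"]
print(phone_error_rate(reference, hypothesis))  # 10.0

# The unweighted mean of the six per-language values from the table.
test_per = {
    "English": 11.0,
    "French": 9.9,
    "German": 9.8,
    "Italian": 9.1,
    "Russian": 6.6,
    "Spanish": 8.8,
}
mean_per = sum(test_per.values()) / len(test_per)
print(round(mean_per, 1))  # 9.2
```

The unweighted mean of the per-language values matches the reported average of 9.2.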

	- Developed by: [Philipp Klumpp](https://scholar.google.com/citations?user=IWvgno4AAAAJ)
	- Model type: [Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)
	- Languages: Multilingual (English, French, German, Italian, Russian, Spanish)
	- License: [Creative Commons Zero 1.0 (CC0)](https://creativecommons.org/publicdomain/zero/1.0/deed.en)
	- Finetuned from model: [Wav2Vec2 XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
	- Finetuning dataset: [Common Phone](https://zenodo.org/records/5846137) as published in [Common Phone: A Multilingual Dataset for Robust Acoustic Modelling](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.81.pdf)

	### Model Sources

	- Repository: [GitHub](https://github.com/PKlumpp/phd_model)
	- Paper: The final print of the thesis will be linked here.

	## Contact

	[Philipp Klumpp](mailto:philipp-klumpp@live.de)