pklumpp/Wav2Vec2_CommonPhone

Automatic Speech Recognition

Phone Recognition

International Phonetic Alphabet

pklumpp committed on Oct 10, 2025

Commit 205121b · verified · 1 Parent(s): 58edfcf

Update README.md

Files changed (1)

README.md +7 -5

@@ -13,6 +13,10 @@ tags:
 - International Phonetic Alphabet
 - CTC
 - multilingual
+datasets:
+- pklumpp/CommonPhoneDataset
+base_model:
+- facebook/wav2vec2-large-xlsr-53
 ---
 # Model Card for Wav2Vec2 Large with Common Phone
@@ -42,7 +46,7 @@ The model uses 16 kHz audio to predict the most probable sequence of uttered IPA
 ### Model Description
-This model was created to analyze pathological speech signals. It was optimized with Common Phone, a multilingual corpus for robust acoustic modelling. It comprises more than 11.000 speakers which were carefully selected from Mozilla's Common Voice dataset.
+This model was created to analyze pathological speech signals. It was optimized with [Common Phone](https://huggingface.co/datasets/pklumpp/CommonPhoneDataset), a multilingual corpus for robust acoustic modelling. It comprises more than 11.000 speakers which were carefully selected from Mozilla's Common Voice dataset.
 Results in terms of phone error rate (PER) in percent:
 | Language | Test PER |
@@ -60,7 +64,7 @@ Results in terms of phone error rate (PER) in percent:
 - **Languages:** Multilingual (English, French, German, Italian, Russian, Spanish)
 - **License:** [Creative Commons Zero 1.0 (CC0)](https://creativecommons.org/publicdomain/zero/1.0/deed.en)
 - **Finetuned from model:** [Wav2Vec2 XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
-- **Finetuning dataset:** [Common Phone](https://zenodo.org/records/5846137) as published in [**Common Phone: A Multilingual Dataset for Robust Acoustic Modelling**](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.81.pdf)
+- **Finetuning dataset:** [Common Phone](https://huggingface.co/datasets/pklumpp/CommonPhoneDataset) as published in [**Common Phone: A Multilingual Dataset for Robust Acoustic Modelling**](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.81.pdf)
 ### Model Sources [optional]
@@ -71,6 +75,4 @@ Results in terms of phone error rate (PER) in percent:
 ## Contact
-[Philipp Klumpp](mailto:philipp-klumpp@live.de)
+[Philipp Klumpp](mailto:philipp-klumpp@live.de)