- International Phonetic Alphabet
- CTC
- multilingual
---
# Model Card for Wav2Vec2 Large with Common Phone

This is a multilingual phone recognition model fine-tuned on the [Common Phone](https://zenodo.org/records/5846137) dataset.
It was created as part of the PhD thesis of [Philipp Klumpp](https://scholar.google.com/citations?user=IWvgno4AAAAJ) to analyze pathological speech signals.

## Model Details

A Wav2Vec2 model with a linear output projection onto 102 classes: the CTC blank token plus 101 phone symbols from the International Phonetic Alphabet (IPA).
The model takes 16 kHz audio as input and predicts the most probable sequence of uttered IPA phones.

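The Python sketch below illustrates how a CTC output layer like this one is commonly read out; it is not taken from the author's repository. It counts the output frames produced for a 16 kHz input and applies greedy (best-path) CTC decoding: take the highest-scoring class in every frame, merge consecutive repeats, then drop blanks. It assumes the default wav2vec 2.0 / XLSR-53 convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, i.e. a hop of 320 samples or 20 ms) and that the CTC blank has index 0. The four-phone vocabulary in the usage example is hypothetical; the real index-to-IPA mapping should come from the model's own tokenizer or vocabulary, if one is provided.

```python
from typing import Dict, List, Optional, Sequence

# Assumption: the fine-tuned model keeps the default wav2vec 2.0 / XLSR-53
# convolutional feature encoder (unpadded 1-D convolutions).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)

# Assumption: index 0 is the CTC blank. In this model the output layer has
# 1 + 101 = 102 classes (blank + IPA phones); check the real vocabulary mapping.
BLANK_ID = 0


def num_output_frames(num_samples: int) -> int:
    """Number of CTC output frames for `num_samples` of 16 kHz audio."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded convolution: floor((L - k) / s) + 1
        length = max((length - kernel) // stride + 1, 0)
    return length


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]],
                      id_to_phone: Dict[int, str]) -> List[str]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    phones: List[str] = []
    previous: Optional[int] = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=lambda c: scores[c])
        # A new phone starts when the label changes and is not the blank.
        if best != previous and best != BLANK_ID:
            phones.append(id_to_phone[best])
        previous = best
    return phones


if __name__ == "__main__":
    print(num_output_frames(16_000))  # 1 s of 16 kHz audio -> 49 frames

    # Hypothetical 4-phone inventory (index 0 = blank) and a one-hot best path.
    toy_vocab = {1: "h", 2: "a", 3: "l", 4: "o"}
    best_path = [1, 1, 0, 2, 2, 0, 3, 3, 0, 3, 4]
    toy_scores = [[1.0 if c == i else 0.0 for c in range(5)] for i in best_path]
    print(greedy_ctc_decode(toy_scores, toy_vocab))  # ['h', 'a', 'l', 'l', 'o']
```

The blank between the two `l` frames is what lets CTC emit a doubled phone. Greedy decoding only approximates the most probable phone sequence; a CTC prefix beam search is a common, more expensive alternative.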

### Model Description

This model was created to analyze pathological speech signals. It was fine-tuned on Common Phone, a multilingual corpus for robust acoustic modelling that comprises more than 11,000 speakers carefully selected from Mozilla's Common Voice dataset.
Results in terms of phone error rate (PER), in percent:

| Language | Test PER (%) |
|:---:|:---:|
| English | 11.0 |
| French | 9.9 |
| German | 9.8 |
| Italian | 9.1 |
| Russian | 6.6 |
| Spanish | 8.8 |
| **Average** | **9.2** |

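PER is usually computed like word error rate, but over phone symbols: the Levenshtein distance between the reference and recognized phone sequences (substitutions + deletions + insertions), divided by the number of reference phones. The Python sketch below implements that textbook definition and is not the evaluation code behind the table; pooling edits over all utterances, rather than averaging per-utterance rates, is an assumption. The example phone sequences are hypothetical, while `TABLE_PER` simply restates the table to show that the **Average** row equals the unweighted mean of the six languages.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference phone r
                curr[j - 1] + 1,         # insert hypothesis phone h
                prev[j - 1] + (r != h),  # substitute (cost 0 if they match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Pooled PER in percent: total edits divided by total reference phones."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_phones = sum(len(ref) for ref, _ in pairs)
    if ref_phones == 0:
        raise ValueError("references contain no phones")
    return 100.0 * errors / ref_phones


# Per-language test PER values as reported in the table.
TABLE_PER = {"English": 11.0, "French": 9.9, "German": 9.8,
             "Italian": 9.1, "Russian": 6.6, "Spanish": 8.8}

if __name__ == "__main__":
    # Hypothetical pair: one deleted phone out of four reference phones.
    print(phone_error_rate([(["h", "a", "l", "o"], ["h", "a", "o"])]))  # 25.0
    # The unweighted mean of the six languages matches the "Average" row.
    print(round(sum(TABLE_PER.values()) / len(TABLE_PER), 1))  # 9.2
```

Pooling weights long utterances more heavily than averaging per-utterance rates, so the two aggregations can differ slightly on the same test set.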

- **Developed by:** [Philipp Klumpp](https://scholar.google.com/citations?user=IWvgno4AAAAJ)
- **Model type:** [Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)
- **Languages:** Multilingual (English, French, German, Italian, Russian, Spanish)
- **License:** [Creative Commons Zero 1.0 (CC0)](https://creativecommons.org/publicdomain/zero/1.0/deed.en)
- **Finetuned from model:** [Wav2Vec2 XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
- **Finetuning dataset:** [Common Phone](https://zenodo.org/records/5846137) as published in [**Common Phone: A Multilingual Dataset for Robust Acoustic Modelling**](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.81.pdf)

### Model Sources

- **Repository:** [GitHub](https://github.com/PKlumpp/phd_model)
- **Paper:** The final print of the thesis will be linked here.

## Contact

[Philipp Klumpp](mailto:philipp-klumpp@live.de)