---
pipeline_tag: automatic-speech-recognition
---

# Model Card for speech31/XLS-R-english-phoneme

An English phone recognizer fine-tuned from XLS-R-300m on the TIMIT dataset.

## Model Details

### Model Description

- **Developed by:** Eunjung Yeo
- **Model type:** phone recognizer
- **Language(s) (SLP):** English
- **Finetuned from model:** XLS-R-300m

### Direct Use

- Phone recognition

### Downstream Use

- Analysis of phonetic transcriptions
- L2 Pronunciation Assessment (Mispronunciation Detection and Diagnosis)
- Mispronunciation Assessment for pathological speech
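
For the pronunciation-assessment uses listed above, one common approach is to compare the recognized phone sequence with a canonical (expected) one. The sketch below is illustrative only and is not the author's evaluation pipeline: `diagnose` is a hypothetical helper, the phone sequences are made up, and the alignment uses Python's standard `difflib` module rather than a dedicated phonetic aligner.

```python
from difflib import SequenceMatcher


def diagnose(canonical, recognized):
    """Return (operation, expected_phones, produced_phones) for each mismatch."""
    # autojunk=False keeps every phone in play, even very frequent ones.
    matcher = SequenceMatcher(a=canonical, b=recognized, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            errors.append((tag, canonical[i1:i2], recognized[j1:j2]))
    return errors


# Hypothetical example: the word "this" produced with /d/ instead of /ð/.
canonical = ["ð", "ɪ", "s"]
recognized = ["d", "ɪ", "s"]
print(diagnose(canonical, recognized))  # [('replace', ['ð'], ['d'])]
```

In practice `recognized` would be the model's output split into individual phones, and `canonical` would come from a pronunciation dictionary or a reference transcription.
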

## How to Get Started with the Model

Load the processor and the model with the `transformers` library:

    from transformers import AutoProcessor, AutoModelForCTC

    processor = AutoProcessor.from_pretrained("speech31/XLS-R-english-phoneme")
    model = AutoModelForCTC.from_pretrained("speech31/XLS-R-english-phoneme")
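
Once the processor and model are loaded, a transcription pass might look like the sketch below. It is not taken from the model's documentation: `transcribe` is a hypothetical helper, the 16 kHz mono input is an assumption based on the sampling rate XLS-R was pretrained on, and greedy (argmax) CTC decoding is simply the most basic decoding option. The imports sit inside the function so that defining it does not require loading `torch`.

```python
MODEL_ID = "speech31/XLS-R-english-phoneme"


def transcribe(waveform, sampling_rate=16_000):
    """Return the phone string predicted for a 1-D float waveform."""
    # Imported lazily: both packages are heavy, and the first call to
    # from_pretrained downloads the checkpoint.
    import torch
    from transformers import AutoModelForCTC, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Assumption: mono audio that has already been resampled to 16 kHz.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # Greedy CTC decoding: pick the best token per frame, then let
    # batch_decode collapse repeats and drop blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

For a file on disk, `waveform` could be the first value returned by a loader such as `librosa.load(path, sr=16000)`. Reloading the model on every call keeps the sketch short; a real script would load it once and reuse it.
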

## Training Details

### Training Data

This model is fine-tuned on the TIMIT dataset, which is available from the Linguistic Data Consortium at https://catalog.ldc.upenn.edu/LDC93s1.

#### Preprocessing

The text in the dataset was transliterated into IPA using Epitran.
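
As a rough illustration of this step, the sketch below converts orthographic transcripts to IPA through Epitran's `transliterate` method. The language code `eng-Latn`, the helper names, and the sample sentence are assumptions rather than details of the original preprocessing. Epitran's English support also relies on the external Flite `lex_lookup` tool being installed.

```python
def build_transliterator(code="eng-Latn"):
    """Create an Epitran transliterator (assumed code: English, Latin script)."""
    import epitran  # third-party package: pip install epitran

    return epitran.Epitran(code)


def transcripts_to_ipa(transcripts, transliterator):
    """Transliterate each orthographic transcript into an IPA string."""
    return [transliterator.transliterate(text) for text in transcripts]


# Usage (requires epitran and Flite's lex_lookup):
# epi = build_transliterator()
# transcripts_to_ipa(["She had your dark suit in greasy wash water all year."], epi)
```

Passing the transliterator in as an argument means it is built once and reused for every transcript.
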