GetmanY1/wav2vec2-large-multitask-finnish-l2

GetmanY1 committed on Aug 7, 2023

Commit 00327ea · 1 Parent(s): 3677fe3

Update README.md

Files changed (1)

README.md +52 -1

@@ -1,4 +1,55 @@
 ---
 language:
 - fi
----

+---
+The best multi-task wav2vec2 model for Finnish from __Getman, Y., Al-Ghezi, R., Grósz, T., Kurimo, M. (2023) Multi-task wav2vec2 Serving as a Pronunciation Training System for Children__. It performs automatic speech recognition (ASR) and pronunciation rating simultaneously.
+## Usage
+You must first install [aalto-speech/multitask-wav2vec2](https://github.com/aalto-speech/multitask-wav2vec2) to use this model. The model can then be used directly as follows:
+```python
+import torch
+import librosa
+import datasets
+from transformers import Wav2Vec2ForMultiTask, Wav2Vec2Processor
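+# MODEL_PATH: this repository's Hub id, or a local checkpoint directory
+MODEL_PATH = "GetmanY1/wav2vec2-large-multitask-finnish-l2"
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+def map_to_array(batch):
+    # Load the audio as 16 kHz mono, matching the sampling_rate given to the processor
+    speech, _ = librosa.load(batch["file"], sr=16000, mono=True)
+    batch["speech"] = speech
+    return batch
+def map_to_pred_multitask(batch):
+    input_values = processor(batch["speech"], sampling_rate=16000, return_tensors="pt", padding="longest").input_values
+    with torch.no_grad():
+        logits = model(input_values.to(device)).logits
+    # logits[1]: CTC head, decoded greedily into the transcription
+    predicted_ids_ctc = torch.argmax(logits[1], dim=-1)
+    batch["transcription"] = processor.batch_decode(predicted_ids_ctc)
+    # logits[0]: rating head, argmax gives the predicted rating class
+    batch["predictions"] = torch.argmax(logits[0], dim=-1).cpu()
+    return batch
+processor = Wav2Vec2Processor.from_pretrained(MODEL_PATH)
+# Move the model to the same device as its inputs
+model = Wav2Vec2ForMultiTask.from_pretrained(MODEL_PATH).to(device)
+# test_dataset is not defined here: it must be a datasets.Dataset with a "file" column of audio paths
+test_dataset = test_dataset.map(map_to_array)
+result = test_dataset.map(map_to_pred_multitask)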
+```
+## Citation
+If you use our models or training scripts, please cite our article as:
+```bibtex
+@inproceedings{getman23_slate,
+    author={Yaroslav Getman and Ragheb Al-Ghezi and Tamás Grósz and Mikko Kurimo},
+    title={{Multi-task wav2vec2 Serving as a Pronunciation Training System for Children}},
+    year=2023,
+    booktitle={Proc. 9th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2023)},
+    pages={TODO},
+    doi={TODO}
+}
+```
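
The usage snippet in this diff takes an argmax over each of the two logits heads and passes the CTC ids to `processor.batch_decode`. As a rough, dependency-free sketch of what those two steps amount to, the toy example below does a greedy CTC decode (per-frame argmax, merge repeats, drop blanks) and a class argmax for the rating. The vocabulary, the logits, the number of rating classes, and the helper names `argmax` and `ctc_greedy_decode` are all made up for illustration; they are not taken from the model or from the multitask-wav2vec2 code.

```python
# Toy illustration only: the vocabulary, ids and logits below are hypothetical,
# not the model's real tokenizer or outputs.

def argmax(values):
    """Return the index of the largest value in a non-empty sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def ctc_greedy_decode(frame_logits, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    ids = [argmax(frame) for frame in frame_logits]
    tokens = []
    previous = None
    for token_id in ids:
        # Keep a token only when it differs from the previous frame and is not blank.
        if token_id != previous and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        previous = token_id
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical 4-symbol vocabulary; id 0 plays the role of the CTC blank.
id_to_token = {0: "<pad>", 1: "e", 2: "i", 3: "|"}

# One utterance, six frames, one score per vocabulary entry.
frame_logits = [
    [0.1, 2.0, 0.0, 0.0],  # e
    [0.1, 1.5, 0.0, 0.0],  # e again (repeat, merged with the previous frame)
    [3.0, 0.0, 0.0, 0.0],  # blank
    [0.0, 0.0, 2.5, 0.0],  # i
    [0.0, 0.0, 0.0, 1.0],  # word delimiter
    [0.0, 0.2, 0.1, 0.0],  # e
]
transcription = ctc_greedy_decode(frame_logits, id_to_token)

# Rating head: one score per (hypothetical) rating class for the whole utterance.
rating_logits = [0.2, 1.7, 0.4, -0.3]
predicted_rating = argmax(rating_logits)

print(transcription, predicted_rating)
```

A blank frame between two identical symbols keeps them separate, which is why repeats are merged before blanks are dropped.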