How to use speech-seq2seq/wav2vec2-2-gpt2-medium with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="speech-seq2seq/wav2vec2-2-gpt2-medium")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSpeechSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("speech-seq2seq/wav2vec2-2-gpt2-medium")
model = AutoModelForSpeechSeq2Seq.from_pretrained("speech-seq2seq/wav2vec2-2-gpt2-medium")
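The snippets above only load the model. As a minimal sketch of how a transcription call might look, the helper below wraps the pipeline and returns the transcript text. The names `load_asr_pipeline`, `transcribe`, and the file path `sample.wav` are illustrative assumptions, not part of the original card. Given the word error rates reported for this checkpoint, the transcripts may be unreliable.

```python
from typing import Any, Callable, Dict

MODEL_ID = "speech-seq2seq/wav2vec2-2-gpt2-medium"


def load_asr_pipeline(model_id: str = MODEL_ID):
    """Build the speech-recognition pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so that `transcribe` can be used and tested without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(asr: Callable[[Any], Dict[str, Any]], audio: Any) -> str:
    """Run the pipeline on one audio input and return the stripped transcript.

    The automatic-speech-recognition pipeline returns a dict with a "text" key.
    """
    result = asr(audio)
    return result["text"].strip()


# Example usage (hypothetical audio file, requires network access for the first download):
#   asr = load_asr_pipeline()
#   print(transcribe(asr, "sample.wav"))
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded, so the same function also works with a locally cached or fine-tuned checkpoint.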
This model was trained from scratch on the librispeech_asr dataset. At the last logged evaluation (step 5000, epoch 2.8), it reached a validation loss of 3.5264 and a WER of 1.7073 on the evaluation set.
More information needed
The hyperparameters used during training are not listed. The training results, evaluated every 500 steps, were:
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 4.4032 | 0.28 | 500 | 4.6724 | 1.9406 |
| 4.6417 | 0.56 | 1000 | 4.7143 | 1.8874 |
| 4.5725 | 0.84 | 1500 | 4.6413 | 1.9451 |
| 4.0178 | 1.12 | 2000 | 4.5470 | 1.8861 |
| 3.9084 | 1.4 | 2500 | 4.4360 | 1.8881 |
| 3.9297 | 1.68 | 3000 | 4.2814 | 1.8652 |
| 3.707 | 1.96 | 3500 | 4.1035 | 1.8320 |
| 3.1373 | 2.24 | 4000 | 3.9557 | 1.7762 |
| 3.3152 | 2.52 | 4500 | 3.7737 | 1.7454 |
| 2.9501 | 2.8 | 5000 | 3.5264 | 1.7073 |
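Every word error rate in the table is above 1.0, which can look like a mistake. The sketch below assumes the usual definition (word-level edit distance divided by the number of reference words). The card does not say which implementation produced these numbers, so `word_error_rate` is an illustrative helper rather than the evaluation code that was used. Under that definition, the rate exceeds 1.0 whenever the hypothesis contains many inserted words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution in a three-word reference gives a rate of 1/3.
print(word_error_rate("the cat sat", "the dog sat"))
# Three inserted words against a one-word reference give a rate of 3.0.
print(word_error_rate("hello", "hello there my friend"))
```

A rate near 1.7, as in the final row, would therefore be consistent with a model that emits considerably more words than the reference contains, although the table alone does not confirm this.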