Update README.md
README.md
@@ -49,16 +49,12 @@ model-index:
 
 # SWRA (SWARA)
 
-`SWRA (SWARA)` is a Speech to Text Transformer (S2T) model trained by @binarybardakshat for automatic speech recognition (ASR).
+`SWRA (SWARA)` is a Speech to Text Transformer (S2T) model trained by @binarybardakshat for automatic speech recognition (ASR).
 
 ## Model Description
 
 SWRA (SWARA) is an end-to-end sequence-to-sequence transformer model. It is trained with standard autoregressive cross-entropy loss and generates the transcripts autoregressively.
 
-## Intended Uses & Limitations
-
-This model can be used for end-to-end speech recognition (ASR). See the [model hub](https://huggingface.co/models?filter=speech_to_text) to look for other S2T checkpoints.
-
 ### How to Use
 
 As this is a standard sequence-to-sequence transformer model, you can use the `generate` method to generate the transcripts by passing the speech features to the model.
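
The How to Use text says transcripts are produced by passing speech features to `generate`. As a rough illustration of what autoregressive generation means here (this is not the `transformers` implementation and not this model's decoder), the sketch below performs greedy decoding: at each step a decoder function scores every vocabulary entry given the encoder output and the tokens emitted so far, the highest-scoring token is appended, and decoding stops at an end-of-sequence token or a length limit. `step_logits`, `toy_step`, `BOS_ID`, and `EOS_ID` are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence

# Hypothetical special-token ids; real values come from the model's tokenizer.
BOS_ID = 0
EOS_ID = 1

# A decoder step maps (encoder_output, tokens_so_far) to one score per vocabulary entry.
StepFn = Callable[[Sequence[float], List[int]], List[float]]


def greedy_decode(encoder_output: Sequence[float], step_logits: StepFn, max_len: int = 50) -> List[int]:
    """Generate token ids one at a time, always taking the highest-scoring next token."""
    tokens = [BOS_ID]
    for _ in range(max_len):
        scores = step_logits(encoder_output, tokens)
        # Argmax over the vocabulary.
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        tokens.append(next_id)
        if next_id == EOS_ID:
            break
    # Drop the BOS prefix and the EOS token (if one was produced).
    return [t for t in tokens[1:] if t != EOS_ID]


def toy_step(encoder_output: Sequence[float], tokens: List[int]) -> List[float]:
    """Hypothetical scorer: emits token ids 2, 3, 4 in order, then EOS (vocabulary size 5)."""
    script = [2, 3, 4, EOS_ID]
    target = script[min(len(tokens) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_decode([0.1, 0.2], toy_step))  # [2, 3, 4]
```

`generate` in `transformers` can also run beam search (via `num_beams`); greedy search is shown only because it is the simplest case.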
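
The model description states that training uses standard autoregressive cross-entropy loss. Assuming the usual teacher-forcing setup (the card does not give details), the decoder is fed the reference prefix at every position, and the loss is the average negative log-probability it assigns to each next reference token. A minimal pure-Python sketch, with one score vector (logits) per target position:

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def autoregressive_cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of each reference token.

    Under teacher forcing, logits[t] are the decoder's scores at position t after it has
    been fed the reference tokens before t, and targets[t] is the reference token there.
    """
    if len(logits) != len(targets):
        raise ValueError("need one score vector per target token")
    nll = [-log_softmax(row)[y] for row, y in zip(logits, targets)]
    return sum(nll) / len(nll)


# Uniform scores over 4 tokens give a loss of log(4), whatever the targets are.
print(round(autoregressive_cross_entropy([[0.0] * 4, [0.0] * 4], [1, 3]), 4))  # 1.3863
```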
@@ -134,33 +130,3 @@ print("WER:", wer.compute(predictions=result["transcription"], references=result
 
 The S2T-SMALL-LIBRISPEECH-ASR is trained on [LibriSpeech ASR Corpus](https://www.openslr.org/12), a dataset consisting of
 approximately 1000 hours of 16kHz read English speech.
-
-
-## Training procedure
-
-### Preprocessing
-
-The speech data is pre-processed by extracting Kaldi-compliant 80-channel log mel-filter bank features automatically from
-WAV/FLAC audio files via PyKaldi or torchaudio. Further utterance-level CMVN (cepstral mean and variance normalization)
-is applied to each example.
-
-The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 10,000.
-
-
-### Training
-
-The model is trained with standard autoregressive cross-entropy loss and using [SpecAugment](https://arxiv.org/abs/1904.08779).
-The encoder receives speech features, and the decoder generates the transcripts autoregressively.
-
-
-### BibTeX entry and citation info
-
-```bibtex
-@inproceedings{wang2020fairseqs2t,
-title = {fairseq S2T: Fast Speech-to-Text Modeling with fairseq},
-author = {Changhan Wang and Yun Tang and Xutai Ma and Anne Wu and Dmytro Okhonko and Juan Pino},
-booktitle = {Proceedings of the 2020 Conference of the Asian Chapter of the Association for Computational Linguistics (AACL): System Demonstrations},
-year = {2020},
-}
-
-```
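
The removed Preprocessing section mentions utterance-level CMVN applied to 80-channel log mel filter-bank features. The sketch below covers only that normalization step, on a feature matrix stored as a list of frames (one list of filter-bank values per frame): each channel is shifted to zero mean and scaled to unit variance using statistics from that single utterance. Feature extraction itself (for example with torchaudio) is out of scope, and the `eps` guard against zero variance is an assumption rather than something stated in the card.

```python
import math
from typing import List, Sequence


def utterance_cmvn(frames: Sequence[Sequence[float]], eps: float = 1e-8) -> List[List[float]]:
    """Normalize each feature channel of one utterance to zero mean and unit variance.

    `frames` is a (num_frames x num_channels) matrix, e.g. 80 log mel filter-bank
    values per frame. Statistics are computed over the time axis of this utterance only.
    """
    if not frames:
        raise ValueError("utterance has no frames")
    num_frames = len(frames)
    num_channels = len(frames[0])
    means = [sum(f[c] for f in frames) / num_frames for c in range(num_channels)]
    stds = [
        math.sqrt(sum((f[c] - means[c]) ** 2 for f in frames) / num_frames)
        for c in range(num_channels)
    ]
    # eps keeps constant channels (std == 0) from causing a division by zero.
    return [[(f[c] - means[c]) / (stds[c] + eps) for c in range(num_channels)] for f in frames]


feats = [[1.0, 10.0], [3.0, 10.0]]  # 2 frames, 2 channels
print(utterance_cmvn(feats))  # approximately [[-1.0, 0.0], [1.0, 0.0]]
```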
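
The removed Training section says the model was trained with SpecAugment. A simplified sketch of its two masking operations, frequency masking and time masking, is shown below on the same list-of-frames layout; time warping, also described in the SpecAugment paper, is omitted. The mask counts and maximum widths (`num_freq_masks`, `max_freq_width`, `num_time_masks`, `max_time_width`) are illustrative defaults, not the settings used for this model. Masked entries are set to 0.0, which equals the channel mean once features have been normalized with utterance-level CMVN.

```python
import random
from typing import List, Optional, Sequence


def spec_augment(
    frames: Sequence[Sequence[float]],
    num_freq_masks: int = 2,
    max_freq_width: int = 27,
    num_time_masks: int = 2,
    max_time_width: int = 100,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return a copy of a (num_frames x num_channels) matrix with frequency and time masks.

    Each mask zeroes a randomly placed band whose width is drawn uniformly from
    [0, max_width], clipped to the size of the corresponding axis.
    """
    if rng is None:
        rng = random.Random()
    out = [list(f) for f in frames]  # copy so the input is left untouched
    num_frames = len(out)
    num_channels = len(out[0]) if out else 0

    # Frequency masking: zero a band of channels across all frames.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_channels))
        start = rng.randint(0, num_channels - width)
        for f in out:
            for c in range(start, start + width):
                f[c] = 0.0

    # Time masking: zero a run of consecutive frames across all channels.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_channels

    return out


feats = [[1.0] * 80 for _ in range(300)]  # 300 frames of 80 channels
masked = spec_augment(feats, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the masks reproducible; during training a fresh draw per example is the point of the augmentation.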