Binarybardakshat/SWRA

Automatic Speech Recognition

hf-asr-leaderboard


Binarybardakshat committed on Aug 14, 2024

Commit 80f132f · verified · 1 Parent(s): 0289adf

Update README.md

Files changed (1)

README.md +1 -35

README.md CHANGED

@@ -49,16 +49,12 @@ model-index:
 # SWRA (SWARA)
-`SWRA (SWARA)` is a Speech to Text Transformer (S2T) model trained by @binarybardakshat for automatic speech recognition (ASR). The S2T model was proposed in [this paper](https://arxiv.org/abs/2010.05171) and released in [this repository](https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text).
 ## Model Description
 SWRA (SWARA) is an end-to-end sequence-to-sequence transformer model. It is trained with standard autoregressive cross-entropy loss and generates the transcripts autoregressively.
-## Intended Uses & Limitations
-This model can be used for end-to-end speech recognition (ASR). See the [model hub](https://huggingface.co/models?filter=speech_to_text) to look for other S2T checkpoints.
 ### How to Use
 As this is a standard sequence-to-sequence transformer model, you can use the `generate` method to generate the transcripts by passing the speech features to the model.
@@ -134,33 +130,3 @@ print("WER:", wer.compute(predictions=result["transcription"], references=result
 The S2T-SMALL-LIBRISPEECH-ASR is trained on [LibriSpeech ASR Corpus](https://www.openslr.org/12), a dataset consisting of
 approximately 1000 hours of 16kHz read English speech.
-## Training procedure
-### Preprocessing
-The speech data is pre-processed by extracting Kaldi-compliant 80-channel log mel-filter bank features automatically from
-WAV/FLAC audio files via PyKaldi or torchaudio. Further utterance-level CMVN (cepstral mean and variance normalization)
-is applied to each example.
-The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 10,000.
-### Training
-The model is trained with standard autoregressive cross-entropy loss and using [SpecAugment](https://arxiv.org/abs/1904.08779).
-The encoder receives speech features, and the decoder generates the transcripts autoregressively.
-### BibTeX entry and citation info
-```bibtex
-@inproceedings{wang2020fairseqs2t,
-  title = {fairseq S2T: Fast Speech-to-Text Modeling with fairseq},
-  author = {Changhan Wang and Yun Tang and Xutai Ma and Anne Wu and Dmytro Okhonko and Juan Pino},
-  booktitle = {Proceedings of the 2020 Conference of the Asian Chapter of the Association for Computational Linguistics (AACL): System Demonstrations},
-  year = {2020},
-}
-```

 # SWRA (SWARA)
+`SWRA (SWARA)` is a Speech to Text Transformer (S2T) model trained by @binarybardakshat for automatic speech recognition (ASR).