nielsr (HF Staff) committed
Commit 8cb607c · verified · 1 Parent(s): 9991804

Improve model card with abstract and sample usage

This PR enhances the model card by:
- Adding the full abstract of the paper, providing a more comprehensive overview of the LiteASR model and its methodology.
- Including a "Sample Usage" section with a Python code snippet, making it easier for users to quickly get started with the `efficient-speech/lite-whisper-small` model using the `transformers` library. The sample code was directly adapted from the provided GitHub repository's "Quick Start" guide.

These improvements aim to make the model card more informative and accessible for the community.

Files changed (1)
  1. README.md +45 -3
README.md CHANGED
@@ -14,6 +14,49 @@ tags:
 
  Lite-Whisper is a compressed version of OpenAI Whisper with LiteASR. See our [GitHub repository](https://github.com/efeslab/LiteASR) and [paper](https://arxiv.org/abs/2502.20583) for details.
 
+ ## Abstract
+
+ Modern automatic speech recognition (ASR) models, such as OpenAI's Whisper, rely on deep encoder-decoder architectures, and their encoders are a critical bottleneck for efficient deployment due to high computational intensity. We introduce LiteASR, a low-rank compression scheme for ASR encoders that significantly reduces inference costs while maintaining transcription accuracy. Our approach leverages the strong low-rank properties observed in intermediate activations: by applying principal component analysis (PCA) with a small calibration dataset, we approximate linear transformations with a chain of low-rank matrix multiplications, and further optimize self-attention to work in reduced dimensionality. Evaluation results show that our method can compress Whisper large-v3's encoder size by over 50%, matching Whisper medium's size with better transcription accuracy, thereby establishing a new Pareto frontier of accuracy and efficiency.
+
+ ## Sample Usage
+
+ The easiest way to run our model is to use our integration with HuggingFace Transformers library.
+ We provide model weights for the compressed version of OpenAI Whisper series [here](https://huggingface.co/efficient-speech).
+
+ ```python
+ import librosa
+ import torch
+ from transformers import AutoProcessor, AutoModel
+
+ device = "cuda:0"
+ dtype = torch.float16
+
+ # load the compressed Whisper model
+ model = AutoModel.from_pretrained(
+     "efficient-speech/lite-whisper-small",
+     trust_remote_code=True,
+ )
+ model.to(dtype).to(device)
+
+ # we use the same processor as the original model
+ processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
+
+ # set the path to your audio file
+ path = "path/to/audio.wav"
+ audio, _ = librosa.load(path, sr=16000)
+
+ input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
+ input_features = input_features.to(dtype).to(device)
+
+ predicted_ids = model.generate(input_features)
+ transcription = processor.batch_decode(
+     predicted_ids,
+     skip_special_tokens=True
+ )[0]
+
+ print(transcription)
+ ```
+
  ## Benchmark Results
 
  Following is the average word error rate (WER) evaluated on the [ESB datasets](https://huggingface.co/datasets/hf-audio/esb-datasets-test-only-sorted):
@@ -40,19 +83,18 @@ Following is the average word error rate (WER) evaluated on the [ESB datasets](h
  | [lite-whisper-medium](https://huggingface.co/efficient-speech/lite-whisper-medium) | 14.50 | 239.99M | 456.64M |
  | [lite-whisper-medium-fast](https://huggingface.co/efficient-speech/lite-whisper-medium-fast) | 14.52 | 215.31M | 456.64M |
 
-
  ## Citation
 
  If you use LiteASR in your research, please cite the following paper:
 
  ```
  @misc{kamahori2025liteasrefficientautomaticspeech,
- title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
+ title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
  author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
  year={2025},
  eprint={2502.20583},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
- url={https://arxiv.org/abs/2502.20583},
+ url={https://arxiv.org/abs/2502.20583},
  }
  ```
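
The abstract added in this diff describes approximating a linear layer with a chain of low-rank matrix multiplications obtained by PCA on calibration activations. The following is a minimal NumPy sketch of that idea on a synthetic layer, not the LiteASR implementation: the function name `low_rank_factorize`, the choice to run PCA on the layer's output activations, the bias handling, and the toy dimensions are all assumptions made for illustration.

```python
import numpy as np


def low_rank_factorize(weight, bias, calib_inputs, rank):
    """Approximate y = x @ weight + bias by y ~= (x @ a) @ b + new_bias.

    Hypothetical helper for illustration; not part of LiteASR or transformers.
    """
    # Output activations of the layer on the calibration set.
    outputs = calib_inputs @ weight + bias
    mean = outputs.mean(axis=0)

    # PCA via SVD of the centered outputs: rows of vt are principal directions.
    _, _, vt = np.linalg.svd(outputs - mean, full_matrices=False)
    v_k = vt[:rank].T  # shape (d_out, rank)

    # Project onto the top-`rank` subspace and fold the projection into the weights.
    a = weight @ v_k  # shape (d_in, rank)
    b = v_k.T         # shape (rank, d_out)
    new_bias = mean + (bias - mean) @ v_k @ v_k.T
    return a, b, new_bias


rng = np.random.default_rng(0)
d_in, d_out, true_rank, n_calib = 64, 48, 8, 256

# Synthetic layer whose weight has rank 8, so its outputs are low-rank as well.
weight = rng.normal(size=(d_in, true_rank)) @ rng.normal(size=(true_rank, d_out))
bias = rng.normal(size=d_out)
calib_inputs = rng.normal(size=(n_calib, d_in))

a, b, new_bias = low_rank_factorize(weight, bias, calib_inputs, rank=true_rank)

# Compare the original layer and the two-step low-rank chain on unseen inputs.
x = rng.normal(size=(4, d_in))
exact = x @ weight + bias
approx = (x @ a) @ b + new_bias
max_abs_error = float(np.abs(exact - approx).max())
print(a.shape, b.shape, max_abs_error)
```

With rank `k`, the factored layer stores `k * (d_in + d_out)` weights instead of `d_in * d_out` (896 versus 3072 in this toy setup). The reconstruction here matches up to floating-point error only because the synthetic weight matrix was built with rank 8; on real activations, a truncated rank trades some accuracy for size.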