nielsr (HF Staff) committed
Commit b152982 · verified · Parent: ff2585f

Improve model card with detailed abstract and sample usage


This PR enhances the model card by:
- Replacing the brief introductory summary with the more detailed abstract from the paper/GitHub repository, providing users with a comprehensive overview of LiteASR's methodology and benefits.
- Adding a "Sample Usage" section, including a direct code snippet from the official GitHub repository. This will allow users to quickly understand how to run the model using the `transformers` library, improving accessibility and usability on the Hugging Face Hub.

All existing metadata and links (including the arXiv paper link) are retained as per the community guidelines.

Files changed (1): README.md (+43 −5)
README.md CHANGED

@@ -10,9 +10,48 @@ tags:
 - hf-asr-leaderboard
 ---
 
-<!-- Provide a quick summary of what the model is/does. -->
+Modern automatic speech recognition (ASR) models, such as OpenAI's Whisper, rely on deep encoder-decoder architectures, and their encoders are a critical bottleneck for efficient deployment due to high computational intensity. We introduce **LiteASR**, a low-rank compression scheme for ASR encoders that significantly reduces inference costs while maintaining transcription accuracy. Our approach leverages the strong low-rank properties observed in intermediate activations: by applying principal component analysis (PCA) with a small calibration dataset, we approximate linear transformations with a chain of low-rank matrix multiplications, and further optimize self-attention to work in reduced dimensionality. Evaluation results show that our method can compress Whisper large-v3's encoder size by over 50%, matching Whisper medium's size with better transcription accuracy, thereby establishing a new Pareto frontier of accuracy and efficiency.
 
-Lite-Whisper is a compressed version of OpenAI Whisper with LiteASR. See our [GitHub repository](https://github.com/efeslab/LiteASR) and [paper](https://arxiv.org/abs/2502.20583) for details.
+For more technical details, see our [GitHub repository](https://github.com/efeslab/LiteASR) and [paper](https://arxiv.org/abs/2502.20583).
+
+## Sample Usage
+
+The easiest way to run our model is to use our integration with the Hugging Face `transformers` library.
+We provide model weights for the compressed versions of the OpenAI Whisper series [here](https://huggingface.co/efficient-speech).
+
+```python
+import librosa
+import torch
+from transformers import AutoProcessor, AutoModel
+
+device = "cuda:0"
+dtype = torch.float16
+
+# load the compressed Whisper model
+model = AutoModel.from_pretrained(
+    "efficient-speech/lite-whisper-large-v3-turbo",
+    trust_remote_code=True,
+)
+model.to(dtype).to(device)
+
+# we use the same processor as the original model
+processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
+
+# set the path to your audio file
+path = "path/to/audio.wav"
+audio, _ = librosa.load(path, sr=16000)
+
+input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
+input_features = input_features.to(dtype).to(device)
+
+predicted_ids = model.generate(input_features)
+transcription = processor.batch_decode(
+    predicted_ids,
+    skip_special_tokens=True
+)[0]
+
+print(transcription)
+```
 
 ## Benchmark Results
 
@@ -40,19 +79,18 @@ Following is the average word error rate (WER) evaluated on the [ESB datasets](h
 | [lite-whisper-medium](https://huggingface.co/efficient-speech/lite-whisper-medium) | 14.50 | 239.99M | 456.64M |
 | [lite-whisper-medium-fast](https://huggingface.co/efficient-speech/lite-whisper-medium-fast) | 14.52 | 215.31M | 456.64M |
 
-
 ## Citation
 
 If you use LiteASR in your research, please cite the following paper:
 
 ```
 @misc{kamahori2025liteasrefficientautomaticspeech,
-      title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
+      title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
       author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
       year={2025},
       eprint={2502.20583},
       archivePrefix={arXiv},
       primaryClass={cs.LG},
-      url={https://arxiv.org/abs/2502.20583},
+      url={https://arxiv.org/abs/2502.20583},
 }
 ```
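As background for the abstract added in this PR, the idea of approximating a linear layer with a chain of low-rank matrix multiplications via PCA on calibration activations can be sketched as follows. This is a minimal, illustrative toy using NumPy and synthetic data — not the LiteASR implementation; the dimensions, calibration set, and rank `k` here are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dense layer: y = W @ x, with W of shape (d_out, d_in)
d_in, d_out, n_calib, k = 64, 64, 256, 8
W = rng.normal(size=(d_out, d_in))

# Synthetic calibration activations with strong low-rank structure:
# inputs concentrated in a k-dimensional subspace plus small noise
basis = rng.normal(size=(d_in, k))
X = rng.normal(size=(n_calib, k)) @ basis.T + 0.01 * rng.normal(size=(n_calib, d_in))

# PCA of the calibration activations: top-k principal directions V_k
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
Vk = Vt[:k].T  # (d_in, k)

# Replace the dense matmul with a chain of two low-rank matmuls:
#   y = W @ x  ≈  (W @ Vk) @ (Vk.T @ x)
A = W @ Vk   # (d_out, k)
B = Vk.T     # (k, d_in)

x = X[0]
y_full = W @ x
y_low = A @ (B @ x)

rel_err = np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full)
params_saved = W.size - (A.size + B.size)
print(f"relative error: {rel_err:.4f}, parameters saved: {params_saved}")
```

Because the calibration activations really do live near a k-dimensional subspace, the factored layer stores far fewer parameters (2·64·8 vs. 64·64 here) while introducing only a small approximation error; the paper applies this idea to the Whisper encoder's linear layers and additionally runs self-attention in the reduced dimensionality.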