nielsr (HF Staff) committed · Commit df54a50 · verified · 1 Parent(s): a162477

Improve model card: Add abstract and Transformers sample usage


This PR improves the model card by:
- Updating the main heading to the official paper title for better discoverability and context.
- Adding the full paper abstract to provide a comprehensive overview of the model's methodology and contributions.
- Including a "Sample Usage" section with a `transformers`-compatible code snippet, directly sourced from the GitHub README. This enables users to easily get started with the model.

Existing metadata, benchmark results, and citation information remain unchanged as they are already accurate and well-formatted.
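
For context on the compression that the added abstract describes: approximating a linear layer with a chain of two low-rank matrix multiplications replaces one `d_in x d_out` weight with factors of shape `d_in x k` and `k x d_out`. A rough sketch of the resulting parameter arithmetic follows. The layer sizes (1280 and 5120) and the rank (256) are illustrative assumptions, not figures taken from the paper or this PR, and the helper names are hypothetical.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Weights in a single dense linear layer (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two-factor chain: (d_in x rank) followed by (rank x d_out)."""
    return rank * (d_in + d_out)


def break_even_rank(d_in: int, d_out: int) -> float:
    """Rank below which the two-factor chain has fewer weights than the dense layer."""
    return d_in * d_out / (d_in + d_out)


# Illustrative sizes only (assumed, not taken from the paper).
d_in, d_out, rank = 1280, 5120, 256

full = dense_params(d_in, d_out)
compact = low_rank_params(d_in, d_out, rank)

print(f"dense weights:    {full}")                                  # 6553600
print(f"low-rank weights: {compact}")                               # 1638400
print(f"fraction kept:    {compact / full:.2f}")                    # 0.25
print(f"break-even rank:  {break_even_rank(d_in, d_out):.0f}")      # 1024
```

The same counts apply to multiply-accumulate operations per input vector, so any rank below the break-even value reduces both storage and compute for that layer.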

Files changed (1)
  1. README.md +46 -4
README.md CHANGED
@@ -10,9 +10,51 @@ tags:
  - hf-asr-leaderboard
  ---

- <!-- Provide a quick summary of what the model is/does. -->

- Lite-Whisper is a compressed version of OpenAI Whisper with LiteASR. See our [GitHub repository](https://github.com/efeslab/LiteASR) and [paper](https://arxiv.org/abs/2502.20583) for details.

  ## Benchmark Results

@@ -47,12 +89,12 @@ If you use LiteASR in your research, please cite the following paper:

  ```
  @misc{kamahori2025liteasrefficientautomaticspeech,
- title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
  author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
  year={2025},
  eprint={2502.20583},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
- url={https://arxiv.org/abs/2502.20583},
  }
  ```
 
  - hf-asr-leaderboard
  ---

+ # LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

+ LiteASR is a compression scheme for automatic speech recognition (ASR) models that leverages the low-rank properties of activation values. Our method can compress OpenAI's Whisper encoder by up to ~50%. For more details, see our [paper](https://arxiv.org/abs/2502.20583) and [GitHub repository](https://github.com/efeslab/LiteASR).
+
+ ## Abstract
+ Modern automatic speech recognition (ASR) models, such as OpenAI's Whisper, rely on deep encoder-decoder architectures, and their encoders are a critical bottleneck for efficient deployment due to high computational intensity. We introduce LiteASR, a low-rank compression scheme for ASR encoders that significantly reduces inference costs while maintaining transcription accuracy. Our approach leverages the strong low-rank properties observed in intermediate activations: by applying principal component analysis (PCA) with a small calibration dataset, we approximate linear transformations with a chain of low-rank matrix multiplications, and further optimize self-attention to work in reduced dimensionality. Evaluation results show that our method can compress Whisper large-v3’s encoder size by over 50%, matching Whisper medium’s size with better transcription accuracy, thereby establishing a new Pareto frontier of accuracy and efficiency.
+
+ ## Sample Usage
+
+ The easiest way to run our model is to use our integration with the Hugging Face Transformers library.
+ We provide model weights for the compressed versions of the OpenAI Whisper series [here](https://huggingface.co/efficient-speech).
+
+ ```python
+ import librosa
+ import torch
+ from transformers import AutoProcessor, AutoModel
+
+ device = "cuda:0"
+ dtype = torch.float16
+
+ # load the compressed Whisper model
+ model = AutoModel.from_pretrained(
+     "efficient-speech/lite-whisper-large-v3-turbo",
+     trust_remote_code=True,
+ )
+ model.to(dtype).to(device)
+
+ # we use the same processor as the original model
+ processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
+
+ # set the path to your audio file
+ path = "path/to/audio.wav"
+ audio, _ = librosa.load(path, sr=16000)
+
+ input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
+ input_features = input_features.to(dtype).to(device)
+
+ predicted_ids = model.generate(input_features)
+ transcription = processor.batch_decode(
+     predicted_ids,
+     skip_special_tokens=True
+ )[0]
+
+ print(transcription)
+ ```

  ## Benchmark Results



  ```
  @misc{kamahori2025liteasrefficientautomaticspeech,
+ title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
  author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
  year={2025},
  eprint={2502.20583},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
+ url={https://arxiv.org/abs/2502.20583},
  }
  ```
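
The core idea in the abstract, using PCA on calibration activations to replace one linear transformation with a chain of low-rank matrix multiplications, can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the LiteASR implementation: the synthetic calibration data, the sizes, the function name `low_rank_factors`, and the choice to fold the activation mean into a bias term are all illustrative.

```python
import numpy as np


def low_rank_factors(X, W, b, rank):
    """Approximate y = x @ W + b by y ~ (x @ A) @ B + c using rank-`rank` factors.

    Hypothetical helper: PCA is run on the calibration outputs Y = X @ W + b.
    """
    Y = X @ W + b                      # calibration activations, shape (n, d_out)
    mean = Y.mean(axis=0)              # PCA centers the data first
    # Rows of Vt are the principal directions of the centered activations.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    V = Vt[:rank].T                    # (d_out, rank), orthonormal columns
    A = W @ V                          # (d_in, rank): project into the subspace
    B = V.T                            # (rank, d_out): map back to output space
    c = mean + (b - mean) @ V @ V.T    # bias term after projection
    return A, B, c


rng = np.random.default_rng(0)
n, d_in, d_out, true_rank = 200, 32, 64, 4

# Synthetic calibration inputs confined to a 4-dimensional subspace,
# so the layer's outputs are low-rank by construction (an assumption of this toy).
X = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, d_in))
W = rng.normal(size=(d_in, d_out))
b = rng.normal(size=d_out)

A, B, c = low_rank_factors(X, W, b, rank=true_rank)

exact = X @ W + b
approx = (X @ A) @ B + c
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)

print("dense weights:   ", W.size)
print("low-rank weights:", A.size + B.size)
print("relative error:  ", rel_err)
```

Because the synthetic activations are exactly rank 4, the error here sits at floating-point level; with real activations the rank would be a tunable trade-off between accuracy and size. The abstract also mentions optimizing self-attention to work in reduced dimensionality, which this sketch does not cover.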