Improve model card with detailed description, sample usage, citation, and acknowledgements

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +63 -3
README.md CHANGED
@@ -12,9 +12,47 @@ tags:
 
  # Model Card for Lite-Whisper large-v3-fast
 
- <!-- Provide a quick summary of what the model is/does. -->
 
- Lite-Whisper is a compressed version of OpenAI Whisper with LiteASR. See our [GitHub repository](https://github.com/efeslab/LiteASR) and [paper](https://arxiv.org/abs/2502.20583) for details. The paper is also available on Hugging Face: [Link to Hugging Face Paper Page](https://hf.co/papers/2502.20583)
 
  ## Benchmark Results
 
@@ -32,4 +70,26 @@ Following is the average word error rate (WER) evaluated on the [ESB datasets](h
  | [lite-whisper-large-v3-turbo](https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo) | 12.6 | 374M | 172M |
  | [lite-whisper-large-v3-turbo-fast](https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo-fast) | 20.1 | 313M | 172M |
  | &nbsp; | &nbsp; | &nbsp; | &nbsp; |
- | [whisper-medium](https://huggingface.co/openai/whisper-medium) | 14.8 | 306M | 457M |
 
  # Model Card for Lite-Whisper large-v3-fast
 
+ LiteASR is a low-rank compression scheme for ASR encoders that significantly reduces inference costs while maintaining transcription accuracy. The approach exploits the strong low-rank properties observed in the intermediate activations of deep encoder-decoder architectures, particularly OpenAI's Whisper. By applying principal component analysis (PCA) with a small calibration dataset, LiteASR approximates linear transformations with a chain of low-rank matrix multiplications and further optimizes self-attention to work in reduced dimensionality. Evaluation results show that our method can compress Whisper large-v3's encoder size by over 50%, matching Whisper medium's size with better transcription accuracy, thereby establishing a new Pareto frontier of accuracy and efficiency.
 
+ Lite-Whisper is a compressed version of OpenAI Whisper with LiteASR. For technical details and the full methodology, refer to our [paper](https://arxiv.org/abs/2502.20583) (also available as a [Hugging Face paper page](https://hf.co/papers/2502.20583)) and the [LiteASR GitHub repository](https://github.com/efeslab/LiteASR).
+
+ ## Quick Start
+ The easiest way to run our model is to use our integration with the Hugging Face Transformers library.
+ We provide model weights for the compressed versions of the OpenAI Whisper series [here](https://huggingface.co/efficient-speech).
+
+ ```python
+ import librosa
+ import torch
+ from transformers import AutoProcessor, AutoModel
+
+ device = "cuda:0"
+ dtype = torch.float16
+
+ # load the compressed Whisper model
+ model = AutoModel.from_pretrained(
+     "efficient-speech/lite-whisper-large-v3-turbo",
+     trust_remote_code=True,
+ )
+ model.to(dtype).to(device)
+
+ # we use the same processor as the original model
+ processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
+
+ # set the path to your audio file
+ path = "path/to/audio.wav"
+ audio, _ = librosa.load(path, sr=16000)
+
+ input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
+ input_features = input_features.to(dtype).to(device)
+
+ predicted_ids = model.generate(input_features)
+ transcription = processor.batch_decode(
+     predicted_ids,
+     skip_special_tokens=True
+ )[0]
+
+ print(transcription)
+ ```
 
  ## Benchmark Results
 
 
  | [lite-whisper-large-v3-turbo](https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo) | 12.6 | 374M | 172M |
  | [lite-whisper-large-v3-turbo-fast](https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo-fast) | 20.1 | 313M | 172M |
  | &nbsp; | &nbsp; | &nbsp; | &nbsp; |
+ | [whisper-medium](https://huggingface.co/openai/whisper-medium) | 14.8 | 306M | 457M |
+
+ ## Acknowledgement
+
+ - [OpenAI Whisper](https://github.com/openai/whisper)
+ - [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper)
+ - [ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard)
+
+ ## Citation
+
+ If you use LiteASR in your research, please cite the following paper:
+
+ ```
+ @misc{kamahori2025liteasrefficientautomaticspeech,
+       title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
+       author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
+       year={2025},
+       eprint={2502.20583},
+       archivePrefix={arXiv},
+       primaryClass={cs.LG},
+       url={https://arxiv.org/abs/2502.20583},
+ }
+ ```
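
The new model card description says that PCA on a small calibration set lets a linear transformation be approximated by a chain of low-rank matrix multiplications. The following NumPy sketch illustrates that general idea on synthetic data. It is not the LiteASR implementation: the helper name `pca_low_rank_factorize`, the layer shapes, the rank, and the synthetic low-rank inputs are all assumptions chosen for illustration.

```python
import numpy as np


def pca_low_rank_factorize(W, b, X_calib, k):
    """Approximate y = x @ W + b by y ~ (x @ A) @ B + c.

    Hypothetical helper for illustration only (not part of LiteASR).
    """
    Y = X_calib @ W + b                    # calibration outputs, shape (n, d_out)
    mu = Y.mean(axis=0)                    # per-feature mean used for PCA centering
    # Rows of Vt are the principal directions of the centered outputs.
    _, _, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    V_k = Vt[:k].T                         # top-k directions, shape (d_out, k)
    A = W @ V_k                            # first low-rank factor, shape (d_in, k)
    B = V_k.T                              # second low-rank factor, shape (k, d_out)
    c = mu + (b - mu) @ V_k @ V_k.T        # bias folded through the projection
    return A, B, c


rng = np.random.default_rng(0)
d_in, d_out, r, n = 64, 64, 8, 256

# Synthetic inputs confined to an r-dimensional subspace (an assumption that
# mimics the low-rank activations described in the model card).
P = rng.standard_normal((r, d_in))
X_calib = rng.standard_normal((n, r)) @ P
X_test = rng.standard_normal((16, r)) @ P

W = rng.standard_normal((d_in, d_out))
b = rng.standard_normal(d_out)

A, B, c = pca_low_rank_factorize(W, b, X_calib, k=r)

# Compare the original layer with the two-step low-rank chain on unseen inputs.
y_full = X_test @ W + b
y_low = (X_test @ A) @ B + c
rel_err = np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full)

print("relative error:", rel_err)
print("weights: full =", W.size, "factorized =", A.size + B.size)
```

In this toy setup the inputs are exactly rank `r`, so choosing `k = r` reproduces the layer output up to floating-point error while storing a quarter of the weights. Real activations are only approximately low-rank, so in practice `k` would trade accuracy against size.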