---
base_model: openai/whisper-medium
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
  - audio
  - automatic-speech-recognition
  - whisper
  - hf-asr-leaderboard
---

Lite-Whisper is a compressed version of OpenAI Whisper produced with LiteASR, a low-rank compression method for ASR encoders. See our GitHub repository and paper for details.

## Quick Start

The easiest way to run our model is through its integration with the Hugging Face Transformers library. We provide model weights for compressed versions of the OpenAI Whisper series here.

```python
import librosa
import torch
from transformers import AutoProcessor, AutoModel

device = "cuda:0"
dtype = torch.float16

# load the compressed Whisper model
model = AutoModel.from_pretrained(
    "efficient-speech/lite-whisper-medium",
    trust_remote_code=True,
)
model.to(dtype).to(device)

# the processor is shared with the original (uncompressed) base model
processor = AutoProcessor.from_pretrained("openai/whisper-medium")

# set the path to your audio file; Whisper expects 16 kHz mono audio
path = "path/to/audio.wav"
audio, _ = librosa.load(path, sr=16000)

input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
input_features = input_features.to(dtype).to(device)

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True,
)[0]

print(transcription)
```
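The benchmark below scores transcriptions by word error rate (WER). If you want to sanity-check output quality yourself, WER can be computed as the word-level Levenshtein distance divided by the reference length. A minimal pure-Python sketch (whitespace tokenization only; real evaluations typically use a library such as jiwer together with text normalization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over 6 words -> 1/6
```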

## Benchmark Results

The following table reports the average word error rate (WER) evaluated on the ESB datasets:

| Model | Average WER (↓) | Encoder Size | Decoder Size |
|---|---|---|---|
| whisper-tiny | 22.01 | 7.63M | 29.55M |
| lite-whisper-tiny-acc | 22.97 | 7.41M | 29.55M |
| lite-whisper-tiny | 23.95 | 7.00M | 29.55M |
| lite-whisper-tiny-fast | 27.09 | 6.48M | 29.55M |
| whisper-base | 17.67 | 19.82M | 52.00M |
| lite-whisper-base-acc | 19.07 | 18.64M | 52.00M |
| lite-whisper-base | 19.71 | 17.44M | 52.00M |
| lite-whisper-base-fast | 23.05 | 16.07M | 52.00M |
| whisper-small | 15.89 | 87.00M | 153.58M |
| lite-whisper-small-acc | 15.37 | 76.99M | 153.58M |
| lite-whisper-small | 14.96 | 70.16M | 153.58M |
| lite-whisper-small-fast | 14.92 | 63.11M | 153.58M |
| whisper-medium | 15.12 | 305.68M | 456.64M |
| lite-whisper-medium-acc | 13.46 | 269.93M | 456.64M |
| lite-whisper-medium | 14.50 | 239.99M | 456.64M |
| lite-whisper-medium-fast | 14.52 | 215.31M | 456.64M |
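Note that only the encoder shrinks; decoder sizes are unchanged. The encoder reduction comes from LiteASR's low-rank approximation of linear layers. The exact procedure is described in the paper; the parameter-count arithmetic behind it can be sketched with a truncated SVD on a toy weight matrix (dimensions here are illustrative, not the actual Whisper layer sizes):

```python
import numpy as np

# Toy weight matrix standing in for one linear layer of the encoder.
d, r = 256, 64                      # full dimension and truncation rank (assumed values)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))

# Truncated SVD: keep only the r largest singular values/vectors.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]                # shape (d, r)
B = Vt[:r, :]                       # shape (r, d)

# One d x d matmul (x @ W) becomes two thin matmuls (x @ A @ B).
full_params = W.size                # d * d = 65536
lowrank_params = A.size + B.size    # 2 * d * r = 32768
print(full_params, lowrank_params)
```

Whenever r < d/2, the factored layer stores fewer parameters and does less work per token, at the cost of an approximation error controlled by the discarded singular values.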

## Citation

If you use LiteASR in your research, please cite the following paper:

```bibtex
@misc{kamahori2025liteasrefficientautomaticspeech,
      title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
      author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
      year={2025},
      eprint={2502.20583},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.20583},
}
```