# Whisper Tiny (ExecuTorch, XNNPACK, 8da4w)
This folder contains an ExecuTorch `.pte` export of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) for CPU inference via the XNNPACK backend, with post-training quantization applied.
## Contents
- `model.pte`: ExecuTorch program (methods: `encoder`, `text_decoder`)
- `whisper_preprocessor.pte`: mel-spectrogram preprocessor (feature size 80)
- `tokenizer.json`, `vocab.json`, `merges.txt`: tokenizer artifacts
- `config.json`, `generation_config.json`, `preprocessor_config.json`, `tokenizer_config.json`, `special_tokens_map.json`: metadata files from the upstream Hugging Face repo
## Quantization
Export flags:
- `--qlinear 8da4w`: decoder linear layers use 8-bit dynamic activations + 4-bit weights
- `--qlinear_encoder 8da4w`: encoder linear layers use 8-bit dynamic activations + 4-bit weights
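The idea behind "8da4w" can be sketched in plain Python: weights are quantized offline to 4-bit integers with one scale per small group, while activations are quantized to 8 bits at runtime with a scale computed from the tensor itself ("dynamic"). This is an illustrative sketch only; the group size (32 here) and the exact symmetric scheme are assumptions, not the actual torchao/ExecuTorch kernels.

```python
def quantize_weights_4bit(weights, group_size=32):
    """Symmetric per-group 4-bit quantization: values mapped into [-8, 7]."""
    qweights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group, chosen so the largest magnitude maps to 7.
        scale = max(abs(w) for w in group) / 7 or 1e-8
        qweights.extend(max(-8, min(7, round(w / scale))) for w in group)
        scales.append(scale)
    return qweights, scales


def quantize_activations_8bit(acts):
    """Symmetric 8-bit dynamic quantization: scale picked per tensor at runtime."""
    scale = max(abs(a) for a in acts) / 127 or 1e-8
    return [max(-128, min(127, round(a / scale))) for a in acts], scale
```

Dequantizing multiplies each integer back by its scale, so the 4-bit weights cost roughly an eighth of the memory of float32 at the price of per-group rounding error.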
Other export settings:

- Task: `automatic-speech-recognition`
- Recipe: `xnnpack`
Tooling versions used to generate these artifacts:

- ExecuTorch: `executorch==1.2.0a0+efe4f0c` (git `efe4f0cce3`)
- Optimum ExecuTorch: `optimum-executorch==0.2.0.dev0` (git `4c62ed7`)
Command used:

```shell
optimum-cli export executorch \
  --model "openai/whisper-tiny" \
  --task "automatic-speech-recognition" \
  --recipe "xnnpack" \
  --qlinear "8da4w" \
  --qlinear_encoder "8da4w" \
  --output_dir "<output_dir>"
```
Preprocessor command used:

```shell
python -m executorch.extension.audio.mel_spectrogram \
  --feature_size 80 \
  --stack_output \
  --max_audio_len 300 \
  --output_file whisper_preprocessor.pte
```
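As a back-of-the-envelope check of what the preprocessor produces, the arithmetic below assumes Whisper's standard front end (16 kHz audio, 10 ms hop, 80 mel bins, fixed 30-second chunks) and that `--max_audio_len 300` is measured in seconds; those constants come from the Whisper reference implementation, not from this export.

```python
SAMPLE_RATE = 16_000      # Hz; the runner also expects 16 kHz input
HOP_LENGTH = 160          # samples between frames (10 ms)
CHUNK_SECONDS = 30        # Whisper processes fixed 30 s windows
N_MELS = 80               # matches --feature_size 80
MAX_AUDIO_SECONDS = 300   # assumption: --max_audio_len is in seconds

# Each 30 s chunk becomes an (N_MELS, frames_per_chunk) mel tensor.
frames_per_chunk = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH   # 3000
# --stack_output stacks the chunks; 300 s of audio is at most 10 of them.
max_chunks = MAX_AUDIO_SECONDS // CHUNK_SECONDS                # 10
```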
## Run with the ExecuTorch Whisper runner
Build the runner from the ExecuTorch repo root:

```shell
make whisper-cpu
```
Run (expects a 16 kHz mono WAV):

```shell
cmake-out/examples/models/whisper/whisper_runner \
  --model_path model.pte \
  --tokenizer_path ./ \
  --audio_path output.wav \
  --processor_path whisper_preprocessor.pte \
  --temperature 0
```
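Since the runner expects a 16 kHz mono WAV, a quick stdlib-only sanity check on the input file can save a confusing failure. `check_wav` is a hypothetical helper, not part of the runner, and the 16-bit PCM requirement is an assumption beyond what the runner documents.

```python
import wave

def check_wav(path):
    """Return duration in seconds if the file looks runner-compatible."""
    with wave.open(path, "rb") as f:
        assert f.getframerate() == 16_000, "expected 16 kHz sample rate"
        assert f.getnchannels() == 1, "expected mono audio"
        assert f.getsampwidth() == 2, "expected 16-bit PCM (assumption)"
        return f.getnframes() / f.getframerate()
```

Audio in other formats can be converted first, e.g. with `ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav`.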