---
license: mit
language:
- multilingual
- en
- ru
tags:
- whisper
- gguf
- quantized
- speech-recognition
- rust
- candle
base_model:
- openai/whisper-medium
pipeline_tag: automatic-speech-recognition
---
# Whisper-Medium - GGUF Quantized Models

Quantized versions of openai/whisper-medium in GGUF format.
## Directory Structure

```
medium/
├── whisper-medium-q*.gguf     # Candle-compatible GGUF models (root)
├── config.json                # Model configuration for Candle
├── tokenizer.json             # Tokenizer for Candle
└── whisper.cpp/               # whisper.cpp-compatible models
    └── whisper-medium-q*.gguf
```
## Format Compatibility

**Root directory** (`whisper-medium-q*.gguf`): use with Candle (Rust ML framework)
- Tensor names include the `model.` prefix (e.g., `model.encoder.conv1.weight`)
- Requires `config.json` and `tokenizer.json`

**`whisper.cpp/` directory**: use with whisper.cpp (C++ implementation)
- Tensor names without the `model.` prefix (e.g., `encoder.conv1.weight`)
- Compatible with whisper.cpp CLI tools

Both directories contain `.gguf` files, not `.bin` files.
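The only naming difference between the two layouts is the `model.` prefix. A minimal sketch of that mapping (a hypothetical helper for illustration, not part of either toolchain):

```rust
/// Convert a whisper.cpp tensor name to the Candle layout by adding the
/// `model.` prefix (assumption: the prefix is the only difference).
fn to_candle_name(whisper_cpp_name: &str) -> String {
    format!("model.{}", whisper_cpp_name)
}

/// Convert a Candle tensor name back to the whisper.cpp layout by stripping
/// the `model.` prefix; names without the prefix are returned unchanged.
fn to_whisper_cpp_name(candle_name: &str) -> &str {
    candle_name.strip_prefix("model.").unwrap_or(candle_name)
}

fn main() {
    // The example names from the compatibility notes above.
    println!("{}", to_candle_name("encoder.conv1.weight"));
    println!("{}", to_whisper_cpp_name("model.encoder.conv1.weight"));
}
```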
## Available Formats
| Format | Quality | Use Case |
|---|---|---|
| q2_k | Smallest | Extreme compression |
| q3_k | Small | Mobile devices |
| q4_0 | Good | Legacy compatibility |
| q4_k | Good | Recommended for production |
| q4_1 | Good+ | Legacy with bias |
| q5_0 | Very Good | Legacy compatibility |
| q5_k | Very Good | High quality |
| q5_1 | Very Good+ | Legacy with bias |
| q6_k | Excellent | Near-lossless |
| q8_0 | Excellent | Minimal loss, benchmarking |
## Usage

### With Candle (Rust)

This model requires minor modifications to the whisper example code in Candle. For a faster and easier way to try Whisper in Candle, use the tiny model instead → https://huggingface.co/oxide-lab/whisper-tiny-GGUF
Command line example:

```bash
# Run the Candle whisper example with a local quantized model
# (note: --features must come before the `--` separator, or cargo
# passes it to the binary instead of the build)
cargo run --example whisper --features symphonia --release -- \
  --quantized \
  --model medium \
  --model-id oxide-lab/whisper-medium-GGUF
```
### With whisper.cpp (C++)

```bash
# Use models from the whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/medium/whisper.cpp/whisper-medium-q4_k.gguf \
  --file audio.wav
```
## Recommended Format

For most use cases, we recommend the q4_k format, as it provides the best balance of:
- Size reduction (~65% smaller than the original)
- Quality (minimal degradation)
- Speed (faster inference than higher-bit quantizations)
## Quantization Details

- **Source Model**: openai/whisper-medium
- **Quantization Methods**:
  - **Candle GGUF** (root directory): Python-based, converting PyTorch → GGUF directly. Adds the `model.` prefix to tensor names for Candle compatibility.
  - **whisper.cpp GGML** (`whisper.cpp/` subdirectory): produced with the `whisper-quantize` tool. Uses the original tensor names without a prefix.
- **Format**: GGUF (GGML Universal Format) for both directories
- **Total Formats**: 10 quantization levels (q2_k through q8_0)
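To illustrate why q8_0 is near-lossless, here is a simplified sketch of GGML-style q8_0 block quantization: each block of 32 floats is stored as one scale plus 32 signed bytes. This is an assumption-laden toy version (real GGML stores the scale as f16 and packs blocks into a binary layout); it only shows the round-trip math.

```rust
/// Quantize one block of 32 f32 values to q8_0 style: a single scale `d`
/// plus 32 signed 8-bit values, where x ≈ d * q.
/// (Simplified sketch: GGML proper uses an f16 scale and packed storage.)
fn quantize_q8_0(block: &[f32; 32]) -> (f32, [i8; 32]) {
    // Scale so the largest magnitude maps to ±127.
    let amax = block.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let d = amax / 127.0;
    let id = if d > 0.0 { 1.0 / d } else { 0.0 };
    let mut qs = [0i8; 32];
    for (q, &v) in qs.iter_mut().zip(block.iter()) {
        *q = (v * id).round() as i8;
    }
    (d, qs)
}

/// Reconstruct the block from the scale and quantized values.
fn dequantize_q8_0(d: f32, qs: &[i8; 32]) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for (o, &q) in out.iter_mut().zip(qs.iter()) {
        *o = q as f32 * d;
    }
    out
}

fn main() {
    // Ramp over [-1, 1] as sample weights.
    let mut x = [0.0f32; 32];
    for (i, v) in x.iter_mut().enumerate() {
        *v = (i as f32 / 31.0) * 2.0 - 1.0;
    }
    let (d, qs) = quantize_q8_0(&x);
    let y = dequantize_q8_0(d, &qs);
    let max_err = x
        .iter()
        .zip(y.iter())
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    // Rounding error is bounded by half the scale step.
    assert!(max_err <= d * 0.5 + 1e-6);
    println!("max round-trip error: {max_err}");
}
```

The per-value error is bounded by half the block scale, which is why q8_0 is typically used as a near-reference baseline; the k-quants (q4_k, q5_k, ...) trade more error for smaller storage.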
## License

Same as the original Whisper model (MIT License).

## Citation

```bibtex
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```