---
license: mit
language:
- multilingual
- en
- ru
tags:
- whisper
- gguf
- quantized
- speech-recognition
- rust
- candle
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---
# WHISPER-TINY - GGUF Quantized Models

Quantized versions of openai/whisper-tiny in GGUF format.
## Directory Structure

```
tiny/
├── whisper-tiny-q*.gguf   # Candle-compatible GGUF models (root)
├── model-tiny-q80.gguf    # Candle-compatible legacy naming (q8_0 format)
├── config-tiny.json       # Model configuration for Candle
├── tokenizer-tiny.json    # Tokenizer for Candle
└── whisper.cpp/           # whisper.cpp-compatible models
    └── whisper-tiny-q*.gguf
```
## Format Compatibility

**Root directory** (`whisper-tiny-*.gguf`): Use with Candle (Rust ML framework)
- Tensor names include the `model.` prefix (e.g., `model.encoder.conv1.weight`)
- Requires `config-tiny.json` and `tokenizer-tiny.json`

**`whisper.cpp/` directory**: Use with whisper.cpp (C++ implementation)
- Tensor names without the `model.` prefix (e.g., `encoder.conv1.weight`)
- Compatible with whisper.cpp CLI tools

Both directories contain `.gguf` files, not `.bin` files.
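The two layouts differ only by the `model.` prefix on tensor names. A minimal sketch of converting between them (the helper names are our own illustration, not functions shipped by Candle or whisper.cpp):

```python
# Convert tensor names between the Candle layout (with the "model." prefix)
# and the whisper.cpp layout (without it). These helpers are illustrative
# only; neither framework provides functions with these names.

CANDLE_PREFIX = "model."

def to_whisper_cpp_name(name: str) -> str:
    """Strip the Candle prefix: model.encoder.conv1.weight -> encoder.conv1.weight."""
    return name[len(CANDLE_PREFIX):] if name.startswith(CANDLE_PREFIX) else name

def to_candle_name(name: str) -> str:
    """Add the "model." prefix expected by Candle if it is missing."""
    return name if name.startswith(CANDLE_PREFIX) else CANDLE_PREFIX + name

print(to_whisper_cpp_name("model.encoder.conv1.weight"))  # encoder.conv1.weight
print(to_candle_name("encoder.conv1.weight"))             # model.encoder.conv1.weight
```

Both functions are idempotent, so applying one to a name that is already in the target layout leaves it unchanged.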
## Available Formats
| Format | Quality | Use Case |
|---|---|---|
| q2_k | Smallest | Extreme compression |
| q3_k | Small | Mobile devices |
| q4_0 | Good | Legacy compatibility |
| q4_k | Good | Recommended for production |
| q4_1 | Good+ | Legacy with bias |
| q5_0 | Very Good | Legacy compatibility |
| q5_k | Very Good | High quality |
| q5_1 | Very Good+ | Legacy with bias |
| q6_k | Excellent | Near-lossless |
| q8_0 | Excellent | Minimal loss, benchmarking |
## Usage

### With Candle (Rust)

Command-line example:

```bash
# Run the Candle Whisper example with a quantized model from this repo
# (note: --features must come before the `--` separator, since it is a
# cargo flag rather than an argument to the example binary)
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model tiny \
  --model-id oxide-lab/whisper-tiny-GGUF
```
### With whisper.cpp (C++)

```bash
# Use models from the whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/tiny/whisper.cpp/whisper-tiny-q4_k.gguf \
  --file audio.wav
```
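Either toolchain needs the model files on local disk first. A sketch of fetching one quantization level with `huggingface_hub` (the `quant_filename` helper is our own, derived from the `whisper-tiny-q*.gguf` naming pattern shown in the directory tree above):

```python
# Download a single quantized model file from this repo via the Hugging Face Hub.
# quant_filename() is a hypothetical helper based on the repo's naming pattern.

def quant_filename(quant: str) -> str:
    """Map a quantization level (e.g. "q4_k") to its file name in the repo root."""
    return f"whisper-tiny-{quant}.gguf"

if __name__ == "__main__":
    # Requires network access and the huggingface_hub package;
    # returns the local cache path of the downloaded file.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="oxide-lab/whisper-tiny-GGUF",
        filename=quant_filename("q4_k"),
    )
    print(path)
```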
## Recommended Format

For most use cases, we recommend the q4_k format, as it provides the best balance of:
- Size reduction (~65% smaller than the original)
- Quality (minimal degradation)
- Speed (faster inference than higher-precision quantizations)
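As a rough sanity check on the ~65% figure: whisper-tiny has about 39M parameters (the size reported in the Whisper paper, an assumption here rather than a measurement of these files), so back-of-the-envelope arithmetic gives:

```python
# Back-of-the-envelope size estimate for the q4_k file, assuming
# ~39M parameters and 2 bytes per parameter at fp16.
PARAMS = 39_000_000
FP16_BYTES = PARAMS * 2              # ~78 MB unquantized
REDUCTION = 0.65                     # ~65% smaller, per the list above
q4k_bytes = FP16_BYTES * (1 - REDUCTION)
print(f"fp16: {FP16_BYTES / 1e6:.0f} MB, q4_k estimate: {q4k_bytes / 1e6:.0f} MB")
```

This puts the q4_k file somewhere near 27 MB, which is only an estimate; actual file sizes vary slightly because quantized formats mix block scales and keep some tensors at higher precision.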
## Quantization Details

- Source Model: openai/whisper-tiny
- Quantization Methods:
  - Candle GGUF (root directory): Python-based quantization, directly PyTorch → GGUF
    - Adds the `model.` prefix to tensor names for Candle compatibility
  - whisper.cpp GGML (`whisper.cpp/` subdirectory): `whisper-quantize` tool
    - Uses original tensor names without the prefix
- Format: GGUF (GGML Universal Format) for both directories
- Total Formats: 10 quantization levels (q2_k through q8_0)
## License

Same as the original Whisper model (MIT License).
## Citation

```bibtex
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```