WHISPER-SMALL - GGUF Quantized Models

Quantized versions of openai/whisper-small in GGUF format.

Directory Structure

small/
├── whisper-small-q*.gguf       # Candle-compatible GGUF models (root)
├── config.json                # Model configuration for Candle
├── tokenizer.json             # Tokenizer for Candle
└── whisper.cpp/               # whisper.cpp-compatible models
    └── whisper-small-q*.gguf
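
The layout above can be turned into a small path helper. This is a minimal sketch, assuming every file follows the `whisper-small-<quant>.gguf` naming pattern; `model_path` and the backend labels are hypothetical names chosen for illustration, not part of Candle or whisper.cpp.

```python
from pathlib import Path

# Quantization tags listed in this README.
QUANTS = ("q2_k", "q3_k", "q4_0", "q4_k", "q4_1",
          "q5_0", "q5_k", "q5_1", "q6_k", "q8_0")


def model_path(root: str, quant: str, backend: str) -> Path:
    """Return the expected GGUF path for "candle" or "whisper.cpp"."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quantization: {quant}")
    filename = f"whisper-small-{quant}.gguf"  # assumed naming pattern
    if backend == "candle":
        # Candle-compatible files sit in the root directory.
        return Path(root) / filename
    if backend == "whisper.cpp":
        # whisper.cpp-compatible files sit in the subdirectory.
        return Path(root) / "whisper.cpp" / filename
    raise ValueError(f"unknown backend: {backend}")
```

For example, `model_path("small", "q4_k", "whisper.cpp")` points at the file inside the `whisper.cpp/` subdirectory, while the `"candle"` backend resolves to the root.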

Format Compatibility

Root directory (whisper-small-q*.gguf): Use with Candle (Rust ML framework)
- Tensor names include the model. prefix (e.g., model.encoder.conv1.weight)
- Compatible with the Neurolang application
- Requires config-small.json and tokenizer-small.json

whisper.cpp/ directory: Use with whisper.cpp (C++ implementation)
- Tensor names have no model. prefix (e.g., encoder.conv1.weight)
- Compatible with whisper.cpp CLI tools

Both directories contain .gguf files, not .bin files.
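
The only naming difference described above is the `model.` prefix, so mapping between the two conventions can be sketched in a few lines. The function names here are hypothetical and the code works on plain strings; it does not read or write GGUF files.

```python
PREFIX = "model."


def to_candle_name(name: str) -> str:
    """Add the `model.` prefix unless it is already present."""
    return name if name.startswith(PREFIX) else PREFIX + name


def to_whisper_cpp_name(name: str) -> str:
    """Strip a leading `model.` prefix if present."""
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


def guess_layout(tensor_names) -> str:
    """Guess which variant a list of tensor names belongs to."""
    names = list(tensor_names)
    if not names:
        return "unknown"
    if all(n.startswith(PREFIX) for n in names):
        return "candle"
    if not any(n.startswith(PREFIX) for n in names):
        return "whisper.cpp"
    return "mixed"
```

A quick check such as `guess_layout(["model.encoder.conv1.weight"])` can help confirm that a file from the wrong directory was not picked up by mistake.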

Available Formats

Format	Quality	Use Case
q2_k	Smallest	Extreme compression
q3_k	Small	Mobile devices
q4_0	Good	Legacy compatibility
q4_k	Good	Recommended for production
q4_1	Good+	Legacy with bias
q5_0	Very Good	Legacy compatibility
q5_k	Very Good	High quality
q5_1	Very Good+	Legacy with bias
q6_k	Excellent	Near-lossless
q8_0	Excellent	Minimal loss, benchmarking
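
To give a rough sense of how the legacy formats compare in size, the sketch below derives bits per weight from block layouts. The layouts are an assumption based on ggml's block structs as commonly documented (32 weights per block, a 2-byte fp16 scale, and an extra 2-byte minimum for the `_1` variants, which is the "bias" mentioned in the table). The k-quant formats use larger super-blocks and are left out. Real file sizes will differ because some tensors may stay unquantized.

```python
# (weights per block, bytes per block) for the legacy block formats.
# Assumed layouts, not read from any file in this repository.
BLOCKS = {
    "q4_0": (32, 2 + 16),          # scale + 32 x 4-bit values
    "q4_1": (32, 2 + 2 + 16),      # scale + min + 32 x 4-bit values
    "q5_0": (32, 2 + 4 + 16),      # scale + 32 high bits + 32 x 4-bit values
    "q5_1": (32, 2 + 2 + 4 + 16),  # scale + min + high bits + 4-bit values
    "q8_0": (32, 2 + 32),          # scale + 32 x 8-bit values
}


def bits_per_weight(fmt: str) -> float:
    """Average storage cost of one weight, including block metadata."""
    weights, nbytes = BLOCKS[fmt]
    return nbytes * 8 / weights


def fraction_of_f16(fmt: str) -> float:
    """Size of a quantized tensor relative to 16-bit storage."""
    return bits_per_weight(fmt) / 16


for fmt in BLOCKS:
    print(f"{fmt}: {bits_per_weight(fmt)} bits/weight")
```

Under these assumptions, `q4_0` costs 4.5 bits per weight and `q8_0` costs 8.5, which is why the 8-bit files are close to twice the size of the 4-bit ones.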

Usage

With Candle (Rust)

Running this model with Candle requires modifying Candle's Whisper example code. For a quicker and easier way to try Whisper in Candle, use the tiny model instead: https://huggingface.co/oxide-lab/whisper-tiny-GGUF

Command line example:

# Run the Candle Whisper example with the quantized model
# (--features is a cargo flag, so it must come before the "--" separator)
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model small \
  --model-id oxide-lab/whisper-small-GGUF

With whisper.cpp (C++)

# Use models from whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/small/whisper.cpp/whisper-small-q4_k.gguf \
  --file audio.wav

Recommended Format

For most use cases, we recommend the q4_k format, which offers the best balance of:

- Size reduction (~65% smaller)
- Quality (minimal degradation)
- Speed (faster inference than higher-bit quantizations)

Quantization Details

Source Model: openai/whisper-small
Quantization Methods:
- Candle GGUF (root directory): Python-based conversion directly from PyTorch to GGUF
  - Adds model. prefix to tensor names for Candle compatibility
- whisper.cpp GGML (whisper.cpp/ subdirectory): whisper-quantize tool
  - Uses original tensor names without prefix
Format: GGUF (GGML Universal Format) for both directories
Total Formats: 10 quantization levels (q2_k through q8_0)
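
Since both directories are stated to hold GGUF files, a downloaded file can be sanity-checked by reading its header. The sketch below relies on the GGUF container starting with the 4-byte magic `GGUF` followed by a little-endian 32-bit version number. `read_gguf_version` is a hypothetical helper name, and the check says nothing about whether the tensors inside are valid.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path: str) -> int:
    """Return the GGUF version, or raise ValueError if the magic is wrong."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # The version is a little-endian unsigned 32-bit integer after the magic.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling it on a file that raises `ValueError` suggests the file is in a different container format or the download is incomplete.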

License

Same as the original Whisper model (MIT License).

Citation

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
