WHISPER-MEDIUM - GGUF Quantized Models

Quantized versions of openai/whisper-medium in GGUF format.

Directory Structure

medium/
├── whisper-medium-q*.gguf       # Candle-compatible GGUF models (root)
├── config.json                # Model configuration for Candle
├── tokenizer.json             # Tokenizer for Candle
└── whisper.cpp/               # whisper.cpp-compatible models
    └── whisper-medium-q*.gguf

Format Compatibility

  • Root directory (whisper-medium-q*.gguf): Use with Candle (Rust ML framework)

    • Tensor names include the model. prefix (e.g., model.encoder.conv1.weight)
    • Requires the config.json and tokenizer.json shipped in this directory
  • whisper.cpp/ directory: Use with whisper.cpp (C++ implementation)

    • Tensor names without the model. prefix (e.g., encoder.conv1.weight)
    • Compatible with the whisper.cpp CLI tools
  • Both directories contain .gguf files, not .bin files
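The only naming difference between the two layouts is the model. prefix. As an illustrative sketch (not an official conversion tool), the mapping between the two schemes can be expressed as:

```python
# Illustrative sketch: convert between whisper.cpp and Candle tensor
# naming schemes. The Candle files prepend a "model." prefix; the
# whisper.cpp files use the bare names.

def to_candle_name(name: str) -> str:
    """Prepend the "model." prefix used by the Candle GGUF files."""
    return name if name.startswith("model.") else f"model.{name}"

def to_whisper_cpp_name(name: str) -> str:
    """Strip the "model." prefix to recover the whisper.cpp name."""
    return name[len("model."):] if name.startswith("model.") else name

print(to_candle_name("encoder.conv1.weight"))             # model.encoder.conv1.weight
print(to_whisper_cpp_name("model.encoder.conv1.weight"))  # encoder.conv1.weight
```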

Available Formats

Format   Quality      Use Case
q2_k     Lowest       Extreme compression
q3_k     Low          Mobile devices
q4_0     Good         Legacy compatibility
q4_k     Good         Recommended for production
q4_1     Good+        Legacy with bias
q5_0     Very Good    Legacy compatibility
q5_k     Very Good    High quality
q5_1     Very Good+   Legacy with bias
q6_k     Excellent    Near-lossless
q8_0     Excellent    Minimal loss, benchmarking
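To compare formats, a rough file-size estimate is parameter count × bits per weight / 8. The sketch below uses nominal bits-per-weight figures and an approximate parameter count (both are assumptions; real GGUF files also carry per-block scales and metadata):

```python
# Back-of-envelope size estimate: size ≈ params × bits-per-weight / 8.
# The bits-per-weight values are nominal assumptions, not exact GGUF figures.

PARAMS = 769_000_000  # whisper-medium has roughly 769M parameters

NOMINAL_BITS = {"q2_k": 2.6, "q4_k": 4.5, "q8_0": 8.5, "f16": 16.0}

def approx_size_mb(fmt: str, params: int = PARAMS) -> float:
    """Approximate on-disk size in MB for a given quantization format."""
    return params * NOMINAL_BITS[fmt] / 8 / 1e6

for fmt in ("q2_k", "q4_k", "q8_0", "f16"):
    print(f"{fmt}: ~{approx_size_mb(fmt):.0f} MB")
```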

Usage

With Candle (Rust)

For this model you need to modify the example code in Candle. To try Whisper in Candle more quickly and easily, consider the tiny model → https://huggingface.co/oxide-lab/whisper-tiny-GGUF

Command line example:

# Run the Candle Whisper example with a local quantized model
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model medium \
  --model-id oxide-lab/whisper-medium-GGUF

With whisper.cpp (C++)

# Use models from whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/medium/whisper.cpp/whisper-medium-q4_k.gguf \
  --file audio.wav

Recommended Format

For most use cases, we recommend q4_k format as it provides the best balance of:

  • Size reduction (~65% smaller)
  • Quality (minimal degradation)
  • Speed (faster inference than higher quantizations)

Quantization Details

  • Source Model: openai/whisper-medium
  • Quantization Methods:
    • Candle GGUF (root directory): Python-based conversion directly from PyTorch weights to GGUF
      • Adds model. prefix to tensor names for Candle compatibility
    • whisper.cpp GGML (whisper.cpp/ subdirectory): whisper-quantize tool
      • Uses original tensor names without prefix
  • Format: GGUF (GGML Universal Format) for both directories
  • Total Formats: 10 quantization levels (q2_k through q8_0)
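The core idea behind these quantization levels can be sketched as symmetric 4-bit block quantization: weights are split into fixed-size blocks, and each block stores one floating-point scale plus small integers. This is a simplified illustration of the q4_0 idea, not the exact ggml/GGUF bit layout:

```python
import numpy as np

# Simplified illustration of q4_0-style quantization: blocks of 32 weights,
# one float scale per block plus signed 4-bit integers in [-8, 7].
# This mirrors the idea only; the real ggml layout packs bits differently.

BLOCK = 32

def quantize_block(x: np.ndarray):
    amax = float(np.abs(x).max())
    scale = amax / 7 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return scale, q

def dequantize_block(scale: float, q: np.ndarray) -> np.ndarray:
    return scale * q.astype(np.float32)

rng = np.random.default_rng(0)
weights = rng.standard_normal(BLOCK).astype(np.float32)
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
print("max abs error:", np.abs(weights - restored).max())
```

The reconstruction error per weight is bounded by half the block scale, which is why quality degrades gracefully as bit width shrinks.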

License

Same as the original Whisper model (MIT License).

Citation

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}