---
license: mit
language:
  - multilingual
  - en
  - ru
tags:
  - whisper
  - gguf
  - quantized
  - speech-recognition
  - rust
  - candle
base_model:
  - openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---

# WHISPER-TINY - GGUF Quantized Models

Quantized versions of openai/whisper-tiny in GGUF format.

## Directory Structure

```
tiny/
├── whisper-tiny-q*.gguf       # Candle-compatible GGUF models (root)
├── model-tiny-q80.gguf        # Candle-compatible legacy naming (q8_0 format)
├── config-tiny.json           # Model configuration for Candle
├── tokenizer-tiny.json        # Tokenizer for Candle
└── whisper.cpp/               # whisper.cpp-compatible models
    └── whisper-tiny-q*.gguf
```

## Format Compatibility

- **Root directory** (`whisper-tiny-*.gguf`): use with Candle (Rust ML framework)
  - Tensor names include the `model.` prefix (e.g., `model.encoder.conv1.weight`)
  - Requires `config-tiny.json` and `tokenizer-tiny.json`
- **`whisper.cpp/` directory**: use with whisper.cpp (C++ implementation)
  - Tensor names omit the `model.` prefix (e.g., `encoder.conv1.weight`)
  - Compatible with the whisper.cpp CLI tools

Both directories contain `.gguf` files, not `.bin` files.
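The only naming difference between the two layouts is the `model.` prefix described above. A minimal sketch of that mapping (the helper name is hypothetical, for illustration only):

```python
# Hypothetical helper: map a Candle-style tensor name to the
# whisper.cpp-style name by stripping the "model." prefix.
def to_whisper_cpp_name(name: str) -> str:
    prefix = "model."
    return name[len(prefix):] if name.startswith(prefix) else name

print(to_whisper_cpp_name("model.encoder.conv1.weight"))  # encoder.conv1.weight
```

Names that already lack the prefix pass through unchanged, so the mapping is safe to apply to a whole checkpoint.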

## Available Formats

| Format | Quality    | Use Case                   |
|--------|------------|----------------------------|
| q2_k   | Smallest   | Extreme compression        |
| q3_k   | Small      | Mobile devices             |
| q4_0   | Good       | Legacy compatibility       |
| q4_k   | Good       | Recommended for production |
| q4_1   | Good+      | Legacy with bias           |
| q5_0   | Very Good  | Legacy compatibility       |
| q5_k   | Very Good  | High quality               |
| q5_1   | Very Good+ | Legacy with bias           |
| q6_k   | Excellent  | Near-lossless              |
| q8_0   | Excellent  | Minimal loss, benchmarking |
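For intuition about the size column, the legacy formats and q8_0 have fixed per-block storage costs. The byte counts below come from GGML's public block definitions, not from this card, so treat them as an assumption; the k-quant formats use more elaborate super-blocks and are omitted:

```python
# Effective bits per weight for the legacy GGML block formats and q8_0.
# Assumed block layouts (from GGML's public definitions, not this card):
# each block covers 32 weights.
BLOCK_BYTES = {
    "q4_0": 2 + 16,          # fp16 scale + 32 x 4-bit values
    "q4_1": 2 + 2 + 16,      # fp16 scale + fp16 min + 32 x 4-bit values
    "q5_0": 2 + 4 + 16,      # fp16 scale + 32 high bits + 32 x 4-bit values
    "q5_1": 2 + 2 + 4 + 16,  # scale + min + high bits + 4-bit values
    "q8_0": 2 + 32,          # fp16 scale + 32 x int8 values
}

def legacy_bits_per_weight(fmt: str) -> float:
    return BLOCK_BYTES[fmt] * 8 / 32

print(legacy_bits_per_weight("q4_0"))  # 4.5
print(legacy_bits_per_weight("q8_0"))  # 8.5
```

This is why q8_0 is roughly half the size of an fp16 model while the 4-bit formats cut it by about two thirds.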

## Usage

### With Candle (Rust)

Command line example:

```shell
# Run the Candle Whisper example with a quantized model from this repo
# (--features symphonia is a cargo flag, so it goes before the "--" separator)
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model tiny \
  --model-id oxide-lab/whisper-tiny-GGUF
```

### With whisper.cpp (C++)

```shell
# Use models from the whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/tiny/whisper.cpp/whisper-tiny-q4_k.gguf \
  --file audio.wav
```
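If a tool rejects a downloaded file, a quick sanity check is to look at its header: per the GGUF specification, every GGUF file begins with the 4-byte ASCII magic `GGUF` followed by a little-endian format version. A small sketch (demonstrated on a synthetic header, since it only inspects the first bytes):

```python
import os
import struct
import tempfile

def is_gguf(path: str) -> bool:
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Demo on a synthetic header; real model files begin the same way.
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))  # magic + format version
    path = f.name

ok = is_gguf(path)
os.unlink(path)
print(ok)  # True
```

A file that fails this check is not GGUF at all (e.g., an old `.bin` GGML checkpoint or an interrupted download).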

## Recommended Format

For most use cases, we recommend the q4_k format, as it offers the best balance of:

- **Size**: roughly 65% smaller than the original model
- **Quality**: minimal degradation in transcription accuracy
- **Speed**: faster inference than the higher-precision formats

## Quantization Details

- **Source model**: openai/whisper-tiny
- **Quantization methods**:
  - Candle GGUF (root directory): Python-based quantization, converting PyTorch weights directly to GGUF
    - Adds the `model.` prefix to tensor names for Candle compatibility
  - whisper.cpp GGML (`whisper.cpp/` subdirectory): produced with the whisper-quantize tool
    - Uses the original tensor names without a prefix
- **Format**: GGUF (GGML Universal Format) for both directories
- **Total formats**: 10 quantization levels (q2_k through q8_0)
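The core idea behind all of these levels is per-block scaling. As a rough illustration of why q8_0 is near-lossless, here is a simplified sketch of q8_0-style symmetric quantization (one scale per block, int8 values); this is an intuition aid, not the exact GGML block layout:

```python
# Simplified q8_0-style quantization: each block of weights shares one
# scale, and every weight is rounded to an int8 multiple of that scale.
def quantize_q8_0(block):
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 127.0
    quants = [round(x / scale) for x in block]
    return scale, quants

def dequantize_q8_0(scale, quants):
    return [scale * q for q in quants]

block = [0.5, -1.0, 0.25, 0.75]
scale, quants = quantize_q8_0(block)
restored = dequantize_q8_0(scale, quants)
# The round-trip error per weight is at most half a quantization step,
# i.e. scale / 2 -- tiny when the block's dynamic range is modest.
```

The coarser formats (q4, q5) use fewer levels per block, trading a larger rounding step for a smaller file.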

## License

Same as the original Whisper model (MIT License).

## Citation

```bibtex
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```