---
license: mit
language:
  - multilingual
  - en
  - ru
tags:
  - whisper
  - gguf
  - quantized
  - speech-recognition
  - rust
  - candle
base_model:
  - openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---

# WHISPER-TINY - GGUF Quantized Models

Quantized versions of openai/whisper-tiny in GGUF format.

## Directory Structure

```
tiny/
├── whisper-tiny-q*.gguf       # Candle-compatible GGUF models (root)
├── model-tiny-q80.gguf        # Candle-compatible legacy naming (q8_0 format)
├── config-tiny.json           # Model configuration for Candle
├── tokenizer-tiny.json        # Tokenizer for Candle
└── whisper.cpp/               # whisper.cpp-compatible models
    └── whisper-tiny-q*.gguf
```

## Format Compatibility

- **Root directory** (`whisper-tiny-*.gguf`): use with Candle (Rust ML framework)
  - Tensor names include the `model.` prefix (e.g., `model.encoder.conv1.weight`)
  - Requires `config-tiny.json` and `tokenizer-tiny.json`
- **`whisper.cpp/` directory**: use with whisper.cpp (C++ implementation)
  - Tensor names omit the `model.` prefix (e.g., `encoder.conv1.weight`)
  - Compatible with the whisper.cpp CLI tools

Both directories contain `.gguf` files, not `.bin` files.
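The only naming difference between the two layouts is the `model.` prefix described above. A minimal sketch of that mapping (the helper name is hypothetical, for illustration only):

```python
# Hypothetical helper: map a Candle-style tensor name to the
# whisper.cpp-style name by stripping the "model." prefix.
def to_whisper_cpp_name(name: str) -> str:
    prefix = "model."
    return name[len(prefix):] if name.startswith(prefix) else name

print(to_whisper_cpp_name("model.encoder.conv1.weight"))  # encoder.conv1.weight
```

Names that already lack the prefix pass through unchanged, so the mapping is safe to apply to a whole checkpoint.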

## Available Formats

| Format | Quality    | Use Case                   |
|--------|------------|----------------------------|
| q2_k   | Smallest   | Extreme compression        |
| q3_k   | Small      | Mobile devices             |
| q4_0   | Good       | Legacy compatibility       |
| q4_k   | Good       | Recommended for production |
| q4_1   | Good+      | Legacy with bias           |
| q5_0   | Very Good  | Legacy compatibility       |
| q5_k   | Very Good  | High quality               |
| q5_1   | Very Good+ | Legacy with bias           |
| q6_k   | Excellent  | Near-lossless              |
| q8_0   | Excellent  | Minimal loss, benchmarking |
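For intuition about the size column, the legacy formats and q8_0 have fixed per-block storage costs. The byte counts below come from GGML's public block definitions, not from this card, so treat them as an assumption; the k-quant formats use more elaborate super-blocks and are omitted:

```python
# Effective bits per weight for the legacy GGML block formats and q8_0.
# Assumed block layouts (from GGML's public definitions, not this card):
# each block covers 32 weights.
BLOCK_BYTES = {
    "q4_0": 2 + 16,          # fp16 scale + 32 x 4-bit values
    "q4_1": 2 + 2 + 16,      # fp16 scale + fp16 min + 32 x 4-bit values
    "q5_0": 2 + 4 + 16,      # fp16 scale + 32 high bits + 32 x 4-bit values
    "q5_1": 2 + 2 + 4 + 16,  # scale + min + high bits + 4-bit values
    "q8_0": 2 + 32,          # fp16 scale + 32 x int8 values
}

def legacy_bits_per_weight(fmt: str) -> float:
    return BLOCK_BYTES[fmt] * 8 / 32

print(legacy_bits_per_weight("q4_0"))  # 4.5
print(legacy_bits_per_weight("q8_0"))  # 8.5
```

This is why q8_0 is roughly half the size of an fp16 model while the 4-bit formats cut it by about two thirds.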

## Usage

### With Candle (Rust)

Command line example:

```shell
# Run the Candle Whisper example with a quantized model from this repo
# (--features symphonia is a cargo flag, so it goes before the "--" separator)
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model tiny \
  --model-id oxide-lab/whisper-tiny-GGUF
```

### With whisper.cpp (C++)

```shell
# Use models from the whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/tiny/whisper.cpp/whisper-tiny-q4_k.gguf \
  --file audio.wav
```
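If a tool rejects a downloaded file, a quick sanity check is to look at its header: per the GGUF specification, every GGUF file begins with the 4-byte ASCII magic `GGUF` followed by a little-endian format version. A small sketch (demonstrated on a synthetic header, since it only inspects the first bytes):

```python
import os
import struct
import tempfile

def is_gguf(path: str) -> bool:
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Demo on a synthetic header; real model files begin the same way.
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))  # magic + format version
    path = f.name

ok = is_gguf(path)
os.unlink(path)
print(ok)  # True
```

A file that fails this check is not GGUF at all (e.g., an old `.bin` GGML checkpoint or an interrupted download).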

## Recommended Format

For most use cases, we recommend the q4_k format, as it offers the best balance of:

- **Size**: roughly 65% smaller than the original model
- **Quality**: minimal degradation in transcription accuracy
- **Speed**: faster inference than the higher-precision formats

## Quantization Details

- **Source model**: openai/whisper-tiny
- **Quantization methods**:
  - Candle GGUF (root directory): Python-based quantization, converting PyTorch weights directly to GGUF
    - Adds the `model.` prefix to tensor names for Candle compatibility
  - whisper.cpp GGML (`whisper.cpp/` subdirectory): produced with the whisper-quantize tool
    - Uses the original tensor names without a prefix
- **Format**: GGUF (GGML Universal Format) for both directories
- **Total formats**: 10 quantization levels (q2_k through q8_0)
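The core idea behind all of these levels is per-block scaling. As a rough illustration of why q8_0 is near-lossless, here is a simplified sketch of q8_0-style symmetric quantization (one scale per block, int8 values); this is an intuition aid, not the exact GGML block layout:

```python
# Simplified q8_0-style quantization: each block of weights shares one
# scale, and every weight is rounded to an int8 multiple of that scale.
def quantize_q8_0(block):
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 127.0
    quants = [round(x / scale) for x in block]
    return scale, quants

def dequantize_q8_0(scale, quants):
    return [scale * q for q in quants]

block = [0.5, -1.0, 0.25, 0.75]
scale, quants = quantize_q8_0(block)
restored = dequantize_q8_0(scale, quants)
# The round-trip error per weight is at most half a quantization step,
# i.e. scale / 2 -- tiny when the block's dynamic range is modest.
```

The coarser formats (q4, q5) use fewer levels per block, trading a larger rounding step for a smaller file.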

## License

Same as the original Whisper model (MIT License).

## Citation

```bibtex
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```