---
license: mit
language:
- multilingual
- en
- ru
tags:
- whisper
- gguf
- quantized
- speech-recognition
- rust
- candle
base_model:
- openai/whisper-medium
pipeline_tag: automatic-speech-recognition
---
# Whisper-Medium - GGUF Quantized Models

Quantized versions of openai/whisper-medium in GGUF format.
## Directory Structure

```
medium/
├── whisper-medium-q*.gguf     # Candle-compatible GGUF models (root)
├── config.json                # Model configuration for Candle
├── tokenizer.json             # Tokenizer for Candle
└── whisper.cpp/               # whisper.cpp-compatible models
    └── whisper-medium-q*.gguf
```
## Format Compatibility

**Root directory** (`whisper-medium-q*.gguf`): use with Candle (Rust ML framework)
- Tensor names include the `model.` prefix (e.g., `model.encoder.conv1.weight`)
- Requires `config.json` and `tokenizer.json`

**`whisper.cpp/` directory**: use with whisper.cpp (C++ implementation)
- Tensor names without the `model.` prefix (e.g., `encoder.conv1.weight`)
- Compatible with whisper.cpp CLI tools

Both directories contain `.gguf` files, not `.bin` files.
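The only naming difference between the two layouts is the `model.` prefix. A minimal sketch of that mapping (a hypothetical helper for illustration, not part of either toolchain):

```rust
/// Convert a whisper.cpp tensor name to the Candle layout by adding the
/// `model.` prefix (assumption: the prefix is the only difference).
fn to_candle_name(whisper_cpp_name: &str) -> String {
    format!("model.{}", whisper_cpp_name)
}

/// Convert a Candle tensor name back to the whisper.cpp layout by stripping
/// the `model.` prefix; names without the prefix are returned unchanged.
fn to_whisper_cpp_name(candle_name: &str) -> &str {
    candle_name.strip_prefix("model.").unwrap_or(candle_name)
}

fn main() {
    // The example names from the compatibility notes above.
    println!("{}", to_candle_name("encoder.conv1.weight"));
    println!("{}", to_whisper_cpp_name("model.encoder.conv1.weight"));
}
```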
## Available Formats
| Format | Quality | Use Case |
|---|---|---|
| q2_k | Smallest | Extreme compression |
| q3_k | Small | Mobile devices |
| q4_0 | Good | Legacy compatibility |
| q4_k | Good | Recommended for production |
| q4_1 | Good+ | Legacy with bias |
| q5_0 | Very Good | Legacy compatibility |
| q5_k | Very Good | High quality |
| q5_1 | Very Good+ | Legacy with bias |
| q6_k | Excellent | Near-lossless |
| q8_0 | Excellent | Minimal loss, benchmarking |
## Usage

### With Candle (Rust)

This model requires minor modifications to the whisper example code in Candle. For a faster and easier way to try Whisper in Candle, use the tiny model instead → https://huggingface.co/oxide-lab/whisper-tiny-GGUF
Command line example:

```bash
# Run the Candle whisper example with a local quantized model
# (note: --features must come before the `--` separator, or cargo
# passes it to the binary instead of the build)
cargo run --example whisper --features symphonia --release -- \
  --quantized \
  --model medium \
  --model-id oxide-lab/whisper-medium-GGUF
```
### With whisper.cpp (C++)

```bash
# Use models from the whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/medium/whisper.cpp/whisper-medium-q4_k.gguf \
  --file audio.wav
```
## Recommended Format

For most use cases, we recommend the q4_k format, as it provides the best balance of:
- Size reduction (~65% smaller than the original)
- Quality (minimal degradation)
- Speed (faster inference than higher-bit quantizations)
## Quantization Details

- **Source Model**: openai/whisper-medium
- **Quantization Methods**:
  - **Candle GGUF** (root directory): Python-based, converting PyTorch → GGUF directly. Adds the `model.` prefix to tensor names for Candle compatibility.
  - **whisper.cpp GGML** (`whisper.cpp/` subdirectory): produced with the `whisper-quantize` tool. Uses the original tensor names without a prefix.
- **Format**: GGUF (GGML Universal Format) for both directories
- **Total Formats**: 10 quantization levels (q2_k through q8_0)
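To illustrate why q8_0 is near-lossless, here is a simplified sketch of GGML-style q8_0 block quantization: each block of 32 floats is stored as one scale plus 32 signed bytes. This is an assumption-laden toy version (real GGML stores the scale as f16 and packs blocks into a binary layout); it only shows the round-trip math.

```rust
/// Quantize one block of 32 f32 values to q8_0 style: a single scale `d`
/// plus 32 signed 8-bit values, where x ≈ d * q.
/// (Simplified sketch: GGML proper uses an f16 scale and packed storage.)
fn quantize_q8_0(block: &[f32; 32]) -> (f32, [i8; 32]) {
    // Scale so the largest magnitude maps to ±127.
    let amax = block.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let d = amax / 127.0;
    let id = if d > 0.0 { 1.0 / d } else { 0.0 };
    let mut qs = [0i8; 32];
    for (q, &v) in qs.iter_mut().zip(block.iter()) {
        *q = (v * id).round() as i8;
    }
    (d, qs)
}

/// Reconstruct the block from the scale and quantized values.
fn dequantize_q8_0(d: f32, qs: &[i8; 32]) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for (o, &q) in out.iter_mut().zip(qs.iter()) {
        *o = q as f32 * d;
    }
    out
}

fn main() {
    // Ramp over [-1, 1] as sample weights.
    let mut x = [0.0f32; 32];
    for (i, v) in x.iter_mut().enumerate() {
        *v = (i as f32 / 31.0) * 2.0 - 1.0;
    }
    let (d, qs) = quantize_q8_0(&x);
    let y = dequantize_q8_0(d, &qs);
    let max_err = x
        .iter()
        .zip(y.iter())
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    // Rounding error is bounded by half the scale step.
    assert!(max_err <= d * 0.5 + 1e-6);
    println!("max round-trip error: {max_err}");
}
```

The per-value error is bounded by half the block scale, which is why q8_0 is typically used as a near-reference baseline; the k-quants (q4_k, q5_k, ...) trade more error for smaller storage.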
## License

Same as the original Whisper model (MIT License).

## Citation

```bibtex
@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```