---
license: mit
language:
- multilingual
- en
- ru
tags:
- whisper
- gguf
- quantized
- speech-recognition
- rust
- candle
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---

# WHISPER-TINY - GGUF Quantized Models

Quantized versions of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) in GGUF format.

## Directory Structure

```
tiny/
├── whisper-tiny-q*.gguf    # Candle-compatible GGUF models (root)
├── model-tiny-q80.gguf     # Candle-compatible legacy naming (q8_0 format)
├── config-tiny.json        # Model configuration for Candle
├── tokenizer-tiny.json     # Tokenizer for Candle
└── whisper.cpp/            # whisper.cpp-compatible models
    └── whisper-tiny-q*.gguf
```

### Format Compatibility

- **Root directory** (`whisper-tiny-*.gguf`): use with **Candle** (Rust ML framework)
  - Tensor names include the `model.` prefix (e.g., `model.encoder.conv1.weight`)
  - Requires `config-tiny.json` and `tokenizer-tiny.json`
- **whisper.cpp/** directory: use with **whisper.cpp** (C++ implementation)
  - Tensor names omit the `model.` prefix (e.g., `encoder.conv1.weight`)
  - Compatible with the whisper.cpp CLI tools
- Both directories contain `.gguf` files, not `.bin` files

## Available Formats

| Format | Quality | Use Case |
|--------|---------|----------|
| q2_k | Smallest | Extreme compression |
| q3_k | Small | Mobile devices |
| q4_0 | Good | Legacy compatibility |
| q4_k | Good | **Recommended for production** |
| q4_1 | Good+ | Legacy with bias |
| q5_0 | Very Good | Legacy compatibility |
| q5_k | Very Good | High quality |
| q5_1 | Very Good+ | Legacy with bias |
| q6_k | Excellent | Near-lossless |
| q8_0 | Excellent | Minimal loss, benchmarking |

## Usage

### With Candle (Rust)

**Command line example:**

```bash
# Run Candle's Whisper example with a quantized model from this repo
# (note: --features must come before the `--` separator so cargo sees it)
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model tiny \
  --model-id oxide-lab/whisper-tiny-GGUF
```

### With whisper.cpp (C++)

```bash
# Use models from the whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/tiny/whisper.cpp/whisper-tiny-q4_k.gguf \
  --file audio.wav
```

### Recommended Format

For most use cases we recommend the **q4_k** format, which offers the best balance of:

- Size reduction (~65% smaller)
- Quality (minimal degradation)
- Speed (faster inference than higher-bit quantizations)

## Quantization Details

- **Source Model**: [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
- **Quantization Methods**:
  - **Candle GGUF** (root directory): Python-based quantization, directly from PyTorch weights to GGUF
    - Adds the `model.` prefix to tensor names for Candle compatibility
  - **whisper.cpp GGML** (whisper.cpp/ subdirectory): whisper-quantize tool
    - Uses the original tensor names without a prefix
- **Format**: GGUF (GGML Universal Format) for both directories
- **Total Formats**: 10 quantization levels (q2_k through q8_0)

## License

Same as the original Whisper model (MIT License).

## Citation

```bibtex
@misc{radford2022whisper,
  doi       = {10.48550/ARXIV.2212.04356},
  url       = {https://arxiv.org/abs/2212.04356},
  author    = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title     = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```
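The size-reduction figures above can be sanity-checked from GGML's block layouts: each block stores a group of low-bit weights plus a shared fp16 scale. The sketch below is a back-of-envelope estimate only, using the q4_0 and q8_0 block layouts from the ggml source (the k-quant layouts such as q4_k differ slightly but land near the same bits-per-weight); real files come out somewhat larger than the raw per-weight cost because metadata and some tensors stay in higher precision, which is why the card quotes ~65% rather than the theoretical ~72% for 4-bit.

```python
# Back-of-envelope bits-per-weight for GGML block quantization.
# Block layouts follow the ggml source: q4_0 packs 32 weights as 16
# nibble bytes plus one fp16 scale (18 bytes); q8_0 stores 32 int8
# weights plus one fp16 scale (34 bytes). Treat these as estimates.

def bits_per_weight(block_bytes: int, weights_per_block: int) -> float:
    """Effective storage cost per weight for one quantization block."""
    return block_bytes * 8 / weights_per_block

FP16 = 16.0                       # unquantized baseline, bits per weight
Q4_0 = bits_per_weight(18, 32)    # -> 4.5 bits/weight
Q8_0 = bits_per_weight(34, 32)    # -> 8.5 bits/weight

for name, bpw in [("q4_0", Q4_0), ("q8_0", Q8_0)]:
    print(f"{name}: {bpw} bits/weight, ~{1 - bpw / FP16:.0%} smaller than fp16")
```

This is why q8_0 is the "minimal loss, benchmarking" option (it keeps roughly half the fp16 footprint) while the 4-bit formats are the production sweet spot.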