---
license: mit
language:
- multilingual
- en
- ru
tags:
- whisper
- gguf
- quantized
- speech-recognition
- rust
- candle
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---

# WHISPER-TINY - GGUF Quantized Models

Quantized versions of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) in GGUF format.

## Directory Structure

```
tiny/
├── whisper-tiny-q*.gguf    # Candle-compatible GGUF models (root)
├── model-tiny-q80.gguf     # Candle-compatible legacy naming (q8_0 format)
├── config-tiny.json        # Model configuration for Candle
├── tokenizer-tiny.json     # Tokenizer for Candle
└── whisper.cpp/            # whisper.cpp-compatible models
    └── whisper-tiny-q*.gguf
```

### Format Compatibility

- **Root directory** (`whisper-tiny-*.gguf`): use with **Candle** (Rust ML framework)
  - Tensor names include the `model.` prefix (e.g., `model.encoder.conv1.weight`)
  - Requires `config-tiny.json` and `tokenizer-tiny.json`
- **whisper.cpp/** directory: use with **whisper.cpp** (C++ implementation)
  - Tensor names omit the `model.` prefix (e.g., `encoder.conv1.weight`)
  - Compatible with the whisper.cpp CLI tools
- Both directories contain `.gguf` files, not `.bin` files

## Available Formats

| Format | Quality | Use Case |
|--------|---------|----------|
| q2_k | Smallest | Extreme compression |
| q3_k | Small | Mobile devices |
| q4_0 | Good | Legacy compatibility |
| q4_k | Good | **Recommended for production** |
| q4_1 | Good+ | Legacy with bias |
| q5_0 | Very Good | Legacy compatibility |
| q5_k | Very Good | High quality |
| q5_1 | Very Good+ | Legacy with bias |
| q6_k | Excellent | Near-lossless |
| q8_0 | Excellent | Minimal loss, benchmarking |

## Usage

### With Candle (Rust)

**Command line example:**

```bash
# Run Candle's Whisper example with a quantized model from this repo
# (note: --features must come before the `--` separator so cargo sees it)
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model tiny \
  --model-id oxide-lab/whisper-tiny-GGUF
```

### With whisper.cpp (C++)

```bash
# Use models from the whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/tiny/whisper.cpp/whisper-tiny-q4_k.gguf \
  --file audio.wav
```

### Recommended Format

For most use cases we recommend the **q4_k** format, which offers the best balance of:

- Size reduction (~65% smaller)
- Quality (minimal degradation)
- Speed (faster inference than higher-bit quantizations)

## Quantization Details

- **Source Model**: [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
- **Quantization Methods**:
  - **Candle GGUF** (root directory): Python-based quantization, directly from PyTorch weights to GGUF
    - Adds the `model.` prefix to tensor names for Candle compatibility
  - **whisper.cpp GGML** (whisper.cpp/ subdirectory): whisper-quantize tool
    - Uses the original tensor names without a prefix
- **Format**: GGUF (GGML Universal Format) for both directories
- **Total Formats**: 10 quantization levels (q2_k through q8_0)

## License

Same as the original Whisper model (MIT License).

## Citation

```bibtex
@misc{radford2022whisper,
  doi       = {10.48550/ARXIV.2212.04356},
  url       = {https://arxiv.org/abs/2212.04356},
  author    = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title     = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```
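The size-reduction figures above can be sanity-checked from GGML's block layouts: each block stores a group of low-bit weights plus a shared fp16 scale. The sketch below is a back-of-envelope estimate only, using the q4_0 and q8_0 block layouts from the ggml source (the k-quant layouts such as q4_k differ slightly but land near the same bits-per-weight); real files come out somewhat larger than the raw per-weight cost because metadata and some tensors stay in higher precision, which is why the card quotes ~65% rather than the theoretical ~72% for 4-bit.

```python
# Back-of-envelope bits-per-weight for GGML block quantization.
# Block layouts follow the ggml source: q4_0 packs 32 weights as 16
# nibble bytes plus one fp16 scale (18 bytes); q8_0 stores 32 int8
# weights plus one fp16 scale (34 bytes). Treat these as estimates.

def bits_per_weight(block_bytes: int, weights_per_block: int) -> float:
    """Effective storage cost per weight for one quantization block."""
    return block_bytes * 8 / weights_per_block

FP16 = 16.0                       # unquantized baseline, bits per weight
Q4_0 = bits_per_weight(18, 32)    # -> 4.5 bits/weight
Q8_0 = bits_per_weight(34, 32)    # -> 8.5 bits/weight

for name, bpw in [("q4_0", Q4_0), ("q8_0", Q8_0)]:
    print(f"{name}: {bpw} bits/weight, ~{1 - bpw / FP16:.0%} smaller than fp16")
```

This is why q8_0 is the "minimal loss, benchmarking" option (it keeps roughly half the fp16 footprint) while the 4-bit formats are the production sweet spot.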