---
license: mit
language:
- multilingual
- en
- ru
tags:
- whisper
- gguf
- quantized
- speech-recognition
- rust
- candle
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
---
# WHISPER-TINY - GGUF Quantized Models
Quantized versions of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) in GGUF format.
## Directory Structure
```
tiny/
├── whisper-tiny-q*.gguf     # Candle-compatible GGUF models (root)
├── model-tiny-q80.gguf      # Candle-compatible legacy naming (q8_0 format)
├── config-tiny.json         # Model configuration for Candle
├── tokenizer-tiny.json      # Tokenizer for Candle
└── whisper.cpp/ # whisper.cpp-compatible models
└── whisper-tiny-q*.gguf
```
### Format Compatibility
- **Root directory** (`whisper-tiny-*.gguf`): Use with **Candle** (Rust ML framework)
- Tensor names include `model.` prefix (e.g., `model.encoder.conv1.weight`)
- Requires `config-tiny.json` and `tokenizer-tiny.json`
- **whisper.cpp/** directory: Use with **whisper.cpp** (C++ implementation)
- Tensor names without `model.` prefix (e.g., `encoder.conv1.weight`)
- Compatible with whisper.cpp CLI tools
- Both directories contain `.gguf` files, not `.bin` files
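Both layouts use the same GGUF container, so a downloaded file can be sanity-checked by reading its fixed header (magic `GGUF`, then a little-endian u32 version, u64 tensor count, and u64 metadata KV count, per the GGUF specification). A minimal sketch using only the Python standard library; the synthetic bytes below stand in for the first 24 bytes of a real `.gguf` file, and the counts in them are made up for illustration:

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic = {data[:4]!r})")
    # Little-endian: u32 version, u64 n_tensors, u64 n_metadata_kv
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic header standing in for a real file's first 24 bytes
fake = GGUF_MAGIC + struct.pack("<IQQ", 3, 167, 19)
print(read_gguf_header(fake))  # {'version': 3, 'tensors': 167, 'metadata_kv': 19}
```

Reading the full metadata (including tensor names, to tell the Candle and whisper.cpp variants apart) requires walking the KV section that follows this header.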
## Available Formats
| Format | Quality | Use Case |
|--------|---------|----------|
| q2_k | Smallest | Extreme compression |
| q3_k | Small | Mobile devices |
| q4_0 | Good | Legacy compatibility |
| q4_k | Good | **Recommended for production** |
| q4_1 | Good+ | Legacy with bias |
| q5_0 | Very Good | Legacy compatibility |
| q5_k | Very Good | High quality |
| q5_1 | Very Good+ | Legacy with bias |
| q6_k | Excellent | Near-lossless |
| q8_0 | Excellent | Minimal loss, benchmarking |
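The relative sizes of these formats follow from the GGML block layouts: each format packs a fixed-size block of weights into a fixed number of bytes (e.g. q4_0 stores 32 weights in 18 bytes, i.e. 4.5 bits per weight; k-quants use 256-weight super-blocks). A rough sizing sketch, assuming ~39M parameters for whisper-tiny and ignoring file headers and any tensors kept at higher precision:

```python
# Approximate bits per weight for GGML block quant formats
# (block bytes / weights per block; k-quants use 256-weight super-blocks).
BITS_PER_WEIGHT = {
    "q2_k": 84 / 256 * 8,   # 2.625
    "q3_k": 110 / 256 * 8,  # ~3.44
    "q4_0": 18 / 32 * 8,    # 4.5
    "q4_1": 20 / 32 * 8,    # 5.0
    "q4_k": 144 / 256 * 8,  # 4.5
    "q5_0": 22 / 32 * 8,    # 5.5
    "q5_1": 24 / 32 * 8,    # 6.0
    "q5_k": 176 / 256 * 8,  # 5.5
    "q6_k": 210 / 256 * 8,  # ~6.56
    "q8_0": 34 / 32 * 8,    # 8.5
}

def approx_size_mib(n_params: float, fmt: str) -> float:
    """Rough quantized weight size in MiB, ignoring headers and unquantized tensors."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 2**20

WHISPER_TINY_PARAMS = 39e6  # ~39M parameters (approximate)
for fmt in ("q4_k", "q8_0"):
    print(f"{fmt}: ~{approx_size_mib(WHISPER_TINY_PARAMS, fmt):.0f} MiB")
```

Actual file sizes will differ somewhat because embeddings and normalization tensors are often stored at higher precision.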
## Usage
### With Candle (Rust)
**Command line example:**
```bash
# Run the Candle Whisper example with a quantized model from this repo.
# Note: --features must come before the `--` separator; everything after
# `--` is passed to the example binary itself.
cargo run --example whisper --release --features symphonia -- \
    --quantized \
    --model tiny \
    --model-id oxide-lab/whisper-tiny-GGUF
```
### With whisper.cpp (C++)
```bash
# Use models from whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
--model models/openai/tiny/whisper.cpp/whisper-tiny-q4_k.gguf \
--file audio.wav
```
### Recommended Format
For most use cases, we recommend the **q4_k** format, as it provides the best balance of:
- Size reduction (~65% smaller than the fp16 weights)
- Quality (minimal degradation)
- Speed (faster inference than the higher-bit formats)
## Quantization Details
- **Source Model**: [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
- **Quantization Methods**:
- **Candle GGUF** (root directory): Python-based quantization, converting directly from PyTorch to GGUF
- Adds `model.` prefix to tensor names for Candle compatibility
- **whisper.cpp GGML** (whisper.cpp/ subdirectory): whisper-quantize tool
- Uses original tensor names without prefix
- **Format**: GGUF (GGML Universal Format) for both directories
- **Total Formats**: 10 quantization levels (q2_k through q8_0)
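The only structural difference between the two tensor layouts is the `model.` name prefix. A minimal sketch of that renaming step (the helper names here are illustrative, not taken from either toolchain):

```python
PREFIX = "model."

def to_candle_name(name: str) -> str:
    """Add the `model.` prefix Candle expects (idempotent)."""
    return name if name.startswith(PREFIX) else PREFIX + name

def to_whisper_cpp_name(name: str) -> str:
    """Strip the `model.` prefix for whisper.cpp-style tensor names."""
    return name[len(PREFIX):] if name.startswith(PREFIX) else name

print(to_candle_name("encoder.conv1.weight"))             # model.encoder.conv1.weight
print(to_whisper_cpp_name("model.encoder.conv1.weight"))  # encoder.conv1.weight
```

Because both helpers are idempotent, applying one after the other round-trips any tensor name between the two conventions.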
## License
Same as the original Whisper model (MIT License).
## Citation
```bibtex
@misc{radford2022whisper,
doi = {10.48550/ARXIV.2212.04356},
url = {https://arxiv.org/abs/2212.04356},
author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
title = {Robust Speech Recognition via Large-Scale Weak Supervision},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
```