Whisper Small – MLX 4-bit Quantized

Custom MLX 4-bit quantization of OpenAI Whisper Small optimized for MetalRT GPU inference on Apple Silicon.
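To illustrate what 4-bit quantization does to the weights, here is a minimal NumPy sketch of group-wise affine 4-bit quantization (group size 64 is assumed for illustration; this is not the actual MLX kernel or the exact scheme used for these weights):

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    # Split weights into groups; each group gets its own scale and offset.
    flat = w.reshape(-1, group_size)
    wmin = flat.min(axis=1, keepdims=True)
    wmax = flat.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0              # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    q = np.round((flat - wmin) / scale).astype(np.uint8)
    return q, scale, wmin

def dequantize_4bit(q, scale, wmin, shape):
    # Reconstruct approximate float weights from 4-bit codes.
    return (q * scale + wmin).reshape(shape).astype(np.float32)

# Hypothetical weight matrix standing in for a Whisper layer.
w = np.random.randn(128, 64).astype(np.float32)
q, s, m = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, m, w.shape)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Each stored value fits in 4 bits, cutting weight storage roughly 8x versus float32 (plus a small per-group overhead for scales and offsets).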

Usage

Used by RCLI with the MetalRT engine for speech-to-text:

rcli setup          # select MetalRT or Both engines

Note: Whisper Small is in GPU beta. Whisper Tiny is recommended for production use.

License

Model weights: MIT (OpenAI)
MetalRT engine: Proprietary (RunAnywhere, Inc.)

Contact

founder@runanywhere.ai | https://runanywhere.ai
