# Voxtral-Mini-4B-Realtime-2602-Quant (ExecuTorch)

This repo contains a quantized version of Voxtral-Mini-4B-Realtime-2602, enabling fully local, low-latency realtime transcription on your MacBook.

This 4-bit quantized checkpoint is only compatible with macOS on Apple M-series machines. To run Voxtral-Mini-4B-Realtime on other hardware, please follow the installation guidelines here.

## Installation

Let's first install ExecuTorch from source with the Metal backend enabled.

### Git clone

```shell
export EXECUTORCH_PATH="$HOME/executorch"
git clone https://github.com/pytorch/executorch/ ${EXECUTORCH_PATH}
```

### Installation with Metal backend

```shell
cd ${EXECUTORCH_PATH} && EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh
```

We recommend installing in a fresh conda or venv environment. If you run into any installation problems, open an issue or have a look at the official Voxtral Realtime installation guide.

### Build ExecuTorch with Metal backend support

```shell
cd ${EXECUTORCH_PATH} && make voxtral_realtime-metal
```

After running this command, you should have the runner binary at:

```shell
export CMAKE_RUNNER="${EXECUTORCH_PATH}/cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner"
ls ${CMAKE_RUNNER}
```
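For scripted setups, a small hypothetical helper (the `runner_ready` name is ours, not part of ExecuTorch) can confirm the binary was built and is executable:

```python
# Hypothetical helper (not part of ExecuTorch): check that the build step
# produced an executable runner binary at the exported CMAKE_RUNNER path.
import os

def runner_ready(path: str) -> bool:
    """Return True if `path` is an existing, executable file."""
    return os.path.isfile(path) and os.access(path, os.X_OK)

# Example: runner_ready(os.environ.get("CMAKE_RUNNER", "")) after the export above.
```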


### Additional

Also make sure that libomp is installed and its library path exported:

```shell
brew install libomp
export DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib
```

and that sounddevice is installed so that you can record microphone input:

```shell
pip install sounddevice
```

## Download model

Download the model files to a local folder:

```shell
export LOCAL_FOLDER="$HOME/voxtral_realtime_quant_metal"

hf download mistralai/Voxtral-Mini-4B-Realtime-2602-Executorch --local-dir ${LOCAL_FOLDER}
```
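As a quick sanity check after the download, here's a small hypothetical helper; `EXPECTED_ASSETS` simply lists the filenames referenced by the run command further down, it is not an official manifest:

```python
# Hypothetical helper (not part of the repo): verify the download pulled the
# files that the realtime run command in this README refers to.
from pathlib import Path

EXPECTED_ASSETS = [
    "model-metal-int4.pte",  # 4-bit quantized model for the Metal backend
    "tekken.json",           # tokenizer
    "preprocessor.pte",      # audio preprocessor
    "stream_audio.py",       # microphone streaming script
]

def missing_assets(folder: str) -> list[str]:
    """Return the expected filenames that are absent from `folder`."""
    return [name for name in EXPECTED_ASSETS if not (Path(folder) / name).is_file()]
```

An empty list from `missing_assets` on your `LOCAL_FOLDER` means everything the run command needs is in place.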

## Run with local microphone

Let's use the downloaded stream_audio.py script to stream audio from our MacBook's microphone to the Voxtral realtime runner.

Make sure it is executable:

```shell
cd ${LOCAL_FOLDER} && chmod +x stream_audio.py
```
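The contents of stream_audio.py aren't reproduced in this card. As a rough sketch of what a mic-to-stdout streamer like it might look like (the 16 kHz mono 16-bit PCM format and all names below are assumptions, not the shipped script):

```python
# Sketch only: a microphone-to-stdout streamer in the spirit of
# stream_audio.py. Assumes the runner expects 16 kHz mono 16-bit
# little-endian PCM on stdin; the shipped script may differ.
import sys

import numpy as np

def float_to_pcm16(samples: np.ndarray) -> bytes:
    """Convert float32 samples in [-1, 1] to 16-bit little-endian PCM bytes."""
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767.0).astype("<i2").tobytes()

def main() -> None:
    import sounddevice as sd  # installed earlier with `pip install sounddevice`

    def callback(indata, frames, time, status):
        if status:  # over/underruns are reported here, not raised
            print(status, file=sys.stderr)
        sys.stdout.buffer.write(float_to_pcm16(indata[:, 0]))
        sys.stdout.buffer.flush()

    # Capture mono float32 audio at 16 kHz until interrupted (Ctrl-C).
    with sd.InputStream(samplerate=16000, channels=1, dtype="float32",
                        callback=callback):
        sd.sleep(1_000_000_000)

# When run as a script, calling main() blocks and streams until interrupted.
```

Whatever the real script does internally, the key point is the same: raw audio bytes go to stdout, which the run command below pipes into the runner.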

Now we can stream and transcribe audio fully locally with very low latency. cd into the downloaded folder and transcribe your microphone input:

```shell
cd ${LOCAL_FOLDER} &&
./stream_audio.py | \
   ${CMAKE_RUNNER} \
    --model_path ./model-metal-int4.pte \
    --tokenizer_path ./tekken.json \
    --preprocessor_path ./preprocessor.pte \
    --mic
```