Voxtral-Mini-4B-Realtime-2602-Quant (ExecuTorch)
This repo contains a quantized version of Voxtral-Mini-4B-Realtime-2602 and allows for fully local, low-latency realtime transcription on your MacBook.
This 4-bit quantized checkpoint is only compatible with macOS on Apple M-series hardware. To run Voxtral-Mini-4B-Realtime on different hardware, please follow the installation guidelines here.
Installation
Let's first install executorch from source, enabling the Metal backend.
Git clone
export EXECUTORCH_PATH="$HOME/executorch"
git clone https://github.com/pytorch/executorch/ ${EXECUTORCH_PATH}
Installation with Metal backend
cd ${EXECUTORCH_PATH} && EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh
We recommend installing in a new conda or venv environment. If you run into any installation problems, open an issue or have a look at the official Voxtral Realtime installation guide.
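As a small guard for the recommendation above, the sketch below checks whether a venv or conda environment is active before you run the install script. The function name in_isolated_env is hypothetical and not part of ExecuTorch. It relies only on the VIRTUAL_ENV and CONDA_PREFIX variables that venv and conda set on activation, so it also reports success when conda's base environment is active.

```shell
# Hypothetical helper (not part of ExecuTorch): succeeds if a venv
# (VIRTUAL_ENV) or a conda environment (CONDA_PREFIX) is currently active.
in_isolated_env() {
  [ -n "${VIRTUAL_ENV:-}" ] || [ -n "${CONDA_PREFIX:-}" ]
}

# Example usage before running the install script:
# in_isolated_env || echo "Activate a conda or venv environment first" >&2
```

A fresh venv can be created with python3 -m venv followed by a directory of your choice, and activated by sourcing bin/activate inside that directory.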
Build ExecuTorch with Metal backend support
cd ${EXECUTORCH_PATH} && make voxtral_realtime-metal
After running this command, you should have a new runner executable:
export CMAKE_RUNNER="${EXECUTORCH_PATH}/cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner"
ls ${CMAKE_RUNNER}
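If ls reports nothing useful, a slightly stricter check can help. The sketch below verifies that a path is a regular file with the executable bit set. The helper name check_runner is hypothetical, and the check only assumes that the build places an executable at the path stored in CMAKE_RUNNER.

```shell
# Hypothetical helper: return 0 if the given path is a regular file
# with the executable bit set, otherwise print a message and return 1.
check_runner() {
  runner_path="$1"
  if [ -f "${runner_path}" ] && [ -x "${runner_path}" ]; then
    echo "runner found: ${runner_path}"
  else
    echo "runner missing or not executable: ${runner_path}" >&2
    return 1
  fi
}

# Example usage after the build step:
# check_runner "${CMAKE_RUNNER}"
```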
Additional
Also make sure that you have libomp installed and exported:
brew install libomp
export DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib
and that sounddevice is installed so that you can record microphone input:
pip install sounddevice
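To confirm that the OpenMP library is where the exported path points, a quick check such as the following may be useful. The helper name has_libomp is hypothetical, and the sketch assumes Homebrew installs the library as libomp.dylib inside the lib directory of the libomp prefix.

```shell
# Hypothetical helper: succeed if the given directory contains libomp.dylib.
has_libomp() {
  lib_dir="$1"
  [ -e "${lib_dir}/libomp.dylib" ]
}

# Example usage on macOS with Homebrew:
# has_libomp "$(brew --prefix libomp)/lib" || echo "libomp not found" >&2
```

The sounddevice install can be checked in the same spirit by running python3 -c "import sounddevice", which exits with an error if the module is not importable.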
Download model
Make sure to download:
- the preprocessor.pte processor file
- the tekken.json original tokenizer file
- the model-metal-int4.pte quantized model file
export LOCAL_FOLDER="$HOME/voxtral_realtime_quant_metal"
hf download mistralai/Voxtral-Mini-4B-Realtime-2602-Executorch --local-dir ${LOCAL_FOLDER}
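Once the download finishes, it can be worth confirming that all the files used later are present. The sketch below checks for the three files listed above plus the stream_audio.py script used in the next section. The function name check_model_files is hypothetical, and the file names are taken from this README rather than from a listing of the repository.

```shell
# Hypothetical helper: report every expected file that is missing from
# the given directory and return 1 if at least one is absent.
check_model_files() {
  model_dir="$1"
  missing=0
  for name in preprocessor.pte tekken.json model-metal-int4.pte stream_audio.py; do
    if [ ! -f "${model_dir}/${name}" ]; then
      echo "missing: ${model_dir}/${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example usage after the download:
# check_model_files "${LOCAL_FOLDER}"
```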
Run with local microphone
Let's make use of the downloaded stream_audio.py script to stream audio from
our MacBook's microphone to our Voxtral Realtime runner.
Make sure it is executable:
cd ${LOCAL_FOLDER} && chmod +x stream_audio.py
Now we can stream and transcribe audio fully locally with very low latency.
Change into the downloaded folder and transcribe your microphone input:
cd ${LOCAL_FOLDER} &&
./stream_audio.py | \
${CMAKE_RUNNER} \
--model_path ./model-metal-int4.pte \
--tokenizer_path ./tekken.json \
--preprocessor_path ./preprocessor.pte \
--mic