	# Voxtral-Mini-3B (ExecuTorch, XNNPACK, 8da4w)

	This folder contains an ExecuTorch `.pte` export of [mistralai/Voxtral-Mini-3B-2507](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507) for CPU inference via the XNNPACK backend, with post-training quantization applied. Voxtral is a multimodal speech-language model that accepts audio and text inputs.

	## Contents

	- model.pte: ExecuTorch program
	- voxtral_preprocessor.pte: Audio preprocessor (mel spectrogram extractor)

	## Quantization

	- --qlinear 8da4w: text decoder linear layers use 8-bit dynamic activations + 4-bit weights
	- --qlinear_encoder 8da4w: audio encoder linear layers use 8-bit dynamic activations + 4-bit weights
	- --qembedding 4w: embeddings use 4-bit weights

	## Export model
	```
	pip install mistral_common

	optimum-cli export executorch \
	--model "mistralai/Voxtral-Mini-3B-2507" \
	--task "multimodal-text-to-text" \
	--recipe "xnnpack" \
	--use_custom_sdpa \
	--use_custom_kv_cache \
	--max_seq_len 2048 \
	--qlinear 8da4w \
	--qlinear_encoder 8da4w \
	--qembedding 4w \
	--output_dir="voxtral"
	```
	## Export audio preprocessor (supports up to 5 min / 300s audio)
	```
	python -m executorch.extension.audio.mel_spectrogram \
	--feature_size 128 \
	--stack_output \
	--max_audio_len 300 \
	--output_file voxtral_preprocessor.pte
	```
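
The preprocessor above is exported with `--max_audio_len 300`, so it can be useful to check a clip's length before running. This is a minimal sketch, assuming `ffprobe` (from FFmpeg) is installed and the clip is named `audio.wav`; `within_limit` is a hypothetical helper, not part of ExecuTorch.

```shell
# Hypothetical helper: succeed when a duration in seconds
# (possibly fractional) does not exceed the given limit.
within_limit() {
  awk -v d="$1" -v max="$2" 'BEGIN { exit !(d + 0 <= max + 0) }'
}

MAX_AUDIO_LEN=300   # matches --max_audio_len used for the export
AUDIO="audio.wav"   # assumed file name; replace with your own clip

# Only probe when ffprobe is available and the clip exists.
if command -v ffprobe >/dev/null 2>&1 && [ -f "$AUDIO" ]; then
  duration="$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$AUDIO")"
  if within_limit "$duration" "$MAX_AUDIO_LEN"; then
    echo "$AUDIO: ${duration}s fits within ${MAX_AUDIO_LEN}s"
  else
    echo "$AUDIO: ${duration}s exceeds ${MAX_AUDIO_LEN}s" >&2
  fi
fi
```

If the clip is longer than the limit, either trim it or re-export the preprocessor with a larger `--max_audio_len`.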
	## Run
	Download the tokenizer:
	```
	curl -L https://huggingface.co/mistralai/Voxtral-Mini-3B-2507/resolve/main/tekken.json --output tekken.json
	```
	Build the runner from the ExecuTorch repo root:
	```
	make voxtral-cpu
	```
	Run the model:
	```
	./cmake-out/examples/models/voxtral/voxtral_runner \
	--model_path "model.pte" \
	--tokenizer_path "tekken.json" \
	--prompt "What can you tell me about this audio?" \
	--audio_path "audio.wav" \
	--processor_path "voxtral_preprocessor.pte" \
	--temperature 0
	```