Mistral Voxtral Mini 4B Realtime INT4 NF4 Submission
This repository contains an INT4 NF4 quantized version of:
mistralai/Voxtral-Mini-4B-Realtime-2602
Compression technique
- Base model: mistralai/Voxtral-Mini-4B-Realtime-2602
- Quantization method: BitsAndBytes 4-bit NF4
- Double quantization: enabled
- Compute dtype: BF16 on A100, otherwise FP16
- Architecture changes: none
- Distillation: none
- Fine-tuning: none
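To make the NF4 scheme concrete, here is a small pure-Python sketch of what 4-bit NF4 absmax quantization does conceptually. This is an illustration, not the actual bitsandbytes kernel (the real checkpoint was produced with BitsAndBytes, which quantizes in fixed-size blocks and additionally quantizes the per-block scales when double quantization is enabled). The 16-level codebook below follows the NF4 quantile table used by bitsandbytes.

```python
# Illustrative NF4 round-trip (not the actual bitsandbytes kernel).
# Each block of weights is scaled by its absolute maximum, then every
# value is snapped to the nearest of 16 fixed NF4 levels.

# The 16 NF4 quantile levels (normalized to [-1, 1]), as used by bitsandbytes.
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_block(block):
    """Return (4-bit indices, absmax scale) for one block of float weights."""
    absmax = max(abs(w) for w in block) or 1.0
    idx = [min(range(16), key=lambda i: abs(w / absmax - NF4_LEVELS[i]))
           for w in block]
    return idx, absmax

def dequantize_block(idx, absmax):
    """Reconstruct approximate float weights from indices and scale."""
    return [NF4_LEVELS[i] * absmax for i in idx]

weights = [0.12, -0.5, 0.03, 0.9, -0.27, 0.0, 0.44, -0.88]
idx, absmax = quantize_block(weights)
restored = dequantize_block(idx, absmax)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Double quantization, as enabled for this checkpoint, further compresses the per-block `absmax` scales themselves; that second level is omitted here for clarity.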
Challenge alignment
The model remains based on Voxtral Realtime; the compression approach is weight-only quantization, with no architectural or training changes. The challenge evaluation is expected to score ASR quality by word error rate (WER) first, then rank qualifying submissions by energy efficiency.
Local validation performed
A tiny FLEURS smoke test was run on three languages:
- English: en_us
- French: fr_fr
- Hindi: hi_in
The smoke test used one sample per language and was intended only to verify that the quantized model loads and transcribes.
Observed smoke-test macro WER:
- BF16 baseline: 0.668129
- INT4 NF4: 0.650585
Because this test used only one sample per language, these numbers should not be interpreted as a final benchmark. They only indicate that the INT4 checkpoint is functional and not obviously broken.
Serving
Intended serving command:
vllm serve --config vllm_config.yaml
Important note: this checkpoint was produced as a Transformers/BitsAndBytes INT4 NF4 checkpoint. Final vLLM compatibility should be verified in the official evaluation environment.
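For illustration only, a vllm_config.yaml for this checkpoint might look like the sketch below. Every key here is an assumption: vLLM config files generally mirror the CLI flag names, and vLLM's bitsandbytes support has its own version-specific requirements, so these keys and values must be checked against the vLLM version in the official evaluation environment.

```yaml
# Hypothetical vllm_config.yaml sketch -- verify all keys against the
# vLLM version used for evaluation before relying on them.
model: meghanamakkapati/MistralAI_INT4_quantization
quantization: bitsandbytes
load-format: bitsandbytes
dtype: bfloat16
```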
Storage source
The clean submission folder was prepared from:
/content/mistral_voxtral_quant/voxtral-mini-4b-realtime-int4-nf4-bnb
Files expected in this repository
This repository should include:
- quantized model weights
- config.json
- tokenizer / processor files
- README.md
- vllm_config.yaml
- safetensors files and index files, if present
License
Same as the original base model: Apache-2.0.