Mistral Voxtral Mini 4B Realtime INT4 NF4 Submission
This repository contains an INT4 NF4 quantized version of:
mistralai/Voxtral-Mini-4B-Realtime-2602
Compression technique
- Base model: mistralai/Voxtral-Mini-4B-Realtime-2602
- Quantization method: BitsAndBytes 4-bit NF4
- Double quantization: enabled
- Compute dtype: BF16 on A100, otherwise FP16
- Architecture changes: none
- Distillation: none
- Fine-tuning: none
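To make the NF4 scheme concrete, here is a small pure-Python sketch of what 4-bit NF4 absmax quantization does conceptually. This is an illustration, not the actual bitsandbytes kernel (the real checkpoint was produced with BitsAndBytes, which quantizes in fixed-size blocks and additionally quantizes the per-block scales when double quantization is enabled). The 16-level codebook below follows the NF4 quantile table used by bitsandbytes.

```python
# Illustrative NF4 round-trip (not the actual bitsandbytes kernel).
# Each block of weights is scaled by its absolute maximum, then every
# value is snapped to the nearest of 16 fixed NF4 levels.

# The 16 NF4 quantile levels (normalized to [-1, 1]), as used by bitsandbytes.
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_block(block):
    """Return (4-bit indices, absmax scale) for one block of float weights."""
    absmax = max(abs(w) for w in block) or 1.0
    idx = [min(range(16), key=lambda i: abs(w / absmax - NF4_LEVELS[i]))
           for w in block]
    return idx, absmax

def dequantize_block(idx, absmax):
    """Reconstruct approximate float weights from indices and scale."""
    return [NF4_LEVELS[i] * absmax for i in idx]

weights = [0.12, -0.5, 0.03, 0.9, -0.27, 0.0, 0.44, -0.88]
idx, absmax = quantize_block(weights)
restored = dequantize_block(idx, absmax)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Double quantization, as enabled for this checkpoint, further compresses the per-block `absmax` scales themselves; that second level is omitted here for clarity.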
Challenge alignment
The model remains based on Voxtral Realtime; the compression approach is weight-only quantization, with no architectural or training changes. The challenge evaluation is expected to score ASR quality by word error rate (WER) first, then rank qualifying submissions by energy efficiency.
Local validation performed
A tiny FLEURS smoke test was run on three languages:
- English: en_us
- French: fr_fr
- Hindi: hi_in
The smoke test used one sample per language and was intended only to verify that the quantized model loads and transcribes.
Observed smoke-test macro WER:
- BF16 baseline: 0.668129
- INT4 NF4: 0.650585
Because this test used only one sample per language, these numbers should not be interpreted as a final benchmark. They only indicate that the INT4 checkpoint is functional and not obviously broken.
Serving
Intended serving command:
vllm serve --config vllm_config.yaml
Important note: this checkpoint was produced as a Transformers/BitsAndBytes INT4 NF4 checkpoint. Final vLLM compatibility should be verified in the official evaluation environment.
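For illustration only, a vllm_config.yaml for this checkpoint might look like the sketch below. Every key here is an assumption: vLLM config files generally mirror the CLI flag names, and vLLM's bitsandbytes support has its own version-specific requirements, so these keys and values must be checked against the vLLM version in the official evaluation environment.

```yaml
# Hypothetical vllm_config.yaml sketch -- verify all keys against the
# vLLM version used for evaluation before relying on them.
model: meghanamakkapati/MistralAI_INT4_quantization
quantization: bitsandbytes
load-format: bitsandbytes
dtype: bfloat16
```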
Storage source
The clean submission folder was prepared from:
/content/mistral_voxtral_quant/voxtral-mini-4b-realtime-int4-nf4-bnb
Files expected in this repository
This repository should include:
- quantized model weights
- config.json
- tokenizer / processor files
- README.md
- vllm_config.yaml
- safetensors files and index files, if present
License
Same as the original base model: Apache-2.0.