Mistral Voxtral Mini 4B Realtime INT4 NF4 Submission

This repository contains an INT4 NF4 quantized version of:

mistralai/Voxtral-Mini-4B-Realtime-2602

Compression technique

  • Base model: mistralai/Voxtral-Mini-4B-Realtime-2602
  • Quantization method: BitsAndBytes 4-bit NF4 (see the loading sketch after this list)
  • Double quantization: enabled
  • Compute dtype: BF16 on A100, otherwise FP16
  • Architecture changes: none
  • Distillation: none
  • Fine-tuning: none
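
The settings above correspond to a standard BitsAndBytes 4-bit configuration in Transformers. The sketch below is illustrative rather than the exact production script: the output folder name is hypothetical, and the generic AutoModel class stands in for whichever class Transformers resolves for this checkpoint.

    # Minimal sketch of the NF4 quantization setup described above.
    # Assumptions: recent transformers + bitsandbytes; OUT_DIR is a hypothetical path.
    import torch
    from transformers import AutoModel, AutoProcessor, BitsAndBytesConfig

    MODEL_ID = "mistralai/Voxtral-Mini-4B-Realtime-2602"
    OUT_DIR = "voxtral-mini-4b-realtime-int4-nf4-bnb"  # hypothetical output folder

    # BF16 compute where the GPU supports it (e.g. A100), otherwise FP16.
    compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",        # NormalFloat4 weight format
        bnb_4bit_use_double_quant=True,   # double quantization enabled
        bnb_4bit_compute_dtype=compute_dtype,
    )

    model = AutoModel.from_pretrained(MODEL_ID, quantization_config=bnb_config, device_map="auto")
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    # Serialize the 4-bit weights plus processor files into the submission folder.
    model.save_pretrained(OUT_DIR)
    processor.save_pretrained(OUT_DIR)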

Challenge alignment

The model remains based on Voxtral Realtime; the compression approach is limited to post-training weight quantization of the original checkpoint.

The challenge evaluation is expected to focus on ASR quality measured by word error rate (WER), followed by an energy-efficiency ranking among qualifying submissions.
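
For clarity, WER here means the standard word error rate: word-level edit distance divided by the number of reference words. A minimal sketch using the jiwer package (an assumption; the official harness may apply its own text normalization):

    # Minimal WER sketch using the jiwer package (assumed; not necessarily the
    # challenge's official scorer).
    import jiwer

    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over a lazy dog"

    # WER = (substitutions + insertions + deletions) / number of reference words
    print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")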

Local validation performed

A tiny FLEURS smoke test was run on three languages:

  • English: en_us
  • French: fr_fr
  • Hindi: hi_in

The smoke test used one sample per language and was intended only to verify that the quantized model loads and transcribes.
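
A minimal sketch of this kind of smoke test is shown below. The transcribe() helper is hypothetical (it stands in for the actual inference call into the quantized checkpoint), and the macro WER is the unweighted mean of the three per-language WERs.

    # One-sample-per-language FLEURS smoke test sketch. transcribe() is a
    # hypothetical helper; depending on the datasets version, trust_remote_code
    # may be required when loading google/fleurs.
    import jiwer
    from datasets import load_dataset

    LANGS = ["en_us", "fr_fr", "hi_in"]

    def transcribe(audio_array, sampling_rate):
        # Hypothetical: run the quantized Voxtral checkpoint on one clip and
        # return the predicted transcript as a string.
        raise NotImplementedError

    per_lang_wer = []
    for lang in LANGS:
        sample = next(iter(load_dataset("google/fleurs", lang, split="test", streaming=True)))
        hyp = transcribe(sample["audio"]["array"], sample["audio"]["sampling_rate"])
        per_lang_wer.append(jiwer.wer(sample["transcription"], hyp))

    # Macro WER: unweighted mean of the per-language WERs.
    print(f"Macro WER: {sum(per_lang_wer) / len(per_lang_wer):.6f}")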

Observed smoke-test macro WER:

  • BF16 baseline: 0.668129
  • INT4 NF4: 0.650585

Because this test used only one sample per language, these numbers should not be interpreted as a final benchmark. They only indicate that the INT4 checkpoint is functional and not obviously broken.

Serving

Intended serving command:

vllm serve --config vllm_config.yaml

Important note: this checkpoint was produced as a Transformers/BitsAndBytes INT4 NF4 checkpoint. Final vLLM compatibility should be verified in the official evaluation environment.
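
If the server does come up with an OpenAI-compatible transcription endpoint (not guaranteed for the realtime variant, which may instead expose a streaming interface), a request could look roughly like the sketch below; the audio file name and port are illustrative.

    # Hedged client sketch, assuming vLLM exposes an OpenAI-compatible
    # /v1/audio/transcriptions endpoint on its default port. Verify against the
    # official evaluation environment. "sample.wav" is a hypothetical local clip.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    with open("sample.wav", "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="mistralai/Voxtral-Mini-4B-Realtime-2602",
            file=audio_file,
        )
    print(result.text)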

Storage source

The clean submission folder was prepared from:

/content/mistral_voxtral_quant/voxtral-mini-4b-realtime-int4-nf4-bnb

Files expected in this repository

This repository should include the following files; a small presence check follows the list:

  • quantized model weights
  • config.json
  • tokenizer / processor files
  • README.md
  • vllm_config.yaml
  • safetensors files and index files, if present
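
A quick way to verify that the expected files are present before upload, using a hypothetical local folder path; weight shards are matched by pattern because exact shard names depend on how the checkpoint was saved.

    # Sanity check over the submission folder (path is hypothetical).
    from pathlib import Path

    folder = Path("voxtral-mini-4b-realtime-int4-nf4-bnb")
    required = ["config.json", "README.md", "vllm_config.yaml"]

    missing = [name for name in required if not (folder / name).exists()]
    print("missing:", missing or "none")
    print("safetensors shards found:", any(folder.glob("*.safetensors")))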

License

Same as the original base model: Apache-2.0.
