---
base_model: mistralai/Voxtral-Mini-4B-Realtime-2602
library_name: transformers
pipeline_tag: automatic-speech-recognition
license: apache-2.0
tags:
- mistral
- voxtral
- voxtral-realtime
- automatic-speech-recognition
- asr
- quantization
- bitsandbytes
- int4
- nf4
---

# Mistral Voxtral Mini 4B Realtime INT4 NF4 Submission

This repository contains an INT4 (NF4) quantized version of `mistralai/Voxtral-Mini-4B-Realtime-2602`.

## Compression technique

- Base model: `mistralai/Voxtral-Mini-4B-Realtime-2602`
- Quantization method: BitsAndBytes 4-bit NF4 (weight-only)
- Double quantization: enabled
- Compute dtype: BF16 on A100 GPUs, FP16 on other GPUs
- Architecture changes: none
- Distillation: none
- Fine-tuning: none
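The settings above correspond to a `transformers` `BitsAndBytesConfig`. The sketch below only assembles its keyword arguments, so it runs without a GPU. The helper names are hypothetical. Matching the substring `A100` in the device name is an assumption about how the BF16/FP16 choice was made.

```python
from typing import Any, Dict


def select_compute_dtype(device_name: str) -> str:
    """BF16 on A100 GPUs, FP16 otherwise (assumed rule, mirroring the list above)."""
    return "bfloat16" if "A100" in device_name else "float16"


def nf4_quant_kwargs(device_name: str) -> Dict[str, Any]:
    """Keyword arguments for transformers.BitsAndBytesConfig matching this card."""
    return {
        "load_in_4bit": True,               # BitsAndBytes 4-bit weights
        "bnb_4bit_quant_type": "nf4",       # NormalFloat4 data type
        "bnb_4bit_use_double_quant": True,  # also quantize the quantization constants
        # BitsAndBytesConfig accepts the dtype as a string such as "bfloat16"
        "bnb_4bit_compute_dtype": select_compute_dtype(device_name),
    }


print(nf4_quant_kwargs("NVIDIA A100-SXM4-40GB"))
```

On a CUDA machine, one would pass `BitsAndBytesConfig(**nf4_quant_kwargs(torch.cuda.get_device_name(0)))` as the `quantization_config=` argument of the checkpoint's `from_pretrained` call. `torch.cuda.is_bf16_supported()` is a more general alternative to matching on the device name.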

## Challenge alignment

The model is still Voxtral Realtime. The architecture is unchanged, and all compression comes from weight quantization.

The challenge evaluation is expected to score ASR quality by word error rate (WER) first, and then to rank qualifying submissions by energy efficiency.

## Local validation performed

A small FLEURS smoke test was run on three languages:

- English: `en_us`
- French: `fr_fr`
- Hindi: `hi_in`

The smoke test used one sample per language. Its only purpose was to confirm that the quantized model loads and produces transcriptions.

Observed smoke-test macro WER (mean of the per-language WERs):

- BF16 baseline: `0.668129`
- INT4 NF4: `0.650585`

These numbers come from only one sample per language, so they are not a final benchmark. In particular, the lower INT4 value should not be read as a quality improvement over BF16. They only show that the INT4 checkpoint is functional and not obviously broken.
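Macro WER gives every language equal weight, whatever its number of samples. The sketch below shows that calculation under some assumptions. It uses plain whitespace tokenization and applies no text normalization. The challenge scorer may normalize case and punctuation, so its numbers can differ. All function names are hypothetical, and the transcripts in the usage lines are made up for illustration.

```python
from typing import Dict, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] holds the distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Total word edits divided by total reference words for one language."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        edits += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return edits / ref_words


def macro_wer(per_language: Dict[str, List[Tuple[str, str]]]) -> float:
    """Unweighted mean of per-language WERs."""
    scores = [corpus_wer(pairs) for pairs in per_language.values()]
    return sum(scores) / len(scores)


# Hypothetical (reference, hypothesis) pairs, one per language as in the smoke test.
results = {
    "en_us": [("the cat sat on the mat", "the cat sat on a mat")],
    "fr_fr": [("bonjour tout le monde", "bonjour tout le monde")],
}
print(macro_wer(results))  # ≈ 0.0833 (1/6 for en_us, 0 for fr_fr)
```

With one sample per language, each language's WER is just that one utterance's WER. A single substitution in a short sentence can therefore move the macro figure by several points.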

## Serving

Intended serving command:

`vllm serve --config vllm_config.yaml`

Important note: this checkpoint was produced with Transformers and BitsAndBytes (INT4 NF4). vLLM compatibility should be verified in the official evaluation environment before relying on this command.
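A cheap pre-flight check is to confirm that the saved `config.json` still declares the NF4 settings listed on this card. The sketch below assumes Transformers wrote a `quantization_config` entry into `config.json`, which it normally does when saving a bitsandbytes model. The helper names are hypothetical. Passing this check does not prove that vLLM can serve the checkpoint.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Settings from "Compression technique". The compute dtype is left out because the
# card allows either BF16 or FP16 depending on the GPU.
EXPECTED_QUANT_SETTINGS: Dict[str, Any] = {
    "quant_method": "bitsandbytes",
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def load_config(checkpoint_dir: str) -> Dict[str, Any]:
    """Read config.json from a local checkpoint folder."""
    return json.loads((Path(checkpoint_dir) / "config.json").read_text(encoding="utf-8"))


def quant_config_mismatches(config: Dict[str, Any]) -> List[str]:
    """Return human-readable mismatches; an empty list means the settings agree."""
    quant = config.get("quantization_config")
    if not isinstance(quant, dict):
        return ["config.json has no quantization_config entry"]
    return [
        f"{key}: expected {expected!r}, found {quant.get(key)!r}"
        for key, expected in EXPECTED_QUANT_SETTINGS.items()
        if quant.get(key) != expected
    ]
```

Typical use is `quant_config_mismatches(load_config("<checkpoint folder>"))`, where the folder path is a placeholder for the local download of this repository.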

## Storage source

The clean submission folder was prepared from:

`/content/mistral_voxtral_quant/voxtral-mini-4b-realtime-int4-nf4-bnb`

## Files expected in this repository

This repository should include:

- quantized model weights
- `config.json`
- tokenizer / processor files
- `README.md`
- `vllm_config.yaml`
- safetensors files and index files, if present
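A short standard-library script can automate this checklist. The sketch makes three assumptions. First, the set of accepted tokenizer/processor filenames is a guess, because names differ between Mistral-format and Transformers-format checkpoints. Second, weights are accepted as either `.safetensors` or `.bin` files. Third, any `*.safetensors.index.json` file is checked so that every shard it references actually exists. The function and constant names are hypothetical.

```python
import json
from pathlib import Path
from typing import List

REQUIRED_FILES = ["config.json", "README.md", "vllm_config.yaml"]
# Assumption: finding any one of these counts as having tokenizer/processor files.
TOKENIZER_CANDIDATES = [
    "tekken.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "preprocessor_config.json",
    "processor_config.json",
]


def missing_submission_files(folder: str) -> List[str]:
    """Return a list of problems; an empty list means the checklist is satisfied."""
    root = Path(folder)
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in TOKENIZER_CANDIDATES):
        problems.append("no tokenizer/processor file found")
    # "*.safetensors" does not match "*.safetensors.index.json", so only weights count.
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    if not has_weights:
        problems.append("no model weight files found")
    # A sharded index maps tensor names to shard filenames under "weight_map".
    for index_path in sorted(root.glob("*.safetensors.index.json")):
        index = json.loads(index_path.read_text(encoding="utf-8"))
        for shard in sorted(set(index.get("weight_map", {}).values())):
            if not (root / shard).is_file():
                problems.append(f"{index_path.name} references missing shard {shard}")
    return problems
```

Running it on the submission folder before uploading catches a forgotten `vllm_config.yaml` or a missing shard. Finding these after the evaluation has started would be more costly.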

## License

Same as the original base model: Apache-2.0.