How to use meghanamakkapati/MistralAI_INT4_quantization with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="meghanamakkapati/MistralAI_INT4_quantization")
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("meghanamakkapati/MistralAI_INT4_quantization")
model = AutoModelForSpeechSeq2Seq.from_pretrained("meghanamakkapati/MistralAI_INT4_quantization")
```
---
base_model: mistralai/Voxtral-Mini-4B-Realtime-2602
library_name: transformers
pipeline_tag: automatic-speech-recognition
license: apache-2.0
tags:
- mistral
- voxtral
- voxtral-realtime
- automatic-speech-recognition
- asr
- quantization
- bitsandbytes
- int4
- nf4
---
# Mistral Voxtral Mini 4B Realtime INT4 NF4 Submission

This repository contains an INT4 NF4 quantized version of `mistralai/Voxtral-Mini-4B-Realtime-2602`.
## Compression technique

- Base model: `mistralai/Voxtral-Mini-4B-Realtime-2602`
- Quantization method: BitsAndBytes 4-bit NF4
- Double quantization: enabled
- Compute dtype: BF16 on A100, otherwise FP16
- Architecture changes: none
- Distillation: none
- Fine-tuning: none
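
The settings listed above map onto the keyword arguments of `BitsAndBytesConfig` in Transformers. The following is a minimal sketch, not the exact script used to produce this checkpoint: the helper names (`pick_compute_dtype`, `nf4_config_kwargs`, `build_bnb_config`) are hypothetical, and choosing BF16 by looking for "A100" in the device name is an assumption made only to mirror the "BF16 on A100, otherwise FP16" rule.

```python
def pick_compute_dtype(device_name: str) -> str:
    """Return the compute dtype name: BF16 on A100, otherwise FP16.

    Matching on the substring "A100" is an illustrative assumption.
    """
    return "bfloat16" if "A100" in device_name else "float16"


def nf4_config_kwargs(device_name: str) -> dict:
    """Keyword arguments mirroring the compression settings listed above."""
    return {
        "load_in_4bit": True,                # 4-bit weight quantization
        "bnb_4bit_quant_type": "nf4",        # NF4 data type
        "bnb_4bit_use_double_quant": True,   # double quantization enabled
        "bnb_4bit_compute_dtype": pick_compute_dtype(device_name),
    }


def build_bnb_config(device_name: str):
    """Build a BitsAndBytesConfig; heavy imports are deferred to call time."""
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = nf4_config_kwargs(device_name)
    # Convert the dtype name ("bfloat16" / "float16") into a torch.dtype.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object would typically be passed as `quantization_config=` to a `from_pretrained` call on the base model. On a CUDA machine, the device name can be read with `torch.cuda.get_device_name(0)`.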
## Challenge alignment

The model is still Voxtral Realtime; compression is limited to weight quantization.

The challenge evaluation is expected to measure ASR quality by word error rate (WER), then rank qualifying submissions by energy efficiency.
## Local validation performed

A tiny FLEURS smoke test was run on three languages:

- English: `en_us`
- French: `fr_fr`
- Hindi: `hi_in`

The smoke test used one sample per language and was intended only to verify that the quantized model loads and transcribes.

Observed smoke-test macro WER:

- BF16 baseline: `0.668129`
- INT4 NF4: `0.650585`

Because this test used only one sample per language, these numbers should not be interpreted as a final benchmark. In particular, the slightly lower INT4 score is not evidence that quantization improves accuracy; a difference of this size on three samples is within noise. The numbers only indicate that the INT4 checkpoint is functional and not obviously broken.
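
For readers unfamiliar with the metric, macro WER is the unweighted mean of the per-language word error rates. The sketch below is a plain-Python illustration of that calculation, not the evaluation code used for the numbers above: the function names are hypothetical, and it assumes whitespace tokenization with no text normalization, which may differ from the actual smoke test and from the official challenge scoring.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def macro_wer(pairs_by_language: dict) -> float:
    """Unweighted mean of per-language WER.

    pairs_by_language maps a language code to a (reference, hypothesis) pair.
    """
    scores = [wer(ref, hyp) for ref, hyp in pairs_by_language.values()]
    return sum(scores) / len(scores)
```

With one sample per language, a single misrecognized word shifts the macro average noticeably, which is why the figures above are treated only as a sanity check.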
## Serving

Intended serving command:

`vllm serve --config vllm_config.yaml`

Important note: this checkpoint was produced with Transformers and BitsAndBytes as an INT4 NF4 checkpoint. vLLM compatibility still needs to be verified in the official evaluation environment.
## Storage source

The clean submission folder was prepared from:

`/content/mistral_voxtral_quant/voxtral-mini-4b-realtime-int4-nf4-bnb`
## Files expected in this repository

This repository should include:

- quantized model weights
- `config.json`
- tokenizer / processor files
- `README.md`
- `vllm_config.yaml`
- safetensors files and index files, if present
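
The checklist above can be verified on a local copy of the submission folder before uploading. This is a minimal sketch with a hypothetical helper name (`missing_files`); it checks only the files the list names explicitly plus the presence of at least one `*.safetensors` file, which assumes the weights are stored in that format. Tokenizer and processor files are not checked because their exact names are not given here.

```python
from pathlib import Path

# Files named explicitly in the checklist above.
REQUIRED_FILES = ("config.json", "README.md", "vllm_config.yaml")


def missing_files(folder) -> list:
    """Return the checklist entries that are absent from the given folder."""
    root = Path(folder)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Assumption: quantized weights are stored as one or more .safetensors files.
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

An empty list means every checked entry was found; anything else lists what still needs to be added.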
## License

Same as the original base model: Apache-2.0.