Automatic Speech Recognition
Transformers
Safetensors
voxtral_realtime
mistral
voxtral
voxtral-realtime
asr
quantization
bitsandbytes
int4
nf4
4-bit precision
Error Loading using vLLM
#1
by suleimanelkhoury - opened
Hi, running the model using vLLM returns the following error:
AttributeError: CachedMistralCommonBackend has no attribute is_fast
vLLM also seems to resolve the architecture incorrectly. It logs:
Resolved architecture: TransformersMultiModalForCausalLM
as if vLLM only recognizes the original mistralai/Voxtral-Mini-4B-Realtime-2602 repository. Do you encounter the same issue?
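One way to narrow this down might be to compare what the two repositories declare about themselves. The sketch below assumes both repos ship a Transformers-style config.json (a Mistral-format repo may not), and that vLLM picks the model class from the "architectures" field of that file, falling back to its generic Transformers backend when it does not recognize the name. The helper names summarize_config and fetch_config_text are made up for this example, and the sample JSON is illustrative, not copied from either repo.

```python
import json


def summarize_config(config_text: str) -> dict:
    """Return the fields of a Transformers-style config.json that identify the model class."""
    config = json.loads(config_text)
    return {
        "architectures": config.get("architectures", []),
        "model_type": config.get("model_type"),
        "quantized": "quantization_config" in config,
    }


def fetch_config_text(repo_id: str) -> str:
    """Download config.json from the Hub (needs huggingface_hub and network access)."""
    # Imported lazily so summarize_config works without extra dependencies.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path, encoding="utf-8") as f:
        return f.read()


# Hypothetical config contents, for illustration only.
example = (
    '{"architectures": ["ExampleForConditionalGeneration"], '
    '"model_type": "example", '
    '"quantization_config": {"load_in_4bit": true}}'
)
print(summarize_config(example))
```

Running summarize_config(fetch_config_text(...)) once for this repo and once for mistralai/Voxtral-Mini-4B-Realtime-2602 would show whether the architecture name or model type differs between the two, which could explain why vLLM resolves them differently.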