Add model card

README.md ADDED
@@ -0,0 +1,68 @@
---
library_name: mlx
tags:
- mlx
- mlx-audio
- qwen2-audio
- audio
- speech
- multimodal
- 4bit
base_model: Qwen/Qwen2-Audio-7B-Instruct
license: apache-2.0
pipeline_tag: audio-text-to-text
---

# Qwen2-Audio-7B-Instruct (4-bit MLX)

A 4-bit quantized version of [Qwen/Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct) for Apple Silicon, via [mlx-audio](https://github.com/Blaizzy/mlx-audio).

## Usage

```python
from mlx_audio.stt.utils import load_model

model = load_model("mlx-community/Qwen2-Audio-7B-Instruct-4bit")

# Transcription
result = model.generate("audio.wav", prompt="Transcribe the audio.")
print(result.text)

# Audio understanding
result = model.generate("audio.wav", prompt="What emotion is the speaker expressing?")
print(result.text)

# Translation
result = model.generate("audio.wav", prompt="Translate the speech to French.")
print(result.text)
```
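
Note that the same `generate` call drives all three tasks: the behavior is selected entirely by the natural-language prompt, so no task-specific configuration is needed.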

## Model Details

- **Base model**: Qwen/Qwen2-Audio-7B-Instruct
- **Quantization**: 4-bit (group_size=64), applied to the language model only (audio encoder and projector kept in bf16)
- **Size**: ~4.2 GB (vs ~15 GB in bf16)
- **Architecture**: Whisper-style audio encoder (32 layers) + linear projector + Qwen2-7B LLM

## Capabilities

- Speech transcription (ASR)
- Speech translation
- Audio captioning
- Emotion / sentiment detection
- Environmental sound classification
- Music understanding
- Voice chat (audio-only input)

## Performance

Tested on Apple Silicon (M-series):

- ~4.7 tokens/sec generation (4-bit)
- Transcription output matches the Hugging Face reference implementation
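
To sanity-check throughput on your own hardware, here is a minimal wall-clock timing sketch. It uses only the `load_model`/`generate` API shown in the Usage section; `audio.wav` is a placeholder path, and it reports characters rather than tokens to avoid assuming access to the tokenizer:

```python
import time

from mlx_audio.stt.utils import load_model

model = load_model("mlx-community/Qwen2-Audio-7B-Instruct-4bit")

# Time a single generation end to end (placeholder audio path)
start = time.perf_counter()
result = model.generate("audio.wav", prompt="Transcribe the audio.")
elapsed = time.perf_counter() - start

# Wall-clock time and output length only; exact tokens/sec depends on
# the tokenizer's token count for result.text.
print(f"Generated {len(result.text)} characters in {elapsed:.1f}s")
```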

## Conversion

Converted using mlx-audio with:

- Audio encoder: bf16 (not quantized)
- Multi-modal projector: bf16 (not quantized)
- Language model: 4-bit quantized (group_size=64)
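
For reference, this kind of selective quantization can be expressed with MLX's `mlx.nn.quantize`, which accepts a predicate deciding which submodules to quantize. The sketch below is illustrative rather than the exact conversion script used here, and the `language_model` prefix is an assumption about how the checkpoint names its submodules:

```python
import mlx.nn as nn

def quantize_language_model_only(model: nn.Module) -> nn.Module:
    # Quantize only Linear layers under the (assumed) "language_model"
    # prefix to 4 bits with group_size=64; the audio encoder and the
    # multi-modal projector are left untouched (bf16).
    nn.quantize(
        model,
        group_size=64,
        bits=4,
        class_predicate=lambda path, module: (
            isinstance(module, nn.Linear) and path.startswith("language_model")
        ),
    )
    return model
```

The usual motivation for this split is that the comparatively small encoder and projector are cheap to keep in bf16, while the 7B language model dominates the parameter count and therefore accounts for nearly all of the size reduction.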