MOSS-Music-8B-Thinking · MLX 4-bit
A 4-bit MLX quantization of OpenMOSS-Team/MOSS-Music-8B-Thinking for music understanding on Apple Silicon. It is the smallest build (~6 GB) and a good fit for 16 GB Macs.
Community conversion, not an official release. All model credit goes to the OpenMOSS Team.
Usage
MOSS-Music is a custom multimodal (audio + text) model, so it does not load directly with mlx_lm or mlx_vlm. Use the moss_music_mlx backend instead (code, PR):
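from huggingface_hub import snapshot_download
from moss_music_mlx import load_pretrained, generate
# MossMusicProcessor comes from a local src package, which must be on the import path.
from src.processing_moss_music import MossMusicProcessor

# Fetch the quantized weights from the Hub (cached after the first download).
path = snapshot_download("mlx-community/MOSS-Music-8B-Thinking-4bit")
model = load_pretrained(path)
proc = MossMusicProcessor.from_pretrained(path, trust_remote_code=True, enable_time_marker=True)

# Ask a question about a local audio file and print the model's answer.
print(generate(model, proc, "Analyze this track: genre, key, BPM, structure.", audio_path="song.mp3"))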
Conversion
- 4-bit, group size 64. The audio encoder is kept at bf16 to preserve audio fidelity; quantization is applied to the Qwen3 layers, token embeddings, and lm_head.
- Converted with mlx==0.31.2 and mlx-lm==0.29.1.
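To make "4-bit, group size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It only illustrates the general idea and is not MLX's actual kernel or packing format. The function names are hypothetical, and the 16-bit width assumed for each group's scale and bias is an assumption, not something stated in the conversion notes.

```python
import random

GROUP_SIZE = 64            # group size from the conversion notes
BITS = 4                   # bit width from the conversion notes
LEVELS = (1 << BITS) - 1   # 15 steps between a group's min and max


def quantize_group(weights):
    """Map one group of floats to integer codes in 0..15 plus a scale and bias."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from the codes of one group."""
    return [c * scale + bias for c in codes]


def quantize_row(row):
    """Split a row of weights into consecutive groups and quantize each one."""
    return [
        quantize_group(row[i:i + GROUP_SIZE])
        for i in range(0, len(row), GROUP_SIZE)
    ]


def bits_per_weight(meta_bits=16):
    """Effective storage cost, ASSUMING a meta_bits-wide scale and bias per group."""
    return BITS + 2 * meta_bits / GROUP_SIZE


if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(128)]
    codes, scale, bias = quantize_row(row)[0]
    restored = dequantize_group(codes, scale, bias)
    worst = max(abs(a - b) for a, b in zip(row[:GROUP_SIZE], restored))
    print(f"max abs error in first group: {worst:.6f} (step {scale:.6f})")
    print(f"effective bits per weight: {bits_per_weight()}")
```

Each group stores its own scale and bias, so the rounding error within a group is bounded by half of that group's step size. Smaller groups track the local weight range more closely at the cost of more per-group metadata.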
Accuracy
Versus the fp32 PyTorch reference, the 4-bit model's prefill next-token argmax is identical, and the logits match with a cosine similarity of 0.99889 (8-bit: 0.99999, 6-bit: 0.99989). 4-bit is the most aggressive recipe; for the highest fidelity, prefer 6-bit or 8-bit.
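The two metrics above can be sketched in a few lines of plain Python. This is an illustrative check on toy vectors, not the script used to produce the reported numbers. The helper names are hypothetical, and how the real comparison selected positions or flattened the logits is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length logit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def argmax(values):
    """Index of the largest logit, i.e. the greedy next-token choice."""
    return max(range(len(values)), key=values.__getitem__)


def compare_logits(reference, quantized):
    """Report whether the top token agrees and how closely the logits align."""
    return {
        "same_argmax": argmax(reference) == argmax(quantized),
        "cosine": cosine_similarity(reference, quantized),
    }


# Toy logits standing in for the fp32 reference and a quantized model.
reference_logits = [2.0, -1.0, 0.5, 4.0]
quantized_logits = [1.9, -1.1, 0.6, 3.9]
report = compare_logits(reference_logits, quantized_logits)

if __name__ == "__main__":
    print(report)
```

An identical argmax means greedy decoding picks the same next token after prefill, while the cosine similarity summarizes how far the full logit distribution has drifted.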
License & credit
Apache-2.0, inherited from the base model. This repository provides only the MLX-quantized weights; all credit goes to the OpenMOSS Team.