MOSS-Music-8B-Thinking · MLX 4-bit


A 4-bit MLX quantization of OpenMOSS-Team/MOSS-Music-8B-Thinking for music understanding on Apple Silicon. It is the smallest of the available builds (~6 GB) and a good fit for 16 GB Macs.

Community conversion, not an official release. All model credit goes to the OpenMOSS Team.

Other sizes: 8-bit · 6-bit

Usage

MOSS-Music is a custom multimodal (audio + text) model, so it does not load directly with mlx_lm or mlx_vlm. Use the moss_music_mlx backend (code, PR):

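from huggingface_hub import snapshot_download
from moss_music_mlx import load_pretrained, generate
# The `src` package that provides processing_moss_music must be importable.
from src.processing_moss_music import MossMusicProcessor

# Download the quantized weights (cached locally after the first call).
path = snapshot_download("mlx-community/MOSS-Music-8B-Thinking-4bit")
model = load_pretrained(path)
proc = MossMusicProcessor.from_pretrained(path, trust_remote_code=True, enable_time_marker=True)

# Ask a question about a local audio file and print the model's answer.
prompt = "Analyze this track: genre, key, BPM, structure."
print(generate(model, proc, prompt, audio_path="song.mp3"))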

Conversion

  • 4-bit, group size 64. The audio encoder is kept at bf16 to preserve audio fidelity; quantization is applied to the Qwen3 layers, token embeddings and lm_head.
  • Converted with mlx==0.31.2, mlx-lm==0.29.1.
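
To make "4-bit, group size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration only, not the MLX kernel: the helper names (quantize_group, dequantize_group, effective_bits_per_weight) are hypothetical, the min/max rounding scheme is an assumption, and the storage estimate assumes each group keeps one 16-bit scale and one 16-bit bias next to its 4-bit codes.

```python
import random


def quantize_group(weights, bits=4):
    """Affine-quantize one group of weights; returns (codes, scale, bias)."""
    levels = (1 << bits) - 1              # 15 distinct steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo               # bias is the group minimum


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + bias for c in codes]


def effective_bits_per_weight(bits=4, group_size=64, meta_bits=16):
    """Code bits plus the per-group scale and bias, amortized over the group."""
    return bits + 2 * meta_bits / group_size


# Hypothetical weights: one group of 64 values drawn from a narrow Gaussian.
random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(64)]

codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))

print(f"max round-trip error: {max_err:.6f} (half a step is {scale / 2:.6f})")
print(f"effective bits per weight: {effective_bits_per_weight()}")
```

Under these assumptions the quantized layers cost about 4.5 bits per weight rather than exactly 4, and the round-trip error of each weight is bounded by half a quantization step within its group. The audio encoder, which stays at bf16, is not covered by this estimate.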

Accuracy

Compared with the fp32 PyTorch reference, the 4-bit model picks the identical next token (argmax) after prefill, and its logits have a cosine similarity of 0.99889 with the reference (0.99999 for 8-bit, 0.99989 for 6-bit). 4-bit is the most aggressive recipe; for the highest fidelity, prefer 6-bit or 8-bit.
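
The two checks reported above, argmax agreement and logit cosine similarity, can be sketched in a few lines of plain Python. This is not the script used to produce the numbers in this card: the helper names are hypothetical and the two logit vectors are made-up toy values standing in for the reference and quantized outputs at one position.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def argmax(values):
    """Index of the largest element (the predicted next-token id)."""
    return max(range(len(values)), key=values.__getitem__)


# Toy logits over a 4-token vocabulary (hypothetical values).
reference = [2.0, -1.0, 0.5, 4.0]   # stands in for the fp32 reference
quantized = [1.9, -1.1, 0.6, 3.8]   # stands in for the quantized model

same_token = argmax(reference) == argmax(quantized)
similarity = cosine_similarity(reference, quantized)

print(f"same next token: {same_token}, cosine similarity: {similarity:.5f}")
```

A matching argmax means greedy decoding would choose the same next token at that position, while the cosine similarity summarizes how closely the whole logit vector tracks the reference.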

License & credit

Apache-2.0, inherited from the base model. This repository provides only the MLX-quantized weights; all credit goes to the OpenMOSS Team.
