mlx-community/MOSS-SoundEffect-v2.0-bf16

The model mlx-community/MOSS-SoundEffect-v2.0-bf16 was converted to MLX format from OpenMOSS-Team/MOSS-SoundEffect-v2.0, a text-to-sound-effect diffusion pipeline (foley / ambience / creature / action audio, 48 kHz, up to 30 s) with a 1.3B Wan-style flow-matching DiT, a continuous 128-d DAC VAE (50 Hz latents), and a frozen Qwen3-1.7B text encoder.

Precision: DiT bf16, DAC-VAE fp32 (the reference decodes under fp32 autocast), Qwen3 text encoder bf16.
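The figures quoted above (48 kHz audio, 50 Hz latents, 128-d latent, clips up to 30 s) fix the size of the tensors the DiT and VAE work on. The sketch below is only arithmetic on those numbers; `clip_geometry` is a hypothetical helper, not part of moss-sfx-mlx, and it assumes the latent length is simply seconds × 50 with no padding, which agrees with the T=1500 quoted for a full 30 s clip in the Parity section.

```python
SAMPLE_RATE_HZ = 48_000  # output audio rate quoted in the card
LATENT_RATE_HZ = 50      # DAC-VAE latent frame rate quoted in the card
LATENT_DIM = 128         # continuous latent width quoted in the card
MAX_SECONDS = 30         # maximum clip length quoted in the card


def clip_geometry(seconds):
    """Return latent shape and sample count for a clip (hypothetical helper).

    Assumes no padding: latent frames = seconds * 50.
    """
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be in (0, {MAX_SECONDS}]")
    frames = round(seconds * LATENT_RATE_HZ)
    samples_per_frame = SAMPLE_RATE_HZ // LATENT_RATE_HZ  # 960 samples per latent frame
    return {
        "latent_shape": (frames, LATENT_DIM),
        "samples_per_frame": samples_per_frame,
        "samples": frames * samples_per_frame,
    }


full = clip_geometry(30)   # full-length clip
short = clip_geometry(5)   # the 5 s clip used in the usage example
print(full)
print(short)
```

Under these assumptions each latent frame covers 960 output samples, so a 30 s request corresponds to a 1500 × 128 latent and 1,440,000 audio samples.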

Use with mlx

Install the package (source: https://github.com/xocialize/moss-soundeffect-mlx):

pip install moss-sfx-mlx

Then generate audio from Python:

from moss_sfx_mlx.pipeline_mlx import MossSoundEffectPipeline

pipe = MossSoundEffectPipeline.from_pretrained("mlx-community/MOSS-SoundEffect-v2.0-bf16")
audio = pipe(prompt="a heavy wooden door creaks open slowly",
             seconds=5, num_inference_steps=100, cfg_scale=4.0, seed=0)
# audio: (1, 1, samples) mx.array at 48 kHz
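
The card does not say how to save the result, so here is a minimal sketch of one way to write mono 48 kHz audio to a WAV file using only the Python standard library. `write_wav_mono` is a hypothetical helper, not part of moss-sfx-mlx, and it assumes the samples are floats in [-1, 1]; values outside that range are clipped. A synthetic tone stands in for model output so the block runs without the model.

```python
import array
import math
import os
import sys
import tempfile
import wave

SAMPLE_RATE = 48_000  # output rate stated in the card


def write_wav_mono(samples, path, sample_rate=SAMPLE_RATE):
    """Write a flat sequence of floats as 16-bit mono PCM (hypothetical helper).

    Assumes samples lie in [-1, 1]; anything outside is clipped.
    """
    pcm = array.array(
        "h",
        (int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples),
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)


# Demo: a 0.1 s, 440 Hz tone standing in for model output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(4800)]
wav_path = os.path.join(tempfile.gettempdir(), "moss_sfx_demo.wav")
n_written = write_wav_mono(tone, wav_path)

# Read the header back to confirm what was written.
with wave.open(wav_path, "rb") as wf:
    rate, n_frames = wf.getframerate(), wf.getnframes()
print(rate, n_frames)
```

With the pipeline output, a call might look like `write_wav_mono(audio.flatten().tolist(), "door.wav")`, assuming `audio` is the (1, 1, samples) mx.array described above and that its values lie in [-1, 1]; the card documents neither the value range nor a built-in save helper.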

Parity

Validated against the upstream PyTorch reference (fp32, CPU stream, per-module and end-to-end golden tensors; full suite in the GitHub repo):

  • End-to-end waveform vs PyTorch golden (10-step CFG denoise): max_abs < 1e-2 fp32

  • Full-DiT velocity field at production scale (T=1500): max_abs < 1e-2 fp32

  • DAC-VAE decode vs reference: max_abs < 1e-2 fp32 (no scale constant; the learned post_quant_conv is faithful)

  • Qwen3 hidden states: cosine 1.0, max_abs 4.4e-4 (fp32 accumulation floor)

  • 10-prompt perceptual A/B at 100 steps: passed human review (correct content, duration, no tonal artifacts)
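
The two numeric metrics in the list above, max_abs and cosine, can be sketched as follows. This is an illustration of the metrics only, not the repository's test suite: the function names and the toy tensors are made up for the example, and real checks would run over full golden tensors with array operations rather than Python lists.

```python
import math


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def cosine_similarity(reference, candidate):
    """Cosine of the angle between two flat sequences treated as vectors."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch")
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    norm_c = math.sqrt(sum(c * c for c in candidate))
    return dot / (norm_r * norm_c)


# Toy stand-ins for a PyTorch golden tensor and the MLX output (made-up values).
golden = [0.10, -0.25, 0.50, 0.75]
ported = [0.1002, -0.2497, 0.5004, 0.7499]

err = max_abs_diff(golden, ported)
cos = cosine_similarity(golden, ported)
print(f"max_abs={err:.1e} cosine={cos:.7f}")

TOLERANCE = 1e-2  # the max_abs bound quoted in the list above
assert err < TOLERANCE
```

max_abs bounds the worst single element, while cosine only measures direction, so a pure scale error can leave the cosine at 1.0 and still show up in max_abs; that is one reason to report both.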

Performance (Apple M5 Max)

100 steps, cfg 4.0, full 30 s latent: 60 s wall clock, 14.2 GB peak memory.

License

Apache-2.0, matching the upstream model, code, and all components.
