RE-USE SEMamba Speech Enhancement (MLX)

Pure-MLX conversion of NVIDIA RE-USE, a ~9.6M-parameter SEMamba universal speech-enhancement model. In mlx-speech it cleans a voice reference before VAE conditioning when DramaBox TTS runs with denoise_ref=True, giving the cloning model a clean speaker anchor.

Non-commercial weights. These weights derive from NVIDIA RE-USE, licensed under the NVIDIA Source Code License (non-commercial). See the License section.

Model Details

Developed by: App Automaton
Upstream model: nvidia/RE-USE (SEMamba, bidirectional Mamba over STFT magnitude + phase)
Role: input-side voice-reference denoiser for DramaBox denoise_ref=True. Optional, off by default.
Conversion: format-only port of the fp32 weights to MLX .safetensors (1416 keys, ~9.6M params). No quantization, no architecture change.
Runtime: pure MLX on Apple Silicon. The selective scan mirrors the mamba_ssm selective_scan_ref reference math, so no CUDA kernels (mamba-ssm / causal-conv1d) are required.
Parity: the MLX port matches the PyTorch reference with an amplitude-weighted complex correlation of 0.9998 (model) and 0.9997 (end-to-end waveform on real speech).
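
The selective scan named under Runtime can be illustrated with a short NumPy sketch. This is a simplified, single-sequence version of the standard Mamba recurrence (state update `h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t`, output `y_t = C_t . h_t + D * x_t`), not the mlx-speech implementation. The function name `selective_scan` and the tensor layout are assumptions made for this example.

```python
import numpy as np


def selective_scan(x, delta, A, B, C, D=None):
    """Sequential selective scan over one sequence (illustrative sketch).

    x:     (L, d)  input sequence
    delta: (L, d)  per-step, per-channel step sizes
    A:     (d, n)  state transition parameters (diagonal per channel)
    B, C:  (L, n)  input-dependent projections
    D:     (d,)    optional skip connection weights
    """
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n), dtype=x.dtype)  # hidden state, one row per channel
    y = np.empty_like(x)
    for t in range(L):
        # Discretize: A becomes exp(delta * A), B becomes delta * B.
        dA = np.exp(delta[t][:, None] * A)                        # (d, n)
        dBx = delta[t][:, None] * B[t][None, :] * x[t][:, None]   # (d, n)
        h = dA * h + dBx
        y[t] = h @ C[t]                                           # (d,)
    if D is not None:
        y = y + x * D
    return y
```

With `A` set to zero and `delta`, `B`, `C` all ones, the scan reduces to a running sum of `x`, which makes a convenient sanity check. Because the loop uses only elementwise operations and a matrix-vector product, it needs no custom kernels; it trades speed for portability.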

| File | Component | Format | Size |
|---|---|---|---|
| `model.safetensors` | SEMamba enhancer | fp32 | ~38 MB |
| `config.json` | Model + STFT config | JSON | n/a |
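
The key count, parameter count, and file size quoted above can be cross-checked without loading any tensors, because a `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header describing every tensor. The sketch below uses only the standard library; the helper names `read_safetensors_header` and `summarize` are hypothetical and not part of mlx-speech.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file, minus its metadata entry."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def summarize(path):
    """Return (number of tensors, total parameter count, sorted dtype names)."""
    header = read_safetensors_header(path)
    n_params = sum(prod(entry["shape"]) for entry in header.values())
    dtypes = sorted({entry["dtype"] for entry in header.values()})
    return len(header), n_params, dtypes


# Example call, assuming the weights were downloaded to this path:
# n_keys, n_params, dtypes = summarize("models/reuse/mlx/model.safetensors")
```

At 4 bytes per fp32 value, roughly 9.6M parameters come to about 38 MB, which is consistent with the size listed in the table.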

How to Get Started

DramaBox uses this model automatically when you opt in:

import mlx_speech

tts = mlx_speech.tts.load("dramabox")
result = tts.generate(
    "Voice cloning from a noisy reference.",
    reference_audio="noisy_speaker.wav",
    denoise_ref=True,   # cleans the reference with this model first
)

mlx_speech.tts.load("dramabox") resolves these weights automatically. To run the enhancer directly, download the weights and load them from the local directory:

hf download appautomaton/re-use-semamba-mlx --local-dir models/reuse/mlx

from pathlib import Path
from mlx_speech.generation.reuse import REUSEEnhancer

enhancer = REUSEEnhancer.from_dir(Path("models/reuse/mlx"))
# noisy_waveform: mono samples at 16 kHz, loaded by the caller (not defined here)
clean = enhancer.enhance(noisy_waveform, in_sr=16000)  # mono in, mono out
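
The direct-use snippet expects a waveform that is already in memory. One way to obtain it, sketched here with the standard-library `wave` module and NumPy, is to read a 16-bit PCM WAV file and scale it to floats. The helper `load_mono_pcm16` is hypothetical, and whether `enhance` accepts a NumPy array as-is or needs it wrapped in an `mx.array` first is an assumption to verify against the mlx-speech source.

```python
import wave

import numpy as np


def load_mono_pcm16(path):
    """Read a 16-bit PCM WAV file; return (float32 samples in [-1, 1), sample_rate)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        sample_rate = wf.getframerate()
        n_channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # Downmix interleaved channels to mono by averaging.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, sample_rate


# Assumed usage with the enhancer from the snippet above:
# noisy_waveform, sr = load_mono_pcm16("noisy_speaker.wav")
# clean = enhancer.enhance(noisy_waveform, in_sr=sr)
```

Passing the file's own sample rate as `in_sr` avoids hard-coding 16000 when the reference clip was recorded at a different rate.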

Intended Use

Denoising a short voice-reference clip before voice cloning, so the cloning model conditions on a clean speaker/style anchor rather than on the recording's noise. The enhancer runs only on the reference input, never on generated output, so paralinguistic events produced by the TTS model (breaths, laughs) are preserved.

License

NVIDIA Source Code License (non-commercial). These weights are a format conversion of nvidia/RE-USE and remain governed by NVIDIA's license terms; by downloading or using them you agree to those terms. They may not be used commercially. Set denoise_ref=False (the default) to run DramaBox voice cloning without this model. The mlx-speech runtime code is MIT.
