RE-USE SEMamba Speech Enhancement (MLX)

Pure-MLX conversion of NVIDIA RE-USE, a ~9.6M-parameter SEMamba universal speech-enhancement model. In mlx-speech it cleans a voice reference before VAE conditioning when DramaBox TTS runs with denoise_ref=True, giving the cloning model a clean speaker anchor.

Non-commercial weights. These weights derive from NVIDIA RE-USE, licensed under the NVIDIA Source Code License (non-commercial). See the License section.

Model Details

  • Developed by: App Automaton
  • Upstream model: nvidia/RE-USE (SEMamba, bidirectional Mamba over STFT magnitude + phase)
  • Role: input-side voice-reference denoiser for DramaBox denoise_ref=True. Optional, off by default.
  • Conversion: format-only port of the fp32 weights to MLX .safetensors (1416 keys, ~9.6M params). No quantization, no architecture change.
  • Runtime: pure MLX on Apple Silicon. The selective scan mirrors the mamba_ssm selective_scan_ref reference math, so no CUDA kernels (mamba-ssm / causal-conv1d) are required.
  • Parity: the MLX port matches the PyTorch reference with an amplitude-weighted complex correlation of 0.9998 (model output) and 0.9997 (end-to-end waveform on real speech).
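
The Runtime note above says the selective scan follows the selective_scan_ref reference math from mamba_ssm. As a rough illustration of that recurrence (not the code used in this port), the sketch below runs it sequentially for a single channel in plain Python. The function name selective_scan_1ch is hypothetical, and batching, the channel dimension, and the reference's optional extras (for example gating) are left out for brevity.

```python
import math


def selective_scan_1ch(u, delta, A, B, C, D=0.0):
    """Sequential selective scan for one channel (illustrative only).

    u, delta: length-L lists (input sequence and per-step step size).
    A: length-N list (diagonal state matrix entries for this channel).
    B, C: length-L lists of length-N lists (input-dependent projections).
    D: scalar skip-connection weight.

    Recurrence per time step t and state index n:
        x_t[n] = exp(delta_t * A[n]) * x_{t-1}[n] + delta_t * B_t[n] * u_t
        y_t    = sum_n C_t[n] * x_t[n] + D * u_t
    """
    state = [0.0] * len(A)
    out = []
    for u_t, dt, B_t, C_t in zip(u, delta, B, C):
        # Discretize and update the hidden state for this time step.
        state = [
            math.exp(dt * a) * x + dt * b * u_t
            for a, x, b in zip(A, state, B_t)
        ]
        # Project the state to the output and add the skip connection.
        out.append(sum(c * x for c, x in zip(C_t, state)) + D * u_t)
    return out
```

Because this is an ordinary loop over time steps, it needs no fused CUDA kernel. A bidirectional model would run a scan like this over the sequence and over its reverse; how this port combines the two directions is not shown here.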

Contents

File               Component            Format  Size
model.safetensors  SEMamba enhancer     fp32    ~38 MB
config.json        Model + STFT config  JSON    n/a
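
To check the key count, parameter count, and dtype quoted in this card against a downloaded file, the following sketch reads only the safetensors header (an 8-byte little-endian length followed by a JSON header) using the standard library. The helper name summarize_safetensors is hypothetical, and the path in the usage comment assumes the download location used in the getting-started section.

```python
import json
import math
import struct


def summarize_safetensors(path):
    """Return (tensor_count, parameter_count, dtypes) from a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    params = sum(math.prod(info["shape"]) for info in header.values())
    dtypes = sorted({info["dtype"] for info in header.values()})
    return len(header), params, dtypes


# Example usage (assumes the weights were downloaded to this path):
# keys, params, dtypes = summarize_safetensors("models/reuse/mlx/model.safetensors")
# print(keys, params, dtypes, f"{params * 4 / 1e6:.1f} MB at 4 bytes per fp32 value")
```

At 4 bytes per fp32 value, roughly 9.6M parameters come to about 38 MB, which is consistent with the size listed above.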

How to Get Started

Used automatically by DramaBox when you opt in:

import mlx_speech

tts = mlx_speech.tts.load("dramabox")
result = tts.generate(
    "Voice cloning from a noisy reference.",
    reference_audio="noisy_speaker.wav",
    denoise_ref=True,   # cleans the reference with this model first
)

mlx_speech.tts.load("dramabox") resolves these weights automatically. To run the enhancer directly, download the weights first:

hf download appautomaton/re-use-semamba-mlx --local-dir models/reuse/mlx

Then load them and enhance a mono waveform:

from pathlib import Path
from mlx_speech.generation.reuse import REUSEEnhancer

enhancer = REUSEEnhancer.from_dir(Path("models/reuse/mlx"))
# noisy_waveform: mono audio samples you have already loaded; in_sr is their sample rate in Hz
clean = enhancer.enhance(noisy_waveform, in_sr=16000)  # mono in, mono out
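
The direct-use snippet above assumes noisy_waveform is already in memory. One possible way to get mono samples from a 16-bit PCM WAV file with only the standard library is sketched below. The helper load_wav_mono is hypothetical, and the array type that enhance expects is not stated in this card, so you may need to convert the returned list (for example to an MLX or NumPy array) before passing it in.

```python
import array
import sys
import wave


def load_wav_mono(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    samples is a list of floats in [-1.0, 1.0); multi-channel audio is
    downmixed to mono by averaging the channels of each frame.
    """
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    mono = []
    for start in range(0, len(pcm), channels):
        frame = pcm[start:start + channels]
        mono.append(sum(frame) / (channels * 32768.0))
    return mono, sample_rate
```

With this helper, the sample rate read from the file can be passed as in_sr instead of hard-coding 16000.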

Intended Use

Denoising a short voice-reference clip before voice cloning, so the model conditions on a clean speaker/style anchor rather than the recording's noise. The enhancer runs on the reference input, never on generated output, so the TTS model's paralinguistic events (breaths, laughs) are preserved.

License

NVIDIA Source Code License (non-commercial). These weights are a format conversion of nvidia/RE-USE and remain governed by NVIDIA's license terms; by downloading or using them you agree to those terms. They may not be used commercially. Set denoise_ref=False (the default) to run DramaBox voice cloning without this model. The mlx-speech runtime code is MIT.
