Regularized Schrödinger Bridge (RSB) via Distortion-Perception Perturbation for High-Fidelity Speech Enhancement

Regularized Schrödinger Bridge (RSB) is a generative speech enhancement approach that reconciles fidelity and realism while mitigating exposure bias. RSB regularizes training with a Distortion-Perception Perturbation, which constructs time-varying targets by interpolating between clean speech and posterior-mean estimates and trains the network on perturbed intermediate states so that it progressively corrects toward the ground truth. The perturbation simulates inference-time prediction errors, which narrows the training-inference mismatch and thereby reduces exposure bias. It also injects posterior-mean estimates as fidelity-preserving guidance, which improves reconstruction fidelity.
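
The interpolation idea described above can be illustrated with a small NumPy sketch. This is not the project's implementation: the linear weight schedule, the function names (interpolation_weight, time_varying_target, perturbed_state), and the noise-free linear bridge between target and noisy input are all assumptions made only for illustration.

```python
import numpy as np


def interpolation_weight(t):
    """Hypothetical schedule: weight placed on the posterior-mean estimate at time t in [0, 1]."""
    return float(np.clip(t, 0.0, 1.0))


def time_varying_target(clean, posterior_mean, t):
    """Interpolate between clean speech and a posterior-mean estimate (assumed linear form)."""
    w = interpolation_weight(t)
    return (1.0 - w) * clean + w * posterior_mean


def perturbed_state(noisy, target, t):
    """Assumed intermediate state: a linear path from the target (t=0) to the noisy input (t=1).

    The stochastic term of a real bridge process is omitted to keep the sketch deterministic.
    """
    return (1.0 - t) * target + t * noisy


if __name__ == "__main__":
    clean = np.zeros(4)               # stand-in for a clean speech frame
    posterior_mean = np.ones(4)       # stand-in for a posterior-mean estimate
    noisy = 2.0 * np.ones(4)          # stand-in for the noisy input
    t = 0.25
    target = time_varying_target(clean, posterior_mean, t)
    state = perturbed_state(noisy, target, t)
    print(target, state)
```

Under these assumptions the target equals the clean signal at t = 0 and drifts toward the posterior-mean estimate as t grows, so the states seen in training resemble the imperfect predictions the sampler produces early in inference.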

Pretrained Model Download

We have publicly released a checkpoint of MISB's generative model, which is based on the ncsnpp_base architecture and was trained on the Voicebank+Demand dataset.

There are two ways to download:

  • Download via CLI:
    python -m cli.download_pretrained_model
  • Download via Google Drive: download the folder from Google Drive and place it in the pretrained_models/ directory.
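
After either download route, it can help to confirm that files actually landed in pretrained_models/. The helper below is a minimal sketch that is not part of the repository: the function name list_downloaded_files is hypothetical, and it makes no assumption about checkpoint file names; it simply lists whatever is present.

```python
from pathlib import Path


def list_downloaded_files(root="pretrained_models"):
    """Return sorted relative paths of all files under root, or an empty list if root is missing."""
    base = Path(root)
    if not base.is_dir():
        return []
    # as_posix() keeps the separators consistent across operating systems
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_downloaded_files()
    if files:
        print("\n".join(files))
    else:
        print("No files found under pretrained_models/; the download may not have completed.")
```

Run it from the repository root so that the relative path pretrained_models/ resolves correctly.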

License

This project is licensed under the Apache License 2.0.
