Regularized Schrödinger Bridge (RSB) via Distortion-Perception Perturbation for High-Fidelity Speech Enhancement

Regularized Schrödinger Bridge (RSB) is a generative speech enhancement approach that reconciles fidelity and realism while mitigating exposure bias. RSB regularizes training with a Distortion-Perception Perturbation that constructs time-varying targets by interpolating between clean speech and posterior-mean estimates, and trains the network on perturbed intermediate states so that it progressively corrects toward the ground truth. The perturbation simulates inference-time prediction errors, which narrows the training–inference mismatch and thereby reduces exposure bias. It also injects posterior-mean estimates as fidelity-preserving guidance, which improves reconstruction fidelity.
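
The idea of a time-varying target can be illustrated with a small, dependency-free sketch. It is not the implementation from this repository: the function name `interpolated_target`, the linear weight w(t) = t, and the toy signals are all assumptions made for illustration, and the paper's actual interpolation schedule may differ.

```python
# Hypothetical sketch of a time-varying training target.
# Not the authors' code: the linear schedule w(t) = t is an assumption.
def interpolated_target(clean, posterior_mean, t):
    """Blend clean speech and a posterior-mean estimate at time t in [0, 1].

    With the assumed weight w(t) = t, the target equals the clean signal
    at t = 0 and the posterior-mean estimate at t = 1.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(clean) != len(posterior_mean):
        raise ValueError("signals must have the same length")
    return [(1.0 - t) * c + t * m for c, m in zip(clean, posterior_mean)]


clean = [0.0, 1.0, -1.0, 0.5]           # toy "clean speech" samples
posterior_mean = [0.0, 0.8, -0.6, 0.5]  # toy over-smoothed estimate

# Targets at the clean end, the midpoint, and the posterior-mean end.
targets = {t: interpolated_target(clean, posterior_mean, t) for t in (0.0, 0.5, 1.0)}
```

Under this assumed schedule, states close to the degraded end are supervised with a target near the posterior mean, and the target moves toward the ground truth as t decreases, which is one way to read "correct toward the ground truth progressively".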

Official PyTorch implementation of the paper:
Regularized Schrödinger Bridge via Distortion-Perception Perturbation for High-Fidelity Speech Enhancement
Links: Paper | Audio Demo | Online Demo | GitHub | Hugging Face

Pretrained Model Download

We have publicly released a checkpoint of MISB's generative model, which is based on the ncsnpp_base architecture and was trained on the Voicebank+Demand dataset.

There are two ways to download:

Download via CLI

python -m cli.download_pretrained_model

Download via Google Drive. Download the folder from Google Drive and place it in the pretrained_models/ directory.
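
After either download route, it can help to confirm that files actually landed in pretrained_models/. The helper below is a minimal sketch for that check; the function name `list_checkpoint_files` is hypothetical and not part of this repository, and it makes no assumption about how the checkpoint files are named.

```python
from pathlib import Path


def list_checkpoint_files(root="pretrained_models"):
    """Return sorted relative paths of all files under `root` (empty if missing)."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        p.relative_to(root_path).as_posix()
        for p in root_path.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    files = list_checkpoint_files()
    if files:
        print("\n".join(files))
    else:
        print("No files found under pretrained_models/; run one of the download steps first.")
```

An empty result means the directory is missing or empty, in which case one of the two download steps above should be repeated.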

License

This project is licensed under the Apache License 2.0.

Model size: 27.8M params (Safetensors)
Tensor type: F32