license: apache-2.0
language:
- en
pipeline_tag: audio-to-audio
Regularized Schrödinger Bridge (RSB) via Distortion-Perception Perturbation for High-Fidelity Speech Enhancement
Regularized Schrödinger Bridge (RSB) is a generative speech enhancement approach that reconciles fidelity and realism while mitigating exposure bias. RSB regularizes training with a Distortion-Perception Perturbation. This perturbation constructs time-varying targets by interpolating between clean speech and posterior-mean estimates, and the network is trained on the resulting perturbed intermediate states to progressively correct toward the ground truth. Because the perturbation simulates inference-time prediction errors, it narrows the training–inference mismatch and thereby reduces exposure bias. It also injects posterior-mean estimates as fidelity-preserving guidance, which improves reconstruction fidelity.
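To make the training recipe described above more concrete, here is a minimal NumPy sketch of its two ingredients: an interpolated target and a perturbed intermediate state. It is not the official implementation. The linear weight schedule w(t) = t, the direction of the interpolation, and the simplified Brownian-bridge state (used in place of the paper's Schrödinger Bridge marginal) are all assumptions, and dp_target and perturbed_state are hypothetical names.

```python
import numpy as np


def dp_target(x_clean, x_post, t, schedule=lambda t: t):
    # Hypothetical time-varying weight w(t), clipped to [0, 1] (assumed schedule).
    w = float(np.clip(schedule(t), 0.0, 1.0))
    # Convex combination: w = 0 gives clean speech, w = 1 gives the posterior mean.
    return (1.0 - w) * x_clean + w * x_post


def perturbed_state(target, y, t, sigma=0.5, rng=None):
    # Simplified Brownian-bridge state between `target` (t = 0) and noisy `y` (t = 1).
    # This stands in for the paper's Schrödinger Bridge marginal, which is not reproduced here.
    rng = np.random.default_rng() if rng is None else rng
    mean = (1.0 - t) * target + t * y
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(np.shape(target))


# Toy usage with random vectors standing in for spectrogram frames.
rng = np.random.default_rng(0)
x_clean = rng.standard_normal(8)
x_post = x_clean + 0.1 * rng.standard_normal(8)  # stand-in for a predictive model's posterior mean
y = x_clean + rng.standard_normal(8)  # stand-in for the noisy observation
t = 0.3
x_t = perturbed_state(dp_target(x_clean, x_post, t), y, t, rng=rng)
# A network would take (x_t, y, t) as input and be trained to predict x_clean.
```

The key design point the sketch tries to capture is that the state fed to the network is built from an imperfect (posterior-mean-influenced) target, while the training label stays the clean ground truth, so the network learns to correct errors similar to those it will produce at inference.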
- Official PyTorch implementation of the paper: Regularized Schrödinger Bridge via Distortion-Perception Perturbation for High-Fidelity Speech Enhancement
- Links: Paper | Audio Demo | Online Demo | GitHub | Hugging Face
Pretrained Model Download
We have publicly released a checkpoint of MISB's generative model, which is based on the ncsnpp_base architecture and was trained on the VoiceBank+DEMAND dataset.
There are two ways to download it:
- Download via the CLI:
  python -m cli.download_pretrained_model
- Download via Google Drive: download the folder from Google Drive and place it in the pretrained_models/ directory.
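After either download, one way to confirm that the files landed where they are expected is to list what sits under pretrained_models/. The helper below is a hypothetical sketch that uses only the Python standard library; find_checkpoints is an illustrative name, and the checkpoint file extensions it searches for are assumptions, since the exact file names are not documented here.

```python
from pathlib import Path


def find_checkpoints(root="pretrained_models", suffixes=(".ckpt", ".pt", ".pth")):
    # Assumed checkpoint extensions; adjust `suffixes` to match the downloaded files.
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"{root_path} not found; download the checkpoint first")
    # Search recursively, since the Google Drive folder may add a subdirectory.
    return sorted(p for p in root_path.rglob("*") if p.is_file() and p.suffix in suffixes)


if __name__ == "__main__":
    try:
        for ckpt in find_checkpoints():
            print(ckpt)
    except FileNotFoundError as err:
        print(err)
```

Searching recursively rather than only the top level keeps the check working whether the folder contents or the folder itself were placed in pretrained_models/.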
License
This project is licensed under the Apache License 2.0.