| --- |
| license: mit |
| --- |
| |
| Diffusion-based Generative Speech Source Separation |
|
|
| This repository contains the checkpoints for the diffusion-based speech |
| separation model from the paper "Diffusion-based Generative Speech Source |
| Separation", presented at ICASSP 2023. |
|
|
| The code to run the model is available on [GitHub](https://github.com/fakufaku/diffusion-separation). |
|
|
| ### Abstract |
|
|
| We propose DiffSep, a new single channel source separation method based on |
| score-matching of a stochastic differential equation (SDE). We craft a tailored |
| continuous time diffusion-mixing process starting from the separated sources |
| and converging to a Gaussian distribution centered on their mixture. This |
| formulation lets us apply the machinery of score-based generative modelling. |
| First, we train a neural network to approximate the score function of the |
| marginal probabilities of the diffusion-mixing process. Then, we use it to |
| solve the reverse time SDE that progressively separates the sources starting |
| from their mixture. We propose a modified training strategy to handle model |
| mismatch and source permutation ambiguity. Experiments on the WSJ0-2mix dataset |
| demonstrate the potential of the method. Furthermore, the method is also |
| suitable for speech enhancement and shows performance competitive with prior |
| work on the VoiceBank-DEMAND dataset. |
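
To make the idea of a diffusion-mixing process more concrete, here is a minimal NumPy sketch of how the mean of such a process could move from the separated sources toward their mixture. It assumes a simple linear drift `-gamma * (I - P) x`, where `P` averages over the sources; the function name `mixing_mean`, the parameter `gamma`, and this particular drift are illustrative assumptions and may differ from the parameterization used in the paper and in the released code.

```python
import numpy as np


def mixing_mean(sources, t, gamma=2.0):
    """Mean of a toy diffusion-mixing process at time t.

    sources: array of shape (K, T), one row per separated source.
    Assumption (not taken from the repository): the drift is
    -gamma * (I - P) x, with P the K x K averaging matrix.
    """
    K = sources.shape[0]
    P = np.full((K, K), 1.0 / K)   # projects each row onto the source average
    P_bar = np.eye(K) - P          # complementary projection
    # Because P_bar is a projection, exp(-gamma * t * P_bar) has the
    # closed form P + exp(-gamma * t) * P_bar.
    A = P + np.exp(-gamma * t) * P_bar
    return A @ sources


if __name__ == "__main__":
    s = np.array([[1.0, 0.0, -1.0],
                  [3.0, 2.0, 1.0]])
    for t in (0.0, 0.5, 5.0):
        print(t, mixing_mean(s, t))
```

At `t = 0` the mean equals the separated sources. As `t` grows, every row approaches the average of the sources, which is a scaled copy of the mixture. The reverse-time SDE described in the abstract runs this process backward, guided by the learned score function.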
|
|
| Checkpoint ID: `2022-10-23_01-37-07_experiment-model-large-multigpu_model.optimizer.lr-0.0002_model.sde.d_lambda-2.0_model.sde.sigma_min-0.05_epoch-979_si_sdr-11.271_N-30_snr-0.5_corrstep-1_denoise-True_schedule-None` |
|
|