# source_separation

### input

- Noisy speech (audio file) 

```
Audio from Creative Commons YouTube videos
https://drive.google.com/drive/folders/19Sn6pe5-BtWXYa6OiLbYGH7iCU-mzB8j
doublenoble_k7rain_part.wav
(Original video: https://www.youtube.com/watch?v=vsjB1xTwZ20&t=536s)
```

- Music (audio file)
```
DSD100 dataset
https://sigsep.github.io/datasets/dsd100.html
049 - Young Griffo - Facade.wav
```

### output

- Separated voice (audio file)
```
separated_voice.wav
```

### Usage
The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required for that run.
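The download-on-first-run behaviour can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `ensure_file` and its `fetch` parameter are hypothetical names, and the script may organise its downloads differently.

```python
import urllib.request
from pathlib import Path


def ensure_file(url, directory=".", fetch=urllib.request.urlretrieve):
    """Download `url` into `directory` unless a local copy already exists.

    Hypothetical helper: returns True if a download happened,
    False if the existing local file was reused.
    """
    name = url.rsplit("/", 1)[-1]  # file name is the last URL segment
    target = Path(directory) / name
    if target.exists():
        return False  # later runs: no network access needed
    fetch(url, str(target))  # first run: requires an Internet connection
    return True
```

For example, `ensure_file("https://storage.googleapis.com/ailia-models/unet_source_separation/second_voice_bank.best.opt.onnx.prototxt")` would fetch the prototxt file once and skip the download on later calls.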

To run on the sample audio file:
```bash
$ python3 unet_source_separation.py
```

To process your own audio file, pass its path with the `--input` option.
Use the `--savepath` option to change the name of the output file.
```bash
$ python3 unet_source_separation.py --input WAV_PATH --savepath SAVE_WAV_PATH
```
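To process several files, the command above can be wrapped in a small driver. The sketch below assumes it is run from the directory containing `unet_source_separation.py`. The helper names and the `<name>_voice.wav` output naming are hypothetical choices for illustration. Only the `--input` and `--savepath` options come from this README.

```python
import subprocess
import sys
from pathlib import Path


def build_command(wav_path, out_dir):
    """Return the argument list for one run of unet_source_separation.py."""
    wav_path = Path(wav_path)
    # Hypothetical naming scheme: <input stem>_voice.wav inside out_dir.
    save_path = Path(out_dir) / f"{wav_path.stem}_voice.wav"
    return [
        sys.executable, "unet_source_separation.py",
        "--input", str(wav_path),
        "--savepath", str(save_path),
    ]


def separate_all(in_dir, out_dir):
    """Run the script once for every .wav file found in in_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for wav_path in sorted(Path(in_dir).glob("*.wav")):
        # check=True stops the batch if any single run fails.
        subprocess.run(build_command(wav_path, out_dir), check=True)
```

Calling `separate_all("clips", "separated")` would then write one separated file per input clip.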

Select a pretrained model with `--arch base` (default) or `--arch large`.
`base` is a model for general voice separation, and `large` is a model for singing voice separation.
```bash
$ python3 unet_source_separation.py --input WAV_PATH --savepath SAVE_WAV_PATH --arch base
```


### Reference

[source_separation](https://github.com/AppleHolic/source_separation)  

[Singing Voice Separation Samples](https://www.youtube.com/playlist?list=PLQ4ukFz6Ieir5bZYOns08_2gMjt4hYP4I)

### Framework

PyTorch 1.6.0

### Model Format

ONNX opset = 11

### Netron
- General voice separation

[second_voice_bank.best.opt.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/unet_source_separation/second_voice_bank.best.opt.onnx.prototxt)

- Singing voice separation

[RefineSpectrogramUnet.best.opt.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/unet_source_separation/RefineSpectrogramUnet.best.opt.onnx.prototxt)
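
The relationship between the `--arch` values and the two model files can be summarised in code. The pairing below is inferred from the descriptions in this README (`base` for general voice separation, `large` for singing voice separation) and should be treated as an assumption. The function names are hypothetical, as is the guess that each weight file shares its name with the prototxt file minus the `.prototxt` suffix.

```python
# Assumption: pairing inferred from the descriptions in this README.
PROTOTXT_BY_ARCH = {
    "base": "second_voice_bank.best.opt.onnx.prototxt",       # general voice
    "large": "RefineSpectrogramUnet.best.opt.onnx.prototxt",  # singing voice
}
REMOTE_PATH = "https://storage.googleapis.com/ailia-models/unet_source_separation/"


def model_files(arch="base"):
    """Return (onnx_name, prototxt_name) for the given --arch value."""
    if arch not in PROTOTXT_BY_ARCH:
        raise ValueError(f"unknown arch: {arch!r}")
    prototxt = PROTOTXT_BY_ARCH[arch]
    # Assumption: the weight file has the same name without ".prototxt".
    onnx = prototxt[: -len(".prototxt")]
    return onnx, prototxt


def netron_url(arch="base"):
    """Build a Netron viewer link in the same form as the links above."""
    return "https://netron.app/?url=" + REMOTE_PATH + model_files(arch)[1]
```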