niobures committed on
Commit
815e31f
·
verified ·
1 Parent(s): 81ceaa5

U-Net (models_onnx)

Browse files
.gitattributes CHANGED
@@ -35,3 +35,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  Funnel[[:space:]]Deep[[:space:]]Complex[[:space:]]U-Net[[:space:]]for[[:space:]]Phase-Aware[[:space:]]Speech[[:space:]]Enhancement.pdf filter=lfs diff=lfs merge=lfs -text
37
  Phase-aware[[:space:]]Speech[[:space:]]Enhancement[[:space:]]with[[:space:]]Deep[[:space:]]Complex[[:space:]]U-Net.pdf filter=lfs diff=lfs merge=lfs -text
 
 
 
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  Funnel[[:space:]]Deep[[:space:]]Complex[[:space:]]U-Net[[:space:]]for[[:space:]]Phase-Aware[[:space:]]Speech[[:space:]]Enhancement.pdf filter=lfs diff=lfs merge=lfs -text
37
  Phase-aware[[:space:]]Speech[[:space:]]Enhancement[[:space:]]with[[:space:]]Deep[[:space:]]Complex[[:space:]]U-Net.pdf filter=lfs diff=lfs merge=lfs -text
38
+ models/ailia-models/code/049[[:space:]]-[[:space:]]Young[[:space:]]Griffo[[:space:]]-[[:space:]]Facade.wav filter=lfs diff=lfs merge=lfs -text
39
+ models/ailia-models/code/doublenoble_k7rain_part.wav filter=lfs diff=lfs merge=lfs -text
models/ailia-models/RefineSpectrogramUnet.best.opt.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6ab1e0af3c22250f626379ee4a367687a28547fe3aa186b6a614e1b9dee3b3da
3
+ size 381668080
models/ailia-models/RefineSpectrogramUnet.best.opt.onnx.prototxt ADDED
The diff for this file is too large to render. See raw diff
 
models/ailia-models/code/049 - Young Griffo - Facade.wav ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d9701789b5d5d6dd89d82cbc146fc70edc127cbec5176be7816079cd06225c91
3
+ size 6867808
models/ailia-models/code/LICENSE ADDED
@@ -0,0 +1,25 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ BSD 2-Clause License
2
+
3
+ Copyright (c) 2019, ILJI CHOI
4
+ All rights reserved.
5
+
6
+ Redistribution and use in source and binary forms, with or without
7
+ modification, are permitted provided that the following conditions are met:
8
+
9
+ 1. Redistributions of source code must retain the above copyright notice, this
10
+ list of conditions and the following disclaimer.
11
+
12
+ 2. Redistributions in binary form must reproduce the above copyright notice,
13
+ this list of conditions and the following disclaimer in the documentation
14
+ and/or other materials provided with the distribution.
15
+
16
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
17
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
models/ailia-models/code/README.md ADDED
@@ -0,0 +1,71 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # source_separation
2
+
3
+ ### input
4
+
5
+ - Noisy speech (audio file)
6
+
7
+ ```
8
+ Audio from creative commons youtube videos
9
+ https://drive.google.com/drive/folders/19Sn6pe5-BtWXYa6OiLbYGH7iCU-mzB8j
10
+ doublenoble_k7rain_part.wav
11
+ (Original video : https://www.youtube.com/watch?v=vsjB1xTwZ20&t=536s)
12
+ ```
13
+
14
+ - Music (audio file)
15
+ ```
16
+ DSD100 dataset
17
+ https://sigsep.github.io/datasets/dsd100.html
18
+ 049 - Young Griffo - Facade.wav
19
+ ```
20
+
21
+ ### output
22
+
23
+ Separated voice (audio file)
24
+ ```
25
+ separated_voice.wav
26
+ ```
27
+
28
+ ### Usage
29
+ Automatically downloads the onnx and prototxt files on the first run. It is necessary to be connected to the Internet while downloading.
30
+
31
+ For the sample audio file,
32
+ ```bash
33
+ $ python3 unet_source_separation.py
34
+
35
+ ```
36
+
37
+ If you want to specify the input audio file, put the input path after the --input option.
38
+ You can use --savepath option to change the name of the output file to save.
39
+ ```bash
40
+ $ python3 unet_source_separation.py --input WAV_PATH --savepath SAVE_WAV_PATH
41
+ ```
42
+
43
+ You can select a pretrained model by specifying --arch base (default) or --arch large.
44
+ `base` is a model for general voice separation task, and `large` is a model for singing voice separation task.
45
+ ```bash
46
+ $ python3 unet_source_separation.py --input WAV_PATH --savepath SAVE_WAV_PATH --arch base
47
+ ```
48
+
49
+
50
+ ### Reference
51
+
52
+ [source_separation](https://github.com/AppleHolic/source_separation)
53
+
54
+ [Singing Voice Separation Samples](https://www.youtube.com/playlist?list=PLQ4ukFz6Ieir5bZYOns08_2gMjt4hYP4I)
55
+
56
+ ### Framework
57
+
58
+ PyTorch 1.6.0
59
+
60
+ ### Model Format
61
+
62
+ ONNX opset = 11
63
+
64
+ ### Netron
65
+ - General voice separation
66
+
67
+ [second_voice_bank.best.opt.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/unet_source_separation/second_voice_bank.best.opt.onnx.prototxt)
68
+
69
+ - Singing voice separation
70
+
71
+ [RefineSpectrogramUnet.best.opt.onnx.prototxt](https://netron.app/?url=https://storage.googleapis.com/ailia-models/unet_source_separation/RefineSpectrogramUnet.best.opt.onnx.prototxt)
models/ailia-models/code/doublenoble_k7rain_part.wav ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ac658a1284aa28ea7c77f5126691ef02696fa1bfa41a0a5b41cd9906260bf8dd
3
+ size 11518066
models/ailia-models/code/requirements.txt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ numpy==1.22.0
2
+ soundfile==0.10.3.post1
3
+ scipy==1.10.0
models/ailia-models/code/unet_source_separation.py ADDED
@@ -0,0 +1,200 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
import time
import sys
import argparse

import numpy as np

import ailia # noqa: E402

import soundfile as sf

# import original modules
sys.path.append('../../util')
from arg_utils import get_base_parser, update_parser, get_savepath # noqa: E402
from model_utils import check_and_download_models # noqa: E402

# logger
from logging import getLogger # noqa: E402
logger = getLogger(__name__)


# ======================
# Parameters 1
# ======================
WAV_PATH = 'doublenoble_k7rain_part.wav' # noisy speech sample (default input)
# WAV_PATH = '049 - Young Griffo - Facade.wav' # music sample (use with --arch large)
SAVE_WAV_PATH = 'separated_voice.wav'
MODEL_LISTS = ['base', 'large']


# ======================
# Argument Parser Config
# ======================
# NOTE(review): 'RSource separation.' in the description below looks like a
# typo for 'Source separation.' (visible in --help output) — confirm.
parser = get_base_parser(
    'RSource separation.',
    WAV_PATH,
    SAVE_WAV_PATH,
)
parser.add_argument(
    '-n', '--onnx',
    action='store_true',
    default=False,
    help='Use onnxruntime'
)
parser.add_argument(
    '-st', '--stereo',
    action='store_true',
    default=False,
    help='Use stereo mode'
)
parser.add_argument(
    '-a', '--arch',
    default='base', choices=MODEL_LISTS,
    help='model lists: ' + ' | '.join(MODEL_LISTS)
)
parser.add_argument(
    '--ailia_audio', action='store_true',
    help='use ailia audio library'
)
args = update_parser(parser)

# Select the pre/post-processing backend; both util modules expose the same
# function set (preemphasis, inv_preemphasis, lowpass, tfconvert, zero_pad,
# calc_time), one implemented with ailia.audio, the other with scipy.
if args.ailia_audio:
    import ailia.audio as ailia_audio
    from unet_source_separation_utils_ailia import preemphasis, inv_preemphasis, lowpass, tfconvert, zero_pad, calc_time # noqa: E402
else:
    from scipy import signal
    from unet_source_separation_utils import preemphasis, inv_preemphasis, lowpass, tfconvert, zero_pad, calc_time # noqa: E402


# ======================
# Parameters 2
# ======================

if args.arch == 'base' : # for general voice separation
    # NOTE(review): this commit adds second_voice_bank.best.opt.onnx (no "2")
    # to the repo — confirm the "opt2" filename exists at REMOTE_PATH.
    WEIGHT_PATH = "second_voice_bank.best.opt2.onnx"
else : # for singing voice separation
    WEIGHT_PATH = "RefineSpectrogramUnet.best.opt.onnx"
MODEL_PATH = WEIGHT_PATH + ".prototxt"
REMOTE_PATH = "https://storage.googleapis.com/ailia-models/unet_source_separation/"

# fixed parameters for each model
if args.arch == 'base' :
    DESIRED_SR = 22050   # input is resampled to this rate [Hz]
    MULT = 2 ** 5        # STFT frame count is zero-padded to a multiple of this (see tfconvert)
    WINDOW_LEN = 512     # STFT window length [samples]
    HOP_LEN = 64         # STFT hop length [samples]
else :
    DESIRED_SR = 44100
    MULT = 2 ** 6
    WINDOW_LEN = 1024
    HOP_LEN = 128

# adjustable parameters
if args.arch == 'base' :
    LPF_CUTOFF = 10000   # post-filter cutoff [Hz]; values <= 0 skip the low-pass stage
else :
    LPF_CUTOFF = 20000
99
+ # ======================
100
+ # Main function
101
+ # ======================
102
def src_sep(data, session):
    """Run the separation model on the (magnitude, phase) feature pair.

    `session` is an onnxruntime.InferenceSession when --onnx was given,
    otherwise an ailia.Net. Returns the model's first output.
    """
    if args.onnx:
        model_inputs = session.get_inputs()
        feed = {
            model_inputs[0].name: data[0],
            model_inputs[1].name: data[1],
        }
        wanted = [session.get_outputs()[0].name]
        separated = session.run(wanted, feed)[0]
    else:
        separated = session.run(data)[0]

    return separated
116
+
117
+
118
def recognize_one_audio(input_path):
    """Separate the voice from one audio file and write the result to disk.

    Reads `input_path` with soundfile, normalizes it to a float32
    (batch, samples) array at DESIRED_SR, runs the U-Net model, then
    low-pass filters and de-emphasizes the output before saving it.
    """
    # load audio
    logger.info('Loading wavfile...')
    wav, sr = sf.read(input_path)

    if wav.dtype != np.float32:
        wav = wav.astype(np.float32)

    # Normalize shape to (batch, samples).
    if wav.ndim == 2 :
        if args.stereo:
            wav = np.transpose(wav,(1,0)) # stereo to batch
        else:
            wav = (wav[:,0][np.newaxis,:] + wav[:,1][np.newaxis,:])/2 # convert to mono
    else:
        wav = wav[np.newaxis,:]

    # Print the clip duration for the user.
    calc_time(wav.shape[1], sr)

    # convert sample rate to DESIRED_SR if needed
    logger.info('Converting sample rate...')
    if not sr == DESIRED_SR :
        if args.ailia_audio:
            wav = ailia.audio.resample(wav,sr,DESIRED_SR)
        else:
            wav = signal.resample_poly(wav, DESIRED_SR, sr, axis=1)

    # apply pre-emphasis filter
    logger.info('Generating input feature...')
    wav = preemphasis(wav)

    # (log-magnitude, phase) STFT pair, frame count padded to a multiple of MULT
    input_feature = tfconvert(wav, WINDOW_LEN, HOP_LEN, MULT)

    # create inference session (ailia by default, onnxruntime with --onnx)
    if not args.onnx :
        logger.info('Use ailia')
        env_id = args.env_id
        logger.info(f'env_id: {env_id}')
        memory_mode = ailia.get_memory_mode(reuse_interstage=True)
        session = ailia.Net(MODEL_PATH, WEIGHT_PATH, env_id=env_id, memory_mode=memory_mode)
    else :
        logger.info('Use onnxruntime')
        import onnxruntime
        session = onnxruntime.InferenceSession(WEIGHT_PATH)

    # inference
    logger.info('Start inference...')
    if args.benchmark:
        # Run 5 times and log wall-clock milliseconds per run.
        logger.info('BENCHMARK mode')
        for c in range(5) :
            start = int(round(time.time() * 1000))
            sep = src_sep(input_feature, session)
            end = int(round(time.time() * 1000))
            logger.info("\tprocessing time {} ms".format(end-start))
    else:
        sep = src_sep(input_feature, session)

    # postprocessing: optional low-pass, then undo the pre-emphasis
    logger.info('Start postprocessing...')
    if LPF_CUTOFF > 0 :
        sep = lowpass(sep, LPF_CUTOFF, DESIRED_SR)

    out_wav = inv_preemphasis(sep).clip(-1.,1.)
    out_wav = out_wav.swapaxes(0,1)  # (batch, samples) -> (samples, channels) for sf.write

    # save separated signal
    savepath = get_savepath(args.savepath, input_path)
    logger.info(f'saved at : {savepath}')

    sf.write(savepath, out_wav, DESIRED_SR)

    logger.info('Saved separated signal. ')
    logger.info('Script finished successfully.')
190
+
191
+
192
def main():
    """Entry point: ensure the model files are present, then process every input."""
    # model files check and download
    check_and_download_models(WEIGHT_PATH, MODEL_PATH, REMOTE_PATH)

    for audio_path in args.input:
        recognize_one_audio(audio_path)

if __name__ == "__main__":
    main()
models/ailia-models/code/unet_source_separation_utils.py ADDED
@@ -0,0 +1,56 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ import numpy as np
3
+ from scipy import signal
4
+
5
+
6
+ # ======================
7
+ # Pre/Post process
8
+ # ======================
9
def preemphasis(data, coeff=0.97):
    """First-order pre-emphasis FIR filter: y[n] = x[n] - coeff * x[n-1]."""
    numer = [1, -coeff]
    denom = [1]
    emphasized = signal.lfilter(numer, denom, data)
    return emphasized.astype(np.float32)
12
+
13
+
14
def inv_preemphasis(data, coeff=0.97):
    """Inverse of preemphasis (IIR): y[n] = x[n] + coeff * y[n-1]."""
    restored = signal.lfilter([1], [1, -coeff], data)
    return restored.astype(np.float32)
17
+
18
+
19
def lowpass(data, stop_freq, sample_freq, N=4):
    """Zero-phase Butterworth low-pass filter of order `N`.

    `stop_freq` is the cutoff in Hz; `sample_freq` is the sample rate in Hz.
    Uses filtfilt, so the result has no phase delay.
    """
    normalized_cutoff = 2.0 * stop_freq / sample_freq
    numer, denom = signal.butter(N, normalized_cutoff, btype="low")
    return signal.filtfilt(numer, denom, data)
25
+
26
+
27
def tfconvert(x, window_len, hop_len, mult, window='hann'):
    """STFT the signal and return a (log-magnitude, phase) pair.

    Both outputs are float32 and zero-padded along the frame axis (axis 2)
    to a multiple of `mult` via zero_pad.
    """
    scale = window_len // 2 + 1
    _, _, spec = signal.stft(
        x, window=window, nperseg=window_len, noverlap=window_len - hop_len)

    real_part = np.real(spec) * scale
    imag_part = np.imag(spec) * scale

    magnitude = np.log(np.sqrt(real_part ** 2 + imag_part ** 2) + 1.0).astype(np.float32)
    phase = np.arctan2(imag_part, real_part).astype(np.float32)

    return zero_pad(magnitude, mult), zero_pad(phase, mult)
41
+
42
+
43
def zero_pad(x, mult):
    """Zero-pad axis 2 of `x` so its length becomes a multiple of `mult`."""
    remainder = x.shape[2] % mult
    if remainder == 0:
        return x
    return np.pad(x, ((0, 0), (0, 0), (0, mult - remainder)), mode='constant')
49
+
50
+
51
def calc_time(sample_len, sr):
    """Print the duration of `sample_len` samples at sample rate `sr`.

    Args:
        sample_len: number of audio samples.
        sr: sample rate in Hz.
    """
    whole_seconds = sample_len // sr
    fractional = (sample_len % sr) / sr
    minutes = whole_seconds // 60          # renamed: original shadowed builtin min()
    seconds = whole_seconds % 60 + fractional
    print('Time length : {}min {:.02f}sec'.format(minutes, seconds))
models/ailia-models/code/unet_source_separation_utils_ailia.py ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ import numpy as np
3
+ import ailia.audio as ailia_audio
4
+
5
+
6
+ # ======================
7
+ # Pre/Post process
8
+ # ======================
9
def preemphasis(data, coeff=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1] (ailia backend)."""
    # NOTE: `linerfilter` is the ailia.audio API spelling (sic).
    numer = np.array([1, -coeff])
    denom = np.array([1])
    return ailia_audio.linerfilter(numer, denom, data).astype(np.float32)
12
+
13
+
14
def inv_preemphasis(data, coeff=0.97):
    """Inverse of preemphasis: y[n] = x[n] + coeff * y[n-1] (ailia backend)."""
    # NOTE: `linerfilter` is the ailia.audio API spelling (sic).
    numer = np.array([1])
    denom = np.array([1, -coeff])
    return ailia_audio.linerfilter(numer, denom, data).astype(np.float32)
17
+
18
+
19
def lowpass(data, stop_freq, sample_freq, N=4):
    """Zero-phase low-pass filter using precomputed Butterworth coefficients.

    Only the two (cutoff, sample-rate) pairs used by this model are
    supported, because the filter coefficients are hard-coded. Both pairs
    share the same normalized cutoff (2*10000/22050 == 2*20000/44100), so a
    single coefficient set covers both.

    Args:
        data: input signal array.
        stop_freq: cutoff frequency in Hz (10000 or 20000).
        sample_freq: sample rate in Hz (22050 or 44100, respectively).
        N: unused; kept for signature compatibility with the scipy variant.

    Raises:
        ValueError: for any unsupported (stop_freq, sample_freq) pair.
    """
    if (stop_freq, sample_freq) not in {(10000, 22050), (20000, 44100)}:
        # Fixed typo in the original message: "freqency" -> "frequency".
        raise ValueError('illegal sample frequency.')
    # Presumably scipy.signal.butter(4, 2*stop_freq/sample_freq) output — verify.
    b = np.array([0.68166451, 2.72665802, 4.08998703, 2.72665802, 0.68166451])
    a = np.array([1. , 3.238043, 3.99120175, 2.21272074, 0.4646666 ])
    return ailia_audio.filterfilter(b, a, data)
28
+
29
+
30
def tfconvert(x, window_len, hop_len, mult, window='hann'):
    """Spectrogram via ailia.audio; returns a (log-magnitude, phase) pair.

    Both outputs are float32 and zero-padded along the frame axis (axis 2)
    to a multiple of `mult` via zero_pad.
    """
    spec = ailia_audio.spectrogram(
        x, fft_n=window_len, hop_n=hop_len, center_mode=2,
        norm_type="scipy", win_type=window)

    scale = window_len // 2 + 1
    real_part = np.real(spec) * scale
    imag_part = np.imag(spec) * scale

    magnitude = np.log(np.sqrt(real_part ** 2 + imag_part ** 2) + 1.0).astype(np.float32)
    phase = np.arctan2(imag_part, real_part).astype(np.float32)

    return zero_pad(magnitude, mult), zero_pad(phase, mult)
43
+
44
+
45
def zero_pad(x, mult):
    """Right-pad the last axis of `x` with zeros to the next multiple of `mult`."""
    remainder = x.shape[2] % mult
    if remainder:
        tail = np.zeros((x.shape[0], x.shape[1], mult - remainder), dtype=np.float32)
        x = np.concatenate((x, tail), axis=2)
    return x
51
+
52
+
53
def calc_time(sample_len, sr):
    """Print the duration of `sample_len` samples at sample rate `sr`.

    Args:
        sample_len: number of audio samples.
        sr: sample rate in Hz.
    """
    whole_seconds = sample_len // sr
    fractional = (sample_len % sr) / sr
    minutes = whole_seconds // 60          # renamed: original shadowed builtin min()
    seconds = whole_seconds % 60 + fractional
    print('Time length : {}min {:.02f}sec'.format(minutes, seconds))
models/ailia-models/second_voice_bank.best.opt.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1abf4fe6881666fb9466d20f1887f7035f95687a05c06ba0c23f9898e3424241
3
+ size 301236944
models/ailia-models/second_voice_bank.best.opt.onnx.prototxt ADDED
The diff for this file is too large to render. See raw diff
 
models/ailia-models/source.txt ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ https://github.com/axinc-ai/ailia-models/tree/master/audio_processing/unet_source_separation
2
+
3
+ https://storage.googleapis.com/ailia-models/unet_source_separation/second_voice_bank.best.opt.onnx
4
+ https://storage.googleapis.com/ailia-models/unet_source_separation/second_voice_bank.best.opt.onnx.prototxt
5
+
6
+ https://storage.googleapis.com/ailia-models/unet_source_separation/RefineSpectrogramUnet.best.opt.onnx
7
+ https://storage.googleapis.com/ailia-models/unet_source_separation/RefineSpectrogramUnet.best.opt.onnx.prototxt