oulianov committed on
Commit e5583b6 · verified · 1 parent: cd90e62

Upload folder using huggingface_hub
README.md ADDED
---
license: cc-by-nc-sa-4.0
---
Apollo official GitHub: https://github.com/JusperLee/Apollo

Apollo is a novel music restoration method designed to address distortions and artefacts caused by audio codecs, especially at low bitrates. Operating in the frequency domain, Apollo uses a frequency band-split module, band-sequence modeling, and frequency band reconstruction to restore the audio quality of MP3-compressed music. It divides the spectrogram into sub-bands, extracts gain-shape representations, and models both sub-band and temporal information for high-quality audio recovery. Trained with a Generative Adversarial Network (GAN), Apollo outperforms existing SR-GAN models on the MUSDB18-HQ and MoisesDB datasets, excelling in complex multi-instrument and vocal scenarios while maintaining efficiency.
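The band-split step described above can be pictured as slicing the spectrogram along the frequency axis into sub-bands of predefined widths, each of which is then modeled as a sequence. A toy NumPy sketch of that slicing (the band widths here are illustrative, not the ones Apollo actually uses):

```python
import numpy as np

def band_split(spec: np.ndarray, widths: list[int]) -> list[np.ndarray]:
    """Slice a (freq, time) spectrogram into sub-bands along the frequency axis.

    `widths` must sum to the number of frequency bins in `spec`.
    """
    assert sum(widths) == spec.shape[0], "band widths must cover all frequency bins"
    bands, start = [], 0
    for w in widths:
        bands.append(spec[start:start + w])  # shape: (w, time)
        start += w
    return bands

# Toy example: 16 frequency bins, 10 frames, coarser bands toward high frequencies.
spec = np.random.randn(16, 10) + 1j * np.random.randn(16, 10)
bands = band_split(spec, [2, 2, 4, 8])
```

Each returned sub-band would then feed the band-sequence modeling stage; the real model additionally normalizes each band into a gain-shape representation.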
The open-sourced content includes models for inference via https://github.com/ZFTurbo/Music-Source-Separation-Training, as well as the original weights from fewer training steps. Training was conducted with sucial's project at https://github.com/SUC-DriverOld/Apollo-Training and ran for 1 million steps on a 92-hour high-quality vocal dataset.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65ef5331b46c5c72e374a3dd/uRJGmwdu--qhKlkMy5HO6.png)
config.yaml ADDED
exp:
  dir: ./Exps
  name: ApolloVoice
datas:
  _target_: look2hear.datas.MusdbMoisesdbDataModule
  train_dir: ./dataset/restoration/train
  eval_dir: ./dataset/restoration/test
  codec_type: mp3
  codec_options:
    bitrate: random
    compression: random
    complexity: random
    vbr: random
  sr: 44100
  segments: 5.4
  num_stems: 8
  snr_range:
    - -10
    - 10
  num_samples: 3000
  batch_size: 1
  num_workers: 8
model:
  _target_: look2hear.models.apollo.Apollo
  sr: 44100
  win: 20
  feature_dim: 384
  layer: 8
discriminator:
  _target_: look2hear.discriminators.frequencydis.MultiFrequencyDiscriminator
  nch: 2
  window:
    - 32
    - 64
    - 128
    - 256
    - 512
    - 1024
    - 2048
optimizer_g:
  _target_: bitsandbytes.optim.AdamW8bit
  lr: 0.001
  weight_decay: 0.01
optimizer_d:
  _target_: bitsandbytes.optim.AdamW8bit
  lr: 0.0001
  weight_decay: 0.01
  betas:
    - 0.5
    - 0.99
scheduler_g:
  _target_: torch.optim.lr_scheduler.StepLR
  step_size: 4
  gamma: 0.98
scheduler_d:
  _target_: torch.optim.lr_scheduler.StepLR
  step_size: 4
  gamma: 0.98
loss_g:
  _target_: look2hear.losses.gan_losses.MultiFrequencyGenLoss
  eps: 2.0e-08
loss_d:
  _target_: look2hear.losses.gan_losses.MultiFrequencyDisLoss
  eps: 2.0e-08
metrics:
  _target_: look2hear.losses.MultiSrcNegSDR
  sdr_type: sisdr
system:
  _target_: look2hear.system.audio_litmodule.AudioLightningModule
early_stopping:
  _target_: pytorch_lightning.callbacks.EarlyStopping
  monitor: val_loss
  patience: 50
  mode: min
  verbose: true
checkpoint:
  _target_: pytorch_lightning.callbacks.ModelCheckpoint
  dirpath: ${exp.dir}/${exp.name}/checkpoints
  monitor: val_loss
  mode: min
  verbose: true
  save_top_k: 5
  save_last: true
  filename: "{epoch}-{val_loss:.4f}"
logger:
  _target_: pytorch_lightning.loggers.WandbLogger
  name: ${exp.name}
  save_dir: ${exp.dir}/${exp.name}/logs
  offline: true
  project: Audio-Restoration
trainer:
  _target_: pytorch_lightning.Trainer
  devices:
    - 0
  max_epochs: 500
  sync_batchnorm: true
  default_root_dir: ${exp.dir}/${exp.name}/
  accelerator: cuda
  limit_train_batches: 1.0
  fast_dev_run: false
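The `_target_` keys in the config follow the Hydra instantiation convention: each node names a class to construct, and the remaining keys become constructor arguments. A minimal sketch of how such a node can be resolved, using a standard-library class as a stand-in (since the `look2hear` package may not be installed):

```python
import importlib

def instantiate(node: dict):
    """Resolve a Hydra-style config node: import the class named by
    `_target_` and call it with the remaining keys as keyword arguments."""
    module_path, _, class_name = node["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {k: v for k, v in node.items() if k != "_target_"}
    return cls(**kwargs)

# Stand-in node; the real config would target e.g. look2hear.models.apollo.Apollo.
node = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
obj = instantiate(node)
```

The real `hydra.utils.instantiate` additionally handles nested nodes and interpolation (the `${exp.dir}`-style references above); this sketch shows only the core import-and-call step.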
config_apollo_vocals_ep_54.yaml ADDED
audio:
  chunk_size: 441000
  min_mean_abs: 0.0
  num_channels: 2
  sample_rate: 44100
augmentations:
  enable: false
inference:
  batch_size: 1
  num_overlap: 4
model:
  feature_dim: 384
  layer: 8
  sr: 44100
  win: 20
training:
  batch_size: 1
  coarse_loss_clip: true
  grad_clip: 0
  instruments:
    - restored
    - addition
  lr: 1.0
  num_epochs: 1000
  num_steps: 1000
  optimizer: prodigy
  patience: 2
  q: 0.95
  reduce_factor: 0.95
  target_instrument: restored
  use_amp: true
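In this config, `chunk_size: 441000` is 10 seconds of audio at `sample_rate: 44100`, and `num_overlap: 4` implies the inference window advances by a quarter of the chunk length, with overlapping outputs blended. A rough NumPy sketch of that chunked, overlap-averaged processing (the actual inference code in the linked repositories may blend differently, e.g. with windowed cross-fades):

```python
import numpy as np

def chunked_process(audio, chunk_size, num_overlap, process):
    """Run `process` on overlapping chunks of `audio` and average the overlaps."""
    hop = chunk_size // num_overlap          # e.g. 441000 // 4 = 110250 samples
    out = np.zeros_like(audio, dtype=float)
    weight = np.zeros_like(audio, dtype=float)
    for start in range(0, len(audio), hop):
        chunk = audio[start:start + chunk_size]
        out[start:start + len(chunk)] += process(chunk)
        weight[start:start + len(chunk)] += 1.0
    # Average where chunks overlapped (weight is >= 1 everywhere).
    return out / np.maximum(weight, 1.0)

# Identity "model": the blended output should reproduce the input exactly.
x = np.random.randn(44100)
y = chunked_process(x, chunk_size=8192, num_overlap=4, process=lambda c: c)
```

Overlapping chunks matter because the model sees limited context at chunk edges; averaging several shifted passes suppresses boundary artefacts.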
epoch=54-val_loss=-17.6221.ckpt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:b7158aac7d9fb886b986ea62beb4c050526b3951f8d9c11bacba1984a6b5c74f
size 615867827
model_apollo_vocals_ep_54.ckpt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:59e1311f93e1f0fde6d5d11fa69d97e41cdee39be38f0cf4ccb80cfce34b2a2b
size 194278526