---
library_name: transformers
license: mit
datasets:
- CLAPv2/MUSDB18-HQ
pipeline_tag: audio-to-audio
tags:
- music
new_version: HiDolen/Mini-BS-RoFormer-18M
---

# Model Card for Mini-BS-RoFormer

A model for music source separation. The implementation is adapted from [the existing BS-RoFormer code](https://github.com/lucidrains/BS-RoFormer).

## Model Details

Model parameters:

- depth = 3
- hidden_size = 256
- intermediate_size = 256 * 2

With only 8.8M parameters in total, the model reaches an average SDR of 6.5 dB on the validation set of MUSDB18HQ. Per-stem SDR (dB):

- bass: 5.66
- drums: 6.77
- other: 6.06
- vocals: 7.44

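
As a quick arithmetic check, the reported average matches the unweighted mean of the four per-stem values. The snippet below is only an illustration: the variable names are made up for this example, and it assumes the average is a plain mean over stems rather than any weighted scheme.

```python
from statistics import mean

# Per-stem SDR values in dB, copied from the list above.
sdr_per_stem = {"bass": 5.66, "drums": 6.77, "other": 6.06, "vocals": 7.44}

# Assumption: the reported average is the unweighted mean over the four stems.
average_sdr = mean(sdr_per_stem.values())

print(f"average SDR: {average_sdr:.2f} dB")  # about 6.48, i.e. 6.5 at one decimal
```
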

## Uses

The examples below were written with transformers 4.55.4. Running the model also requires the `soundfile` and `einops` packages.

CPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
# Transpose from (samples, channels) to (channels, samples)
waveform = torch.tensor(waveform).T.float()

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save the separated stems
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```

GPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model.to("cuda")

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
# Transpose from (samples, channels) to (channels, samples)
waveform = torch.tensor(waveform).T.float()
waveform = waveform.to("cuda")

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save the separated stems
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```
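
Both examples pass `chunk_size=44100 * 6` and `overlap_size=44100 * 3`, i.e. 6-second windows that overlap by 3 seconds at 44100 Hz. How `model.separate` splits and recombines the audio internally is not described in this card, so the following is only a sketch of a common sliding-window scheme. It assumes the hop between windows is `chunk_size - overlap_size`; `plan_chunks` is a hypothetical helper, not part of the model's API, and `gap_size` is left out because its meaning is not documented here.

```python
def plan_chunks(num_samples, chunk_size, overlap_size):
    """Return (start, end) sample ranges for a sliding window over the audio.

    Hypothetical helper: assumes the hop is chunk_size - overlap_size and that
    the final window is simply truncated at the end of the audio.
    """
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must satisfy 0 <= overlap_size < chunk_size")
    hop = chunk_size - overlap_size
    ranges = []
    start = 0
    while True:
        end = min(start + chunk_size, num_samples)
        ranges.append((start, end))
        if end >= num_samples:
            break
        start += hop
    return ranges


sr = 44100
# A 15-second clip with the same window settings as the examples above.
ranges = plan_chunks(num_samples=sr * 15, chunk_size=sr * 6, overlap_size=sr * 3)
for start, end in ranges:
    print(f"{start // sr}s to {end // sr}s")
```

Under these assumptions a 15-second clip is covered by four windows, and every sample except those in the first and last 3 seconds falls inside two windows, which is what lets overlapping predictions be blended.
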

## Training Details

The model was trained on MUSDB18HQ.

To speed up training, the Multi-STFT loss term described in the original paper is not used.

Training ran for 60k steps with a learning rate of 5e-4 and batch_size=6.

## Acknowledgments

- https://github.com/lucidrains/BS-RoFormer
- https://arxiv.org/abs/2309.02612 (Music Source Separation with Band-Split RoPE Transformer)