---
library_name: transformers
license: mit
datasets:
- CLAPv2/MUSDB18-HQ
pipeline_tag: audio-to-audio
tags:
- music
new_version: HiDolen/Mini-BS-RoFormer-18M
---

# Model Card for Mini-BS-RoFormer

A model for music source separation. The implementation is adapted from [the existing BS-RoFormer code](https://github.com/lucidrains/BS-RoFormer).


## Model Details

Model hyperparameters:

- depth = 3
- hidden_size = 256
- intermediate_size = 256 * 2

With only 8.8M parameters in total, the model reaches an average SDR of 6.5 dB on the val split of MUSDB18HQ. Per-stem SDR:

- bass: 5.66
- drums: 6.77
- other: 6.06
- vocal: 7.44
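
As a rough illustration of the metric, the sketch below computes a simple energy-ratio SDR and checks that the per-stem scores average to about 6.5. The card does not say which SDR variant or evaluation tool was used, so `sdr_db` is a hypothetical helper based on the common definition $10 \log_{10}(\lVert s \rVert^2 / \lVert s - \hat{s} \rVert^2)$, not necessarily the exact procedure behind the numbers above.

```python
import math


def sdr_db(reference, estimate, eps=1e-8):
    """Energy-ratio SDR in dB (assumed definition, see the note above)."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero and log(0) for silent or perfect signals
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Per-stem scores reported in this card
stem_sdr = {"bass": 5.66, "drums": 6.77, "other": 6.06, "vocal": 7.44}
mean_sdr = sum(stem_sdr.values()) / len(stem_sdr)
print(f"mean SDR: {mean_sdr:.2f} dB")  # mean SDR: 6.48 dB
```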

## Uses

The model was used with transformers 4.55.4. Running it also requires the `soundfile` and `einops` libraries.

CPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
waveform = torch.tensor(waveform).T.float()

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save the separated stems
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```

GPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model.to("cuda")

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
waveform = torch.tensor(waveform).T.float()
waveform = waveform.to("cuda")

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save the separated stems
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```
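
The `chunk_size` and `overlap_size` arguments suggest that long tracks are processed in overlapping windows. The sketch below shows one common way to do that: plain overlap-add with averaging. It is an assumption about the general technique, not the actual implementation of `model.separate` (which lives in the remote code and is not shown here). `chunk_starts` and `overlap_add` are hypothetical helpers, and `gap_size` is not modeled.

```python
def chunk_starts(num_samples, chunk_size, overlap_size):
    """Start indices of overlapping chunks that cover the whole signal."""
    hop = chunk_size - overlap_size
    if overlap_size < 0 or hop <= 0:
        raise ValueError("need 0 <= overlap_size < chunk_size")
    starts = list(range(0, max(num_samples - chunk_size, 0) + 1, hop))
    # Add one more (possibly shorter) chunk if the tail is not covered yet
    if starts[-1] + chunk_size < num_samples:
        starts.append(starts[-1] + hop)
    return starts


def overlap_add(signal, chunk_size, overlap_size, process=lambda chunk: chunk):
    """Process a 1-D signal chunk by chunk and average the overlapping parts."""
    n = len(signal)
    out = [0.0] * n
    weight = [0.0] * n
    for start in chunk_starts(n, chunk_size, overlap_size):
        processed = process(signal[start:start + chunk_size])
        for i, value in enumerate(processed):
            out[start + i] += value
            weight[start + i] += 1.0
    return [o / w for o, w in zip(out, weight)]


# With the settings used above (6 s chunks, 3 s overlap), a 3-minute track
# at 44100 Hz is split into chunks that start every 3 seconds.
starts = chunk_starts(44100 * 180, 44100 * 6, 44100 * 3)
print(len(starts))  # 59
```

Under this scheme a larger overlap smooths chunk boundaries at the cost of more forward passes, since the hop between chunks is `chunk_size - overlap_size`.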

## Training Details

Trained on the MUSDB18HQ dataset.

The Multi-STFT loss term described in the original paper is omitted to speed up training.

Learning rate 5e-4; trained for 60k steps with batch_size=6.
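
For context on the omitted term, here is a minimal NumPy sketch of a multi-resolution STFT loss: the mean absolute error between complex spectrograms, summed over several window sizes. The window sizes and hop size are toy values chosen for illustration, and the time-domain L1 term in `total_loss` is likewise an assumption; this card does not state the exact loss used in training. Setting `use_multi_stft=False` corresponds to dropping the term.

```python
import numpy as np


def stft(signal, window_size, hop_size):
    """Complex STFT of a 1-D signal with a Hann window (no padding)."""
    window = np.hanning(window_size)
    starts = range(0, len(signal) - window_size + 1, hop_size)
    frames = np.stack([signal[s:s + window_size] * window for s in starts])
    return np.fft.rfft(frames, axis=-1)


def multi_stft_l1(estimate, target, window_sizes=(64, 32, 16), hop_size=8):
    """Sum of mean absolute errors between complex STFTs at several resolutions."""
    loss = 0.0
    for window_size in window_sizes:
        diff = stft(estimate, window_size, hop_size) - stft(target, window_size, hop_size)
        loss += np.mean(np.abs(diff))
    return loss


def total_loss(estimate, target, use_multi_stft=True):
    """Time-domain L1 loss, optionally plus the multi-resolution STFT term."""
    time_l1 = np.mean(np.abs(estimate - target))
    if not use_multi_stft:
        return time_l1
    return time_l1 + multi_stft_l1(estimate, target)


# Toy example: a sine wave and a slightly offset estimate of it
t = np.arange(256) / 256.0
target = np.sin(2 * np.pi * 8 * t)
estimate = target + 0.1
print(total_loss(estimate, target, use_multi_stft=False))
print(total_loss(estimate, target, use_multi_stft=True))
```

Each extra resolution adds two STFT computations per training step, which is the kind of cost that skipping the term avoids.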

## Acknowledgments

- https://github.com/lucidrains/BS-RoFormer
- https://arxiv.org/abs/2309.02612 (Music Source Separation with Band-Split RoPE Transformer)