---
library_name: transformers
license: mit
datasets:
- CLAPv2/MUSDB18-HQ
pipeline_tag: audio-to-audio
tags:
- music
new_version: HiDolen/Mini-BS-RoFormer-18M
---

# Model Card for Mini-BS-RoFormer

A model for music source separation. The implementation is adapted from [an existing BS-RoFormer implementation](https://github.com/lucidrains/BS-RoFormer).


## Model Details

Model hyperparameters:

- depth = 3
- hidden_size = 256
- intermediate_size = 256 * 2

The model has only 8.8M parameters in total and reaches an average SDR of 6.5 on the validation set of MUSDB18HQ. Per-stem SDR:

- bass: 5.66
- drums: 6.77
- other: 6.06
- vocal: 7.44
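
As a quick sanity check, the reported average follows from the four per-stem scores listed above. The snippet is plain arithmetic on those numbers; the variable names are illustrative, and an unweighted mean over stems is assumed.

```python
# Per-stem SDR values (dB) as reported in this model card.
stem_sdr = {"bass": 5.66, "drums": 6.77, "other": 6.06, "vocal": 7.44}

# Unweighted mean over the four stems (assumed averaging scheme).
mean_sdr = sum(stem_sdr.values()) / len(stem_sdr)
print(f"mean SDR: {mean_sdr:.2f}")  # 6.48, i.e. about 6.5
```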

## Uses

The examples below were run with transformers 4.55.4. Running the model also requires the `soundfile` and `einops` libraries.
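
Before loading the model, it can help to confirm that the required packages are importable. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, not part of this model or of transformers, and `torch` is included only because the examples here import it.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages used by the inference examples in this model card.
required = ["transformers", "torch", "soundfile", "einops"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```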

CPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
waveform = torch.tensor(waveform).T.float()

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save the separated stems
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```

GPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model.to("cuda")

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
waveform = torch.tensor(waveform).T.float()
waveform = waveform.to("cuda")

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save the separated stems
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```
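
At 44.1 kHz, the `chunk_size` and `overlap_size` values above correspond to 6-second windows with 3 seconds of overlap. How `separate` actually splits and recombines the audio is defined in the model's remote code and is not documented here. The sketch below only illustrates one common sliding-window layout, assuming a hop of `chunk_size - overlap_size`; `chunk_starts` is a hypothetical helper, not part of the model's API.

```python
def chunk_starts(num_samples, chunk_size, overlap_size):
    """Start offsets of overlapping windows that together cover `num_samples`.

    Assumes hop = chunk_size - overlap_size. The last window may run past
    the end of the signal, in which case it would need padding.
    """
    hop = chunk_size - overlap_size
    if hop <= 0:
        raise ValueError("overlap_size must be smaller than chunk_size")
    # Stop once a window starting at the previous offset already reaches the end.
    return list(range(0, max(num_samples - overlap_size, 1), hop))


sr = 44100
starts = chunk_starts(num_samples=sr * 15, chunk_size=sr * 6, overlap_size=sr * 3)
print([s / sr for s in starts])  # [0.0, 3.0, 6.0, 9.0]
```

Under this assumption, a 15-second track is covered by four windows, and every interior sample falls inside two of them.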

## Training Details

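The model was trained on MUSDB18HQ.

To speed up training, the Multi-STFT loss term described in the original paper was not used.

Training ran for 60k steps with a learning rate of 5e-4 and batch_size=6.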

## Acknowledgments

- https://github.com/lucidrains/BS-RoFormer
- https://arxiv.org/abs/2309.02612 (Music Source Separation with Band-Split RoPE Transformer)