---
library_name: transformers
license: mit
datasets:
- CLAPv2/MUSDB18-HQ
pipeline_tag: audio-to-audio
tags:
- music
new_version: HiDolen/Mini-BS-RoFormer-18M
---
# Model Card for Mini-BS-RoFormer
A model for music source separation. The implementation is adapted from [the existing BS-RoFormer code](https://github.com/lucidrains/BS-RoFormer).
## Model Details
Model hyperparameters:
- depth = 3
- hidden_size = 256
- intermediate_size = 256 * 2
The model has only 8.8M parameters in total and reaches an average SDR of 6.5 on the validation (val) split of MUSDB18HQ. Per-stem SDR:
- bass: 5.66
- drums: 6.77
- other: 6.06
- vocal: 7.44
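As a quick sanity check, the reported average can be recomputed from the per-stem scores. This is a minimal sketch that assumes the average is a plain unweighted mean over the four stems; the variable names are illustrative only.

```python
from statistics import mean

# Per-stem SDR values as reported in this model card.
sdr_per_stem = {"bass": 5.66, "drums": 6.77, "other": 6.06, "vocal": 7.44}

# Assumption: the headline number is the unweighted mean over stems.
average_sdr = mean(sdr_per_stem.values())
print(f"average SDR: {average_sdr:.2f}")
```

Rounded to one decimal place, the mean agrees with the 6.5 quoted above.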
## Uses
The model was developed with transformers version 4.55.4. Running it also requires the soundfile and einops libraries.
CPU inference:
```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,  # the model code is loaded from the Hub repository
)

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
waveform = torch.tensor(waveform).T.float()  # (samples, channels) -> (channels, samples)

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,  # 6-second chunks
    overlap_size=44100 * 3,  # 3-second overlap between chunks
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save one WAV file per separated stem
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```
GPU inference:
```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,  # the model code is loaded from the Hub repository
)
model.to("cuda")

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
waveform = torch.tensor(waveform).T.float()  # (samples, channels) -> (channels, samples)
waveform = waveform.to("cuda")  # the input must be on the same device as the model

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,  # 6-second chunks
    overlap_size=44100 * 3,  # 3-second overlap between chunks
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save one WAV file per separated stem
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```
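To give a feel for what the `chunk_size` and `overlap_size` arguments imply, the sketch below lists where chunks would start for a given track length. It is not the model's actual `separate` implementation: `chunk_starts` is a hypothetical helper, it assumes the hop between chunks is `chunk_size - overlap_size`, and it ignores `gap_size` entirely.

```python
def chunk_starts(num_samples: int, chunk_size: int, overlap_size: int) -> list[int]:
    """Return start offsets of overlapping chunks covering num_samples.

    Hypothetical helper: assumes hop = chunk_size - overlap_size and that
    the last chunk is padded if it runs past the end of the signal.
    """
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must satisfy 0 <= overlap_size < chunk_size")
    hop = chunk_size - overlap_size
    starts = [0]
    # Keep adding chunks until the most recent one reaches the end of the signal.
    while starts[-1] + chunk_size < num_samples:
        starts.append(starts[-1] + hop)
    return starts


sample_rate = 44100
# A 10-second track with the settings used in the examples above.
starts = chunk_starts(
    num_samples=sample_rate * 10,
    chunk_size=sample_rate * 6,
    overlap_size=sample_rate * 3,
)
print([s / sample_rate for s in starts])  # chunk start times in seconds
```

Under these assumptions, a 3-second overlap on 6-second chunks means most of the track is processed twice, which trades extra compute for smoother transitions at chunk boundaries.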
## Training Details
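The model is trained on MUSDB18HQ.
The Multi-STFT loss term described in the original paper is omitted to speed up training.
The learning rate is 5e-4, and the model is trained for 60k steps with batch_size=6.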
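For a rough sense of the training budget, the settings above can be turned into a count of training examples. This is a sketch only: it assumes each step consumes exactly one batch with no gradient accumulation, which the card does not state, and the variable names are illustrative.

```python
# Training settings as reported in this model card.
learning_rate = 5e-4
batch_size = 6
train_steps = 60_000

# Assumption: one optimizer step per batch, no gradient accumulation.
examples_seen = batch_size * train_steps
print(f"training examples seen: {examples_seen:,}")
```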
## Acknowledgments
- https://github.com/lucidrains/BS-RoFormer
- https://arxiv.org/abs/2309.02612 (Music Source Separation with Band-Split RoPE Transformer)