---
library_name: transformers
license: mit
datasets:
- CLAPv2/MUSDB18-HQ
pipeline_tag: audio-to-audio
tags:
- music
new_version: HiDolen/Mini-BS-RoFormer-18M
---

# Model Card for Mini-BS-RoFormer

A model for music source separation, adapted from [the existing BS-RoFormer code](https://github.com/lucidrains/BS-RoFormer).


## Model Details

Model hyperparameters:

- depth = 3
- hidden_size = 256
- intermediate_size = 256 * 2

The model has only 8.8M parameters in total and reaches an average SDR of 6.5 dB on the validation split of MUSDB18-HQ. Per-stem SDR (dB):

- bass: 5.66
- drums: 6.77
- other: 6.06
- vocals: 7.44
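The quoted average matches the arithmetic mean of the four per-stem values (25.93 / 4 ≈ 6.48, rounded to 6.5). The sketch below reproduces that arithmetic and also gives a simple global SDR definition, 10·log10(‖s‖² / ‖s − ŝ‖²) for a reference s and an estimate ŝ. The card does not say which SDR variant produced these numbers (for example museval's frame-wise BSS Eval SDR versus this global form), so the `sdr` helper is an assumption for illustration, not the evaluation code behind the results.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-8):
    """Global signal-to-distortion ratio in dB (assumed definition, for illustration)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps keeps the ratio finite for silent references or perfect estimates.
    return 10 * np.log10((signal_power + eps) / (error_power + eps))


# Per-stem validation SDRs reported in this card.
reported = {"bass": 5.66, "drums": 6.77, "other": 6.06, "vocals": 7.44}
average = sum(reported.values()) / len(reported)
print(f"average SDR: {average:.2f} dB")  # average SDR: 6.48 dB
```

Under this definition, an estimate equal to 0.9 times the reference leaves an error with 1% of the signal power, i.e. an SDR of 20 dB.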

## Uses

The transformers version used is 4.55.4. Running the model also requires the soundfile and einops libraries.
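Before downloading the model, the requirements above can be checked with the standard library alone, as in the sketch below. The package list and the exact-match comparison against 4.55.4 are assumptions drawn from this paragraph (other transformers releases may also work), and `check_requirements` is a hypothetical helper name.

```python
from importlib import metadata, util

# Packages used by the examples below, plus einops, which the card lists as required.
REQUIRED = ("torch", "transformers", "soundfile", "einops")
TESTED_TRANSFORMERS = "4.55.4"


def check_requirements(required=REQUIRED, tested_transformers=TESTED_TRANSFORMERS):
    """Return (missing packages, warnings) without importing any of them."""
    missing = [name for name in required if util.find_spec(name) is None]
    warnings = []
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed = None
    if installed is not None and installed != tested_transformers:
        warnings.append(
            f"transformers {installed} is installed; this card used {tested_transformers}"
        )
    return missing, warnings


if __name__ == "__main__":
    missing, warnings = check_requirements()
    print("missing:", missing or "none")
    for message in warnings:
        print("warning:", message)
```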

CPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
# soundfile returns (frames, channels); transpose to (channels, frames)
waveform = torch.tensor(waveform).T.float()

# Run inference (no gradients are needed)
with torch.no_grad():
    result = model.separate(
        waveform,
        chunk_size=44100 * 6,
        overlap_size=44100 * 3,
        gap_size=0,
        batch_size=16,
        verbose=True,
    )

# Save each separated stem
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```

GPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model.to("cuda")

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
# soundfile returns (frames, channels); transpose to (channels, frames)
waveform = torch.tensor(waveform).T.float()
waveform = waveform.to("cuda")

# Run inference (no gradients are needed)
with torch.no_grad():
    result = model.separate(
        waveform,
        chunk_size=44100 * 6,
        overlap_size=44100 * 3,
        gap_size=0,
        batch_size=16,
        verbose=True,
    )

# Save each separated stem (move results back to the CPU first)
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```
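In both examples, `chunk_size=44100 * 6` and `overlap_size=44100 * 3` are 6 s windows with 3 s of overlap at 44.1 kHz, which suggests that `separate` processes the track in overlapping chunks and blends the results. The internals of `separate` (including what `gap_size` does, and presumably how `batch_size` groups chunks) are not documented in this card. The numpy sketch below only illustrates the generic overlap-add technique; `separate_in_chunks` and the `process` callable standing in for the model are hypothetical, and the triangular cross-fade window is an assumption.

```python
import numpy as np


def separate_in_chunks(waveform, process, chunk_size, overlap_size):
    """Generic overlap-add chunked separation (a sketch, not the model's `separate`).

    waveform: array of shape (channels, samples)
    process:  callable mapping a (channels, chunk_size) chunk to an array
              of shape (stems, channels, chunk_size)
    Returns an array of shape (stems, channels, samples).
    """
    channels, total = waveform.shape
    hop = chunk_size - overlap_size
    if hop <= 0:
        raise ValueError("overlap_size must be smaller than chunk_size")
    # Strictly positive triangular window, so every covered sample has non-zero weight.
    window = np.bartlett(chunk_size + 2)[1:-1]
    output = None
    weight = np.zeros(total)
    # Start a new chunk only while the previous one stops short of the end.
    for start in range(0, max(total - overlap_size, 1), hop):
        chunk = waveform[:, start:start + chunk_size]
        valid = chunk.shape[1]
        # Zero-pad the last chunk to the fixed length the model expects.
        if valid < chunk_size:
            chunk = np.pad(chunk, ((0, 0), (0, chunk_size - valid)))
        stems = process(chunk)
        if output is None:
            output = np.zeros((stems.shape[0], channels, total))
        # Cross-fade: accumulate windowed estimates and their weights.
        output[:, :, start:start + valid] += stems[:, :, :valid] * window[:valid]
        weight[start:start + valid] += window[:valid]
    return output / weight


# Usage with a dummy "model" returning two stems: the input and twice the input.
audio = np.random.default_rng(0).standard_normal((2, 44100))
stems = separate_in_chunks(audio, lambda c: np.stack([c, 2 * c]), chunk_size=8820, overlap_size=4410)
print(stems.shape)  # (2, 2, 44100)
```

Dividing by the accumulated window weights means any strictly positive window reproduces the input exactly when `process` is the identity; the window shape only affects how neighbouring chunk estimates are blended where they disagree.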

## Training Details

Trained on MUSDB18-HQ.

The multi-resolution STFT (Multi-STFT) loss term described in the original paper is not used, in order to speed up training.
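For context, the dropped term compares complex spectrograms of the estimate and the target at several STFT resolutions (an L1 distance summed over window sizes) and is added to a time-domain L1 loss. The numpy sketch below is a simplified illustration, not the training code of this model or of lucidrains/BS-RoFormer: the window sizes, the hop of 147 samples, the Hann window and the function names are assumptions. With `use_multi_stft=False` only the time-domain L1 remains, which is presumably closer to this model's setup, although the card does not state the loss that was actually used.

```python
import numpy as np


def stft(signal, n_fft, hop):
    """Complex STFT of a 1-D signal: Hann-windowed frames -> rfft bins."""
    window = np.hanning(n_fft)
    # Zero-pad short signals so there is always at least one full frame.
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def multi_stft_l1(estimate, target, window_sizes=(4096, 2048, 1024, 512, 256), hop=147):
    """Sum over resolutions of the mean absolute error between complex spectrograms."""
    total = 0.0
    for n_fft in window_sizes:
        diff = stft(estimate, n_fft, hop) - stft(target, n_fft, hop)
        # Treat real and imaginary parts as separate values in the L1 average.
        total += np.mean(np.abs(np.stack([diff.real, diff.imag])))
    return total


def separation_loss(estimate, target, use_multi_stft=False, multi_stft_weight=1.0):
    """Time-domain L1, optionally plus the weighted multi-resolution STFT term."""
    loss = np.mean(np.abs(estimate - target))
    if use_multi_stft:
        loss += multi_stft_weight * multi_stft_l1(estimate, target)
    return loss
```

Each extra resolution costs a full STFT of both signals per training step, which is why dropping the term speeds up training.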

Training used a learning rate of 5e-4 and a batch size of 6 for 60k steps.
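Collected as a configuration object, the stated hyperparameters amount to 6 × 60,000 = 360,000 training examples, assuming one optimizer step per batch with no gradient accumulation. The optimizer, learning-rate schedule, excerpt length and augmentation are not given in the card, so the hypothetical `TrainingConfig` dataclass below records only what is stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters stated in this card; unstated settings are deliberately omitted."""

    learning_rate: float = 5e-4
    batch_size: int = 6
    max_steps: int = 60_000
    use_multi_stft_loss: bool = False  # dropped to speed up training

    @property
    def examples_seen(self) -> int:
        # Assumes each optimizer step consumes exactly one batch.
        return self.batch_size * self.max_steps


config = TrainingConfig()
print(config.examples_seen)  # 360000
```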

## Acknowledgments

- lucidrains/BS-RoFormer: https://github.com/lucidrains/BS-RoFormer
- Music Source Separation with Band-Split RoPE Transformer: https://arxiv.org/abs/2309.02612