---
library_name: transformers
license: mit
datasets:
- CLAPv2/MUSDB18-HQ
pipeline_tag: audio-to-audio
tags:
- music
new_version: HiDolen/Mini-BS-RoFormer-18M
---

# Model Card for Mini-BS-RoFormer

A model for the music source separation task. The implementation is adapted from [the existing BS-RoFormer code](https://github.com/lucidrains/BS-RoFormer).

## Model Details

Model hyperparameters:

- depth = 3
- hidden_size = 256
- intermediate_size = 256 * 2

The model has only 8.8M parameters in total and reaches an average SDR of 6.5 on the validation set of MUSDB18HQ. Per-stem SDR:

- bass: 5.66
- drums: 6.77
- other: 6.06
- vocal: 7.44

## Uses

The model was used with version 4.55.4 of the transformers library. Running it also requires the `soundfile` and `einops` libraries.

CPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
waveform = torch.tensor(waveform).T.float()  # (samples, channels) -> (channels, samples)

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save each separated stem
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```

GPU inference:

```python
from transformers import AutoModel
import soundfile
import torch

model_name = "HiDolen/Mini-BS-RoFormer"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model.to("cuda")

# Load the audio
file = "./Bruno Mars - Runaway Baby.mp3"
waveform, sr = soundfile.read(file)
assert sr == 44100  # the sample rate must be 44100 Hz
waveform = torch.tensor(waveform).T.float()  # (samples, channels) -> (channels, samples)
waveform = waveform.to("cuda")

# Run inference
result = model.separate(
    waveform,
    chunk_size=44100 * 6,
    overlap_size=44100 * 3,
    gap_size=0,
    batch_size=16,
    verbose=True,
)

# Save each separated stem
for i in range(result.shape[0]):
    soundfile.write(f"separated_stem_{i}.wav", result[i].cpu().numpy().T, 44100)
```

## Training Details

The model was trained on MUSDB18HQ.

The Multi-STFT loss term described in the original paper was omitted to speed up training.

Training used a learning rate of 5e-4 and ran for 60k steps with batch_size=6.

## Acknowledgments

- https://github.com/lucidrains/BS-RoFormer
- https://arxiv.org/abs/2309.02612 (Music Source Separation with Band-Split RoPE Transformer)
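
As a quick sanity check on the figures under Model Details, the reported average SDR can be recomputed from the four per-stem values. This is a minimal sketch using only the numbers listed in this card; the name `stem_sdr` is an illustrative choice and not part of the model's API.

```python
from statistics import mean

# Per-stem SDR values as listed under Model Details.
# `stem_sdr` is a hypothetical name used only for this illustration.
stem_sdr = {"bass": 5.66, "drums": 6.77, "other": 6.06, "vocal": 7.44}

# Unweighted mean over the four stems.
average_sdr = mean(stem_sdr.values())
print(f"average SDR: {average_sdr:.4f}")
```

The unweighted mean is about 6.48, which is consistent with the rounded average of 6.5 quoted in Model Details.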