Paper: Music Source Separation with Band-Split RoPE Transformer (arXiv:2309.02612)
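ONNX-format models of the Band-Split RoPE Transformer (BS-RoFormer) for music source separation (vocals/accompaniment separation).
This repository provides an ONNX model converted from a PyTorch checkpoint, plus a uint8 dynamically quantized version of it.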
Source checkpoint: `model_bs_roformer_ep_317_sdr_12.9755.ckpt` (by viperx)

| File | Format | Size | Description |
|---|---|---|---|
| `bs_roformer_ep317_sdr12.9755.onnx` + `.data` | ONNX fp32 | ~615 MB | Full precision, external data |
| `bs_roformer_ep317_sdr12.9755_quantized_uint8.onnx` | ONNX uint8 | ~158 MB | Dynamic quantized, single file |
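uint8 dynamic quantization stores each weight as one byte plus a scale and zero point, and quantizes activations on the fly at inference time. The NumPy sketch below shows only the weight arithmetic, assuming a simple asymmetric per-tensor scheme on a random stand-in matrix; it is not the script used to produce this file. onnxruntime's `quantize_dynamic` with `weight_type=QuantType.QUInt8` is one common way to produce such a model, and it may use per-channel scales or leave some ops in fp32. The 4:1 byte ratio is consistent with the table: 615 MB / 4 ≈ 154 MB, close to the ~158 MB file.

```python
import numpy as np


def quantize_uint8(w: np.ndarray):
    """Asymmetric per-tensor uint8 quantization: w ≈ scale * (q - zero_point)."""
    # The range must include 0.0 so that zero is exactly representable
    w_min = min(float(w.min()), 0.0)
    w_max = max(float(w.max()), 0.0)
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize_uint8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate fp32 weights from the uint8 codes."""
    return (q.astype(np.float32) - zero_point) * scale


# Random stand-in for one fp32 weight matrix (shape chosen for illustration only)
rng = np.random.default_rng(0)
w = rng.standard_normal((384, 1536)).astype(np.float32)
q, scale, zp = quantize_uint8(w)
w_hat = dequantize_uint8(q, scale, zp)

print(w.nbytes / q.nbytes)  # → 4.0
print("max abs error:", np.abs(w - w_hat).max(), "half step:", scale / 2)
```

The rounding error per weight is bounded by about half a quantization step (`scale / 2`), which is why dynamic quantization usually costs little accuracy while cutting storage roughly fourfold.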
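The model is the no-STFT variant: the STFT is not part of the ONNX graph, so first run the STFT preprocessing with `BS_roformer_processor`, then feed the result into the ONNX model.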
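A minimal NumPy sketch of the STFT step, assuming n_fft = 2048 (hence 2048 / 2 + 1 = 1025 frequency bins), hop length 441, a periodic Hann window and `torch.stft`-style centred reflect padding. These values and the (batch, channels, freq, frames, real/imag) layout produced by the hypothetical `to_model_input` helper are assumptions, not the documented behaviour of `BS_roformer_processor`; take the real parameters from the model config and check the expected layout with `session.get_inputs()`.

```python
import numpy as np

N_FFT = 2048      # assumed; take the real value from the model config
HOP_LENGTH = 441  # assumed; take the real value from the model config
# Periodic Hann window, equivalent to torch.hann_window(N_FFT)
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)


def stft(audio: np.ndarray) -> np.ndarray:
    """audio: (channels, samples) -> complex spectrogram (channels, N_FFT // 2 + 1, frames)."""
    pad = N_FFT // 2
    # center=True behaviour: reflect-pad so frame t is centred on sample t * HOP_LENGTH
    padded = np.pad(audio, ((0, 0), (pad, pad)), mode="reflect")
    n_frames = 1 + (padded.shape[-1] - N_FFT) // HOP_LENGTH
    # Index matrix (frames, N_FFT) gathering every frame in one step
    idx = np.arange(N_FFT)[None, :] + HOP_LENGTH * np.arange(n_frames)[:, None]
    frames = padded[:, idx] * WINDOW      # (channels, frames, N_FFT)
    spec = np.fft.rfft(frames, axis=-1)   # (channels, frames, freq)
    return spec.transpose(0, 2, 1)        # (channels, freq, frames)


def to_model_input(spec: np.ndarray) -> np.ndarray:
    """Hypothetical layout: real/imag stacked in a trailing axis, plus a batch axis."""
    ri = np.stack([spec.real, spec.imag], axis=-1).astype(np.float32)
    return ri[None]  # (1, channels, freq, frames, 2)


# One second of random stereo audio at 44.1 kHz
audio = np.random.default_rng(0).standard_normal((2, 44100)).astype(np.float32)
x = to_model_input(stft(audio))
print(x.shape)  # → (1, 2, 1025, 101, 2)
```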
```python
import onnxruntime as ort

# 1. Load the ONNX model (quantized version, single file)
session = ort.InferenceSession(
    "bs_roformer_ep317_sdr12.9755_quantized_uint8.onnx",
    providers=["CPUExecutionProvider"],
)

# 2. The fp32 version keeps its weights in the external .data file.
#    Keep the .data file in the same directory as the .onnx file; onnxruntime
#    resolves it relative to the model path when loading from a file path:
# session = ort.InferenceSession(
#     "bs_roformer_ep317_sdr12.9755.onnx",
#     providers=["CPUExecutionProvider"],
# )
#    Alternatively, load it through onnx first:
# import onnx
# model = onnx.load("bs_roformer_ep317_sdr12.9755.onnx", load_external_data=True)
# session = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])

print("Model loaded!")
for inp in session.get_inputs():
    print(f"Input: {inp.name}, shape: {inp.shape}, type: {inp.type}")
for out in session.get_outputs():
    print(f"Output: {out.name}, shape: {out.shape}, type: {out.type}")
```
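Before wiring up the full pipeline, it can help to run the session once on zero-filled inputs to confirm the shapes it accepts and returns. This is a sketch: `concrete_shape`, `smoke_test` and the `dynamic_sizes` mapping are hypothetical helpers, and unknown dimensions default to 1, which may be too small for this model, so pass realistic sizes (for example, the frame count from your STFT). `session` is the `InferenceSession` created above.

```python
import numpy as np

# Map ONNX Runtime type strings to NumPy dtypes (common cases only)
ORT_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(float16)": np.float16,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(int32)": np.int32,
}


def concrete_shape(shape, dynamic_sizes=None):
    """Replace symbolic (str) or unknown (None) dims with concrete sizes.

    dynamic_sizes maps a symbolic dim name to a size; anything else defaults to 1.
    """
    dynamic_sizes = dynamic_sizes or {}
    return tuple(d if isinstance(d, int) else dynamic_sizes.get(d, 1) for d in shape)


def smoke_test(session, dynamic_sizes=None):
    """Run the model once on zeros and return {output_name: output_shape}."""
    feeds = {}
    for inp in session.get_inputs():
        dtype = ORT_TO_NUMPY.get(inp.type, np.float32)
        feeds[inp.name] = np.zeros(concrete_shape(inp.shape, dynamic_sizes), dtype=dtype)
    outputs = session.run(None, feeds)  # None = return every output
    return {o.name: arr.shape for o, arr in zip(session.get_outputs(), outputs)}
```

Usage: `smoke_test(session, {name: size})`, where the names are the symbolic strings printed in the input shapes above.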
For the complete inference pipeline, use `inference.py` from ZFTurbo/MSS_ONNX_TensorRT:

```bash
git clone https://github.com/ZFTurbo/MSS_ONNX_TensorRT
cd MSS_ONNX_TensorRT
python inference.py \
  --model_type bs_roformer \
  --config_path path/to/model_bs_roformer_ep_317_sdr_12.9755.yaml \
  --input_folder path/to/input \
  --store_dir path/to/output \
  --use_onnx \
  --onnx_model_path path/to/bs_roformer_ep317_sdr12.9755.onnx
```
The config file is available here: `model_bs_roformer_ep_317_sdr_12.9755.yaml`
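Separation tools of this kind usually do not feed a whole track to the model at once; they cut the waveform into overlapping fixed-length chunks, separate each chunk and cross-fade the results. The following is a generic NumPy sketch of that overlap-add step, not the code of `inference.py`. `overlap_add`, `chunk` and `num_overlap` are hypothetical names, and the real chunk length and overlap should come from the model config.

```python
from typing import Callable

import numpy as np


def overlap_add(
    audio: np.ndarray,
    separate: Callable[[np.ndarray], np.ndarray],
    chunk: int,
    num_overlap: int = 2,
) -> np.ndarray:
    """Run `separate` on overlapping chunks of `audio` (channels, samples) and blend the results.

    `separate` must map a (channels, chunk) array to an array of the same shape,
    e.g. STFT -> ONNX model -> inverse STFT for a single stem.
    """
    channels, n = audio.shape
    step = max(1, chunk // num_overlap)
    # Number of chunks needed so that the last one reaches the end of the signal
    n_chunks = 1 + int(np.ceil(max(n - chunk, 0) / step))
    total = (n_chunks - 1) * step + chunk
    padded = np.pad(audio, ((0, 0), (0, total - n)))  # zero-pad the tail

    # Strictly positive Hann-shaped window: chunk edges get little weight but never zero
    window = np.hanning(chunk + 2)[1:-1]
    out = np.zeros((channels, total), dtype=np.float64)
    weight = np.zeros(total, dtype=np.float64)
    for i in range(n_chunks):
        start = i * step
        seg = separate(padded[:, start:start + chunk])
        out[:, start:start + chunk] += seg * window
        weight[start:start + chunk] += window

    # Weighted average of all chunk outputs covering each sample
    return (out / weight)[:, :n].astype(audio.dtype)


# Example with an identity "model": the output reproduces the input
x = np.random.default_rng(0).standard_normal((2, 10_000)).astype(np.float32)
y = overlap_add(x, lambda c: c, chunk=4096)
print(np.allclose(x, y, atol=1e-5))  # → True
```

Normalizing by the accumulated window weight makes the blend exact for any per-chunk gain, while the tapered window keeps chunk-boundary artifacts from a real model out of the final mix.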
```bibtex
@inproceedings{wang2024bsroformer,
  title={Music Source Separation with Band-Split RoPE Transformer},
  author={Lu, Wei-Tsung and Wang, Ju-Chiang and Kong, Qiuqiang and Hung, Yun-Ning},
  booktitle={ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2024}
}
```
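MIT License: the code and conversion tools are MIT-licensed. The model weights were trained by the community and are distributed publicly; the training data includes MUSDB18HQ (research use only). Check the license terms of the training data before any commercial use.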