BS-RoFormer ONNX (Vocals/Instrumental Separation)

Band-Split RoPE Transformer (BS-RoFormer) in ONNX format, for music source separation (vocals/instrumental separation).

This repository provides ONNX models converted from a PyTorch checkpoint, along with a uint8 dynamically quantized version.

Model Details

Architecture: BS-RoFormer (Band-Split RoPE Transformer)
Original Paper: Music Source Separation with Band-Split RoPE Transformer
Parameters: 159.8M (no-STFT variant)
Original Checkpoint: model_bs_roformer_ep_317_sdr_12.9755.ckpt (by viperx)
SDR Performance (MUSDB18HQ):
- Vocals: 12.9 dB
- Instrumental: 17.0 dB

| File | Format | Size | Description |
|---|---|---|---|
| `bs_roformer_ep317_sdr12.9755.onnx` + `.data` | ONNX fp32 | ~615 MB | Full precision, weights in an external data file |
| `bs_roformer_ep317_sdr12.9755_quantized_uint8.onnx` | ONNX uint8 | ~158 MB | Dynamically quantized, single file |
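The uint8 file stores weights as 8-bit integers plus a scale and zero point, which is where the roughly 4x size reduction comes from. The sketch below illustrates asymmetric per-tensor uint8 quantization in NumPy. It is a simplified illustration, not the procedure used to produce this file: onnxruntime's `quantize_dynamic` additionally handles per-channel options, operator selection, and runtime quantization of activations, and this repository does not document the settings it used. `quantize_uint8` and `dequantize_uint8` are hypothetical helper names.

```python
import numpy as np


def quantize_uint8(w):
    """Asymmetric per-tensor uint8 quantization (illustrative sketch only)."""
    # The quantized range must include 0.0 so that zero stays exactly representable.
    w_min = min(float(w.min()), 0.0)
    w_max = max(float(w.max()), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize_uint8(q, scale, zero_point):
    """Map uint8 codes back to approximate float32 values."""
    return ((q.astype(np.float32) - zero_point) * scale).astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=(512, 512)).astype(np.float32)  # fake weight matrix
    q, scale, zp = quantize_uint8(w)
    w_hat = dequantize_uint8(q, scale, zp)
    print(f"bytes fp32={w.nbytes}, uint8={q.nbytes}")  # uint8 is 4x smaller
    print(f"max abs error={np.abs(w - w_hat).max():.2e} (scale={scale:.2e})")
```

As a rough size check: 159.8M parameters at 4 bytes each is about 639 MB (about 610 MiB) in fp32, versus about 160 MB at 1 byte per weight, which is in the same range as the file sizes in the table.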

Usage

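This is a no-STFT variant: the STFT is not part of the ONNX graph, so the audio must first be transformed with BS_roformer_processor, and the result is then fed to the ONNX model.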
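Because the STFT sits outside the ONNX graph, the spectrogram has to be computed in Python, and the separated output presumably has to be converted back with an inverse STFT. The sketch below is a NumPy stand-in for those steps, not the BS_roformer_processor code itself. It assumes a periodic Hann window, reflect padding of n_fft // 2 on both sides (as `torch.stft` does with `center=True`, `pad_mode="reflect"` and `torch.hann_window`), and n_fft=2048 / hop_length=441; check these values against the model's YAML config. The exact tensor layout the ONNX input expects (for example, how stereo channels are merged with frequency bins) is not documented here, so inspect `session.get_inputs()` before feeding it. `stft_real_imag` and `istft_from_real_imag` are hypothetical names.

```python
import numpy as np


def stft_real_imag(audio, n_fft=2048, hop_length=441):
    """STFT of multichannel audio, returned as stacked (real, imag) parts.

    audio: float array of shape (channels, samples), samples > n_fft // 2.
    Returns float32 array of shape (channels, n_fft // 2 + 1, frames, 2).
    """
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann, like torch.hann_window(n_fft)
    pad = n_fft // 2
    padded = np.pad(audio, ((0, 0), (pad, pad)), mode="reflect")
    n_frames = 1 + (padded.shape[1] - n_fft) // hop_length
    # Index matrix (frames, n_fft): row t selects samples t*hop .. t*hop + n_fft - 1.
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = padded[:, idx] * window        # (channels, frames, n_fft)
    spec = np.fft.rfft(frames, axis=-1)     # (channels, frames, freq)
    spec = spec.transpose(0, 2, 1)          # (channels, freq, frames)
    return np.stack([spec.real, spec.imag], axis=-1).astype(np.float32)


def istft_from_real_imag(spec, length, n_fft=2048, hop_length=441):
    """Inverse of stft_real_imag via windowed overlap-add."""
    window = np.hanning(n_fft + 1)[:-1]
    z = spec[..., 0].astype(np.float64) + 1j * spec[..., 1]  # (channels, freq, frames)
    frames = np.fft.irfft(z.transpose(0, 2, 1), n=n_fft, axis=-1) * window
    channels, n_frames, _ = frames.shape
    total = n_fft + hop_length * (n_frames - 1)
    out = np.zeros((channels, total))
    norm = np.zeros(total)
    for t in range(n_frames):
        start = t * hop_length
        out[:, start:start + n_fft] += frames[:, t]
        norm[start:start + n_fft] += window ** 2
    # Drop the center padding and divide by the summed squared window.
    pad = n_fft // 2
    return out[:, pad:pad + length] / np.maximum(norm[pad:pad + length], 1e-8)


if __name__ == "__main__":
    # One second of stereo noise at 44.1 kHz as a stand-in for real audio.
    audio = np.random.default_rng(0).standard_normal((2, 44100)).astype(np.float32)
    spec = stft_real_imag(audio)
    print(spec.shape)  # (2, 1025, 101, 2)
    recon = istft_from_real_imag(spec, length=audio.shape[1])
    print(np.abs(recon - audio).max())  # small round-trip error
```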

Quick Start

import onnxruntime as ort

# 1. Load the ONNX model (quantized version, single file)
session = ort.InferenceSession(
    "bs_roformer_ep317_sdr12.9755_quantized_uint8.onnx",
    providers=["CPUExecutionProvider"],
)

# 2. fp32 version with external data: keep the .onnx and .data files in the same
#    directory; onnxruntime resolves external data relative to the model path.
# session = ort.InferenceSession("bs_roformer_ep317_sdr12.9755.onnx",
#                                providers=["CPUExecutionProvider"])
# Alternatively, load it via onnx first (works because the model is < 2 GB):
# import onnx
# model = onnx.load("bs_roformer_ep317_sdr12.9755.onnx", load_external_data=True)
# session = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])

# 3. Inspect the expected input/output tensors
inp, out = session.get_inputs()[0], session.get_outputs()[0]
print("Model loaded!")
print(f"Input: {inp.name}, shape: {inp.shape}, type: {inp.type}")
print(f"Output: {out.name}, shape: {out.shape}, type: {out.type}")
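onnxruntime reports dynamic dimensions as strings (symbolic names) or None rather than integers, so the printed shapes cannot be passed straight to `np.zeros`. A small helper like the hypothetical `concrete_shape` below substitutes concrete sizes for a smoke test. The sizes you choose must match what the model actually accepts (for example, a full chunk of STFT frames), and the dimension name `"batch"` and the float32 dtype in the commented usage are assumptions to check against `inp.shape` and `inp.type`.

```python
def concrete_shape(shape, dynamic_sizes=None, default=1):
    """Turn an ORT NodeArg.shape (ints, str names or None) into a tuple of ints.

    dynamic_sizes maps symbolic dimension names to the sizes you want to use;
    any dimension not found there (including None) falls back to `default`.
    """
    dynamic_sizes = dynamic_sizes or {}
    return tuple(
        dim if isinstance(dim, int) else int(dynamic_sizes.get(dim, default))
        for dim in shape
    )


# Hypothetical smoke test, reusing `session` from the Quick Start above:
# import numpy as np
# inp = session.get_inputs()[0]
# x = np.zeros(concrete_shape(inp.shape, {"batch": 1}), dtype=np.float32)
# outputs = session.run(None, {inp.name: x})
# print([o.shape for o in outputs])
```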

Full Inference Pipeline

The full inference pipeline requires inference.py from ZFTurbo/MSS_ONNX_TensorRT:

git clone https://github.com/ZFTurbo/MSS_ONNX_TensorRT
cd MSS_ONNX_TensorRT

python inference.py \
    --model_type bs_roformer \
    --config_path path/to/model_bs_roformer_ep_317_sdr_12.9755.yaml \
    --input_folder path/to/input \
    --store_dir path/to/output \
    --use_onnx \
    --onnx_model_path path/to/bs_roformer_ep317_sdr12.9755.onnx
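Full-length tracks are typically not fed to separation models in one pass; MSS-style inference scripts process the audio in fixed-length, overlapping chunks and blend the results. The sketch below shows that idea with plain averaging over the overlaps. It is not the code of inference.py, which may use different chunk sizes, crossfade windows, or padding; `chunked_apply`, `chunk_size`, and `hop` are hypothetical names, and `process` stands in for the whole per-chunk step (STFT, ONNX model, inverse STFT).

```python
import numpy as np


def chunked_apply(audio, process, chunk_size, hop):
    """Apply `process` to overlapping chunks of `audio` and average the overlaps.

    audio:   array of shape (channels, samples)
    process: callable mapping a (channels, chunk_size) array to the same shape
    hop:     step between chunk starts; must satisfy 0 < hop <= chunk_size
    """
    if not 0 < hop <= chunk_size:
        raise ValueError("hop must be in (0, chunk_size]")
    channels, n = audio.shape
    # Number of chunks needed so that the last chunk reaches the end of the signal.
    n_chunks = int(np.ceil(max(n - chunk_size, 0) / hop)) + 1
    total = (n_chunks - 1) * hop + chunk_size
    # Zero-pad the tail so every chunk has exactly chunk_size samples.
    padded = np.pad(audio, ((0, 0), (0, total - n)))
    out = np.zeros((channels, total), dtype=np.float64)
    weight = np.zeros(total, dtype=np.float64)
    for i in range(n_chunks):
        start = i * hop
        out[:, start:start + chunk_size] += process(padded[:, start:start + chunk_size])
        weight[start:start + chunk_size] += 1.0
    # Every sample is covered at least once, so the division is safe.
    return (out / weight)[:, :n]


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((2, 10_000))
    y = chunked_apply(x, lambda seg: seg * 0.5, chunk_size=4096, hop=2048)
    print(np.allclose(y, 0.5 * x))  # True: a per-sample linear "model" is reproduced exactly
```

Plain averaging keeps the sketch short; a tapered crossfade window is a common alternative that hides chunk-boundary artifacts better.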

Config file: model_bs_roformer_ep_317_sdr_12.9755.yaml

Attribution

Model Architecture: lucidrains/BS-RoFormer (MIT License)
Training Framework: ZFTurbo/Music-Source-Separation-Training (MIT License)
ONNX Conversion Tools: ZFTurbo/MSS_ONNX_TensorRT (MIT License)
Weights: Trained by viperx, distributed via TRvlvr/model_repo
Training Data: MUSDB18HQ + 500 additional songs

Citation

@inproceedings{wang2024bsroformer,
  title={Music Source Separation with Band-Split RoPE Transformer},
  author={Lu, Wei-Tsung and Wang, Ju-Chiang and Kong, Qiuqiang and Hung, Yun-Ning},
  booktitle={ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2024}
}

License

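MIT License: the code and conversion tools are MIT-licensed. The model weights were trained by the community and are distributed publicly; the training data includes MUSDB18HQ (research use only). Check the license terms of the training data before any commercial use.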


Paper for xycld/BS-RoFormer-ONNX: Music Source Separation with Band-Split RoPE Transformer (arXiv 2309.02612, published Sep 5, 2023)