BS-RoFormer ONNX (Vocals/Instrumental Separation)

An ONNX export of the Band-Split RoPE Transformer (BS-RoFormer) for music source separation (vocals/instrumental separation).

This repository provides the ONNX model converted from the PyTorch checkpoint, together with a uint8 dynamically quantized version.

Model Details

  • Architecture: BS-RoFormer (Band-Split RoPE Transformer)
  • Original Paper: Music Source Separation with Band-Split RoPE Transformer
  • Parameters: 159.8M (no-STFT variant)
  • Original Checkpoint: model_bs_roformer_ep_317_sdr_12.9755.ckpt (by viperx)
  • SDR Performance (MUSDB18HQ):
    • Vocals: 12.9 dB
    • Instrumental: 17.0 dB
Files

| File | Format | Size | Description |
|---|---|---|---|
| bs_roformer_ep317_sdr12.9755.onnx + .data | ONNX fp32 | ~615 MB | Full precision, external data |
| bs_roformer_ep317_sdr12.9755_quantized_uint8.onnx | ONNX uint8 | ~158 MB | Dynamic quantized, single file |

Usage

This is the no-STFT variant of the model: input audio must first be converted to a spectrogram by an STFT preprocessing step (BS_roformer_processor), and the result is then fed to the ONNX model.
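As a rough illustration of that preprocessing step, a framed Hann-window rFFT in numpy (the n_fft/hop values and the real/imag output layout here are assumptions for illustration — take the actual STFT parameters from the model's yaml config):

```python
import numpy as np

def stft(audio, n_fft=2048, hop=512):
    # Hann-windowed framed rFFT; parameter values are hypothetical.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=-1)           # (frames, n_fft//2 + 1), complex
    return np.stack([spec.real, spec.imag], -1)   # real/imag stacked on last axis

audio = np.random.randn(44100).astype(np.float32)  # 1 s of mono audio at 44.1 kHz
spec = stft(audio)
print(spec.shape)  # (83, 1025, 2)
```

Inspect `session.get_inputs()` to confirm the exact tensor name and layout the export expects before wiring this in.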

Quick Start

import onnxruntime as ort

# 1. Load ONNX model (quantized version, single file)
session = ort.InferenceSession(
    "bs_roformer_ep317_sdr12.9755_quantized_uint8.onnx",
    providers=['CPUExecutionProvider']
)

# 2. For the fp32 version with external data, load via onnx first:
# import onnx
# model = onnx.load("bs_roformer_ep317_sdr12.9755.onnx", load_external_data=True)
# session = ort.InferenceSession(model.SerializeToString(), providers=['CPUExecutionProvider'])

print("Model loaded!")
print(f"Input: {session.get_inputs()[0].name}, shape: {session.get_inputs()[0].shape}")
print(f"Output: {session.get_outputs()[0].name}, shape: {session.get_outputs()[0].shape}")

Full Inference Pipeline

The full inference pipeline relies on inference.py from ZFTurbo/MSS_ONNX_TensorRT:

git clone https://github.com/ZFTurbo/MSS_ONNX_TensorRT
cd MSS_ONNX_TensorRT

python inference.py \
    --model_type bs_roformer \
    --config_path path/to/model_bs_roformer_ep_317_sdr_12.9755.yaml \
    --input_folder path/to/input \
    --store_dir path/to/output \
    --use_onnx \
    --onnx_model_path path/to/bs_roformer_ep317_sdr12.9755.onnx

The config file is available here: model_bs_roformer_ep_317_sdr_12.9755.yaml

Attribution

Citation

@inproceedings{lu2024bsroformer,
  title={Music Source Separation with Band-Split RoPE Transformer},
  author={Lu, Wei-Tsung and Wang, Ju-Chiang and Kong, Qiuqiang and Hung, Yun-Ning},
  booktitle={ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2024}
}

License

MIT License. The code and conversion tooling are MIT-licensed. The model weights were trained and publicly distributed by the community, and the training data includes MUSDB18HQ (research use). Verify the licensing terms of the training data before any commercial use.
