BS-RoFormer: Vocal Extraction

Unofficial HuggingFace packaging of BS-RoFormer (Band-Split RoFormer), using the ViperX checkpoint. Adds a from_pretrained interface and native Apple Silicon (MPS) support.

Extracts vocals from a music track. Outputs exactly 2 stems:

Stem     Description
vocals   Isolated vocal track
other    Everything else (mix − vocals)
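To make the "mix − vocals" relationship concrete, here is a minimal sketch that takes the table's description literally and computes the residual sample by sample. The helper name `residual_stem` and the toy arrays are illustrative assumptions, not part of this repository, and the model may derive the `other` stem differently internally.

```python
import numpy as np


def residual_stem(mix: np.ndarray, vocals: np.ndarray) -> np.ndarray:
    """Return mix - vocals, sample by sample (hypothetical helper).

    Both arrays are assumed to share one shape, e.g. (channels, samples).
    """
    if mix.shape != vocals.shape:
        raise ValueError(f"shape mismatch: {mix.shape} vs {vocals.shape}")
    return mix - vocals


# Toy stereo signal: 2 channels x 4 samples, standing in for real audio.
mix = np.array([[0.5, 0.2, -0.1, 0.0],
                [0.4, 0.1, -0.2, 0.3]], dtype=np.float32)
vocals = np.array([[0.3, 0.2, 0.0, 0.0],
                   [0.1, 0.1, -0.2, 0.1]], dtype=np.float32)

other = residual_stem(mix, vocals)

# By construction, the two stems add back up to the original mix.
assert np.allclose(vocals + other, mix)
```

Under this reading, summing the two stems reconstructs the input track, which is a quick sanity check on separated output.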

Quick start

pip install transformers torch librosa soundfile \
            einops beartype rotary-embedding-torch

from transformers import AutoModel

model = AutoModel.from_pretrained("puar-playground/bs-roformer", trust_remote_code=True)
stems = model.separate("song.wav", output_dir="output/")
# output/vocals.wav  - isolated vocals
# output/other.wav   - instrumental
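After a run, it can be handy to confirm that both stems were written. The sketch below assumes only the file names shown in the comments above (`vocals.wav` and `other.wav` inside the output directory); `expected_stem_paths` and `missing_stems` are hypothetical helpers, not functions provided by this package.

```python
from pathlib import Path

# Stem names taken from the table above; file naming is assumed to be "<stem>.wav".
STEM_NAMES = ("vocals", "other")


def expected_stem_paths(output_dir):
    """Map each stem name to the path where its WAV file is expected."""
    out = Path(output_dir)
    return {name: out / f"{name}.wav" for name in STEM_NAMES}


def missing_stems(output_dir):
    """Return the names of stems whose output file does not exist yet."""
    paths = expected_stem_paths(output_dir)
    return [name for name, path in paths.items() if not path.is_file()]


if __name__ == "__main__":
    # An empty list means both files are present in output/.
    print(missing_stems("output/"))
```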

Device selection

# Auto-detect: CUDA → MPS (Apple Silicon) → CPU
model = AutoModel.from_pretrained("puar-playground/bs-roformer", trust_remote_code=True)

# Explicit
model = AutoModel.from_pretrained("puar-playground/bs-roformer", trust_remote_code=True,
                                  device="mps")   # or "cuda", "cpu"
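The auto-detect order can be sketched as follows. This is an illustration of the CUDA, then MPS, then CPU priority described above, not necessarily the code this repository runs; `pick_device` and `detect_device` are hypothetical names. The only PyTorch calls used are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the priority order CUDA, then MPS, then CPU (hypothetical helper)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends and pick one.

    Assumption for this sketch: fall back to "cpu" if torch is not installed.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no MPS backend module, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print(detect_device())
```

The resulting string can be passed as the `device` argument shown above when you want to log or override the choice explicitly.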

Repository layout

bs-roformer/
├── modeling_bs_roformer.py   # BSRoFormerConfig + BSRoFormerForSourceSeparation
├── config.json
├── requirements.txt
├── bs_roformer.yaml          # model architecture config
├── bs_roformer.ckpt          # model weights (Git LFS)
└── bs_roformer_src/          # vendored BS-RoFormer source

Credits
