title: StemSplitter
app_file: src/stemsplitter/web.py
sdk: gradio
sdk_version: 6.6.0

StemSplitter

StemSplitter is an audio stem separation tool that splits songs into individual components (vocals, drums, bass, and other instruments). It provides both a command-line interface and a Gradio web UI.

Mode	Stems	Default Model
2-stem	Vocals, Instrumental	MelBand-RoFormer
4-stem	Vocals, Drums, Bass, Other	Demucs htdemucs_ft

Prerequisites

Python 3.10+
uv for dependency management
FFmpeg (required by audio-separator for reading various audio formats)

Install FFmpeg if you don't have it:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows (via chocolatey)
choco install ffmpeg

Installation

git clone <repo-url>
cd StemSplitter

# Copy the example env file and adjust as needed
cp .env.example .env

# Install dependencies (CPU inference)
uv sync --extra dev

# Or, for GPU-accelerated inference (NVIDIA CUDA)
uv sync --extra dev --extra gpu

Models are downloaded automatically on first use (~200 MB for 2-stem, ~800 MB for 4-stem).

Usage

CLI

# Basic 2-stem separation (vocals + instrumental), outputs WAV
uv run stemsplitter song.mp3

# 4-stem separation with FLAC output
uv run stemsplitter song.wav -m 4stem -f FLAC

# MP3 output to a custom directory
uv run stemsplitter song.flac -m 2stem -f MP3 -o ./my_stems/

# Override the model
uv run stemsplitter song.mp3 --model htdemucs.yaml

# Show all options
uv run stemsplitter --help

Supported input formats: MP3, WAV, FLAC, OGG, M4A, and any other format FFmpeg can decode.

Supported output formats: WAV, MP3, FLAC (set via -f flag or STEMSPLITTER_OUTPUT_FORMAT in .env).
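The flags used in the examples above can be sketched with a small argument parser. This is an illustration only: the README says the real CLI is built with Click, while the sketch swaps in the standard library's argparse so it runs without extra dependencies. The `dest` names, the fallback from `-f` to `STEMSPLITTER_OUTPUT_FORMAT`, and the `./output` default for `-o` are assumptions drawn from the configuration defaults, not confirmed behavior of `cli.py`.

```python
import argparse
import os

MODES = ("2stem", "4stem")
FORMATS = ("WAV", "MP3", "FLAC")


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser for the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="stemsplitter")
    parser.add_argument("input_file", help="Audio file to separate")
    parser.add_argument("-m", dest="mode", choices=MODES, default="2stem")
    # Assumed precedence: -f on the command line beats the environment variable.
    parser.add_argument(
        "-f",
        dest="output_format",
        choices=FORMATS,
        default=os.environ.get("STEMSPLITTER_OUTPUT_FORMAT", "WAV"),
    )
    parser.add_argument("-o", dest="output_dir", default="./output")
    parser.add_argument("--model", default=None, help="Override the default model")
    return parser


# Mirrors: uv run stemsplitter song.wav -m 4stem -f FLAC
args = build_parser().parse_args(["song.wav", "-m", "4stem", "-f", "FLAC"])
print(args.mode, args.output_format, args.output_dir)
```

Run `uv run stemsplitter --help` for the authoritative list of options.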

Web UI

uv run stemsplitter-web

Opens a Gradio interface (default: http://127.0.0.1:7860) where you can:

Upload an audio file
Choose separation mode (2-stem or 4-stem)
Choose output format (WAV, MP3, FLAC)
Click Separate and download individual stems

A public share link is also generated automatically.

Project Structure

src/stemsplitter/
    __init__.py      # Package version
    config.py        # Settings loaded from .env with sensible defaults
    separator.py     # Core StemSplitter class wrapping audio-separator
    cli.py           # Click-based CLI entry point
    web.py           # Gradio web UI

tests/
    conftest.py      # Shared fixtures (mock separator, synthetic audio)
    test_config.py   # Configuration loading tests
    test_separator.py # Core separation logic tests
    test_cli.py      # CLI invocation tests
    test_web.py      # Web UI handler tests

Components

config.py -- Loads settings from a .env file using python-dotenv. All values are exposed as a frozen Settings dataclass. See .env.example for the full list of options.
separator.py -- Wraps audio-separator with a StemSplitter class that handles model selection per mode, lazy initialization (so imports are fast), and model caching (the model stays loaded between calls).
cli.py -- A Click command that accepts an input file and flags for mode, format, output directory, and model override.
web.py -- A Gradio Blocks app with audio upload, mode/format radio buttons, and per-stem audio outputs. The 4-stem outputs (drums, bass) are hidden in 2-stem mode.
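The lazy initialization and model caching described for separator.py can be sketched roughly as follows. `LazySplitter`, `FakeBackend`, and the backend's `load_model`/`separate` methods are hypothetical stand-ins invented for this example; the actual StemSplitter class and the audio-separator API may differ. The default model names are the ones listed in the Configuration section.

```python
from typing import Any, Callable, Optional

# Defaults taken from the configuration table in this README.
DEFAULT_MODELS = {
    "2stem": "model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt",
    "4stem": "htdemucs_ft.yaml",
}


class LazySplitter:
    """Hypothetical stand-in for the StemSplitter class in separator.py."""

    def __init__(self, backend_factory: Callable[[], Any]) -> None:
        # Nothing heavy happens here, so constructing the object stays cheap.
        self._backend_factory = backend_factory
        self._backend: Optional[Any] = None
        self._loaded_model: Optional[str] = None

    def _ensure_model(self, model_name: str) -> Any:
        if self._backend is None:
            self._backend = self._backend_factory()  # lazy initialization
        if self._loaded_model != model_name:
            self._backend.load_model(model_name)  # reload only on change
            self._loaded_model = model_name
        return self._backend

    def separate(self, path: str, mode: str = "2stem",
                 model: Optional[str] = None) -> list:
        backend = self._ensure_model(model or DEFAULT_MODELS[mode])
        return backend.separate(path)


class FakeBackend:
    """Toy backend that records model loads so the caching is observable."""

    def __init__(self) -> None:
        self.loads: list = []

    def load_model(self, name: str) -> None:
        self.loads.append(name)

    def separate(self, path: str) -> list:
        return [f"{path}:{self.loads[-1]}"]


fake = FakeBackend()
splitter = LazySplitter(lambda: fake)
splitter.separate("a.mp3")
splitter.separate("b.mp3")                # same model, no reload
splitter.separate("c.mp3", mode="4stem")  # different model, one reload
print(fake.loads)
```

Passing the backend in as a factory keeps the expensive import and model load out of the constructor, which is also what makes the class easy to test with a fake.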

Configuration

All settings are configurable via environment variables in .env:

Variable	Default	Description
`STEMSPLITTER_OUTPUT_DIR`	`./output`	Directory for separated stems
`STEMSPLITTER_MODEL_DIR`	`/tmp/audio-separator-models/`	Where downloaded models are cached
`STEMSPLITTER_2STEM_MODEL`	`model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt`	Model for 2-stem separation
`STEMSPLITTER_4STEM_MODEL`	`htdemucs_ft.yaml`	Model for 4-stem separation
`STEMSPLITTER_OUTPUT_FORMAT`	`WAV`	Default output format (WAV, MP3, FLAC)
`STEMSPLITTER_OUTPUT_BITRATE`	`320k`	Bitrate for MP3 output
`STEMSPLITTER_SAMPLE_RATE`	`44100`	Output sample rate
`STEMSPLITTER_NORMALIZATION`	`0.9`	Peak normalization threshold
`STEMSPLITTER_LOG_LEVEL`	`WARNING`	Logging verbosity (DEBUG, INFO, WARNING, ERROR)
`STEMSPLITTER_WEB_HOST`	`127.0.0.1`	Web UI bind address
`STEMSPLITTER_WEB_PORT`	`7860`	Web UI port
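As a rough illustration of how these variables could map onto the frozen Settings dataclass mentioned under Components, here is a minimal sketch covering a subset of the table. The field names, the `load_settings` helper, and the upper-casing of the format are assumptions made for the example. The real config.py reportedly uses python-dotenv to read .env first, whereas this sketch reads from a plain mapping so it runs with the standard library alone.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Immutable settings; field names here are illustrative, not the real ones."""

    output_dir: str
    output_format: str
    sample_rate: int
    normalization: float
    web_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a few STEMSPLITTER_* variables, falling back to the table defaults."""
    return Settings(
        output_dir=env.get("STEMSPLITTER_OUTPUT_DIR", "./output"),
        # Upper-casing is an assumption so that "flac" and "FLAC" both work.
        output_format=env.get("STEMSPLITTER_OUTPUT_FORMAT", "WAV").upper(),
        sample_rate=int(env.get("STEMSPLITTER_SAMPLE_RATE", "44100")),
        normalization=float(env.get("STEMSPLITTER_NORMALIZATION", "0.9")),
        web_port=int(env.get("STEMSPLITTER_WEB_PORT", "7860")),
    )


settings = load_settings({"STEMSPLITTER_OUTPUT_FORMAT": "flac"})
print(settings.output_format, settings.sample_rate)
```

Because the dataclass is frozen, assigning to a field after construction raises an error, which guards against settings drifting at runtime.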

Running Tests

# Run all tests
uv run pytest

# Verbose output
uv run pytest -v

# With coverage report
uv run pytest -v --cov=stemsplitter --cov-report=term-missing

# Run a specific test file
uv run pytest tests/test_separator.py

Tests use mocked models so no GPU or model downloads are required.
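A test in this style might look like the sketch below. It is not taken from the project's test suite: `split_to_stems` is a hypothetical helper, and the output file names are made up for the example. It only shows the general idea of replacing the separator with a `unittest.mock.MagicMock` so that no model is loaded.

```python
from unittest.mock import MagicMock


def split_to_stems(separator, path: str) -> dict:
    """Hypothetical helper: map output file names to stem labels."""
    files = separator.separate(path)
    return {"vocals" if "Vocals" in f else "instrumental": f for f in files}


def test_split_uses_mocked_separator():
    # The mock replaces the real model, so nothing is downloaded or run.
    separator = MagicMock()
    separator.separate.return_value = [
        "song_(Vocals).wav",
        "song_(Instrumental).wav",
    ]

    stems = split_to_stems(separator, "song.mp3")

    separator.separate.assert_called_once_with("song.mp3")
    assert stems == {
        "vocals": "song_(Vocals).wav",
        "instrumental": "song_(Instrumental).wav",
    }
```

pytest collects any function whose name starts with `test_`, so a file containing this function runs under `uv run pytest` without extra setup.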

License

MIT