ymcnabb

Upload folder using huggingface_hub

1824ea0 verified 4 days ago

	---
	title: StemSplitter
	app_file: src/stemsplitter/web.py
	sdk: gradio
	sdk_version: 6.6.0
	---
	# StemSplitter

	Audio stem separation tool that splits songs into individual components (vocals, drums, bass, instruments). Provides both a command-line interface and a Gradio web UI.

	Powered by open-source models via [audio-separator](https://github.com/nomadkaraoke/python-audio-separator):

	| Mode | Stems | Default Model |
	|------|-------|---------------|
	| 2-stem | Vocals, Instrumental | MelBand-RoFormer |
	| 4-stem | Vocals, Drums, Bass, Other | Demucs htdemucs_ft |
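
For readers who prefer code to tables, the mapping above can be sketched as a small lookup. This is an illustration only: the `MODES` dictionary and `default_model_for` helper are hypothetical names, not part of the package's documented API. The mode keys follow the `-m` values used in the CLI examples, and the model filenames are the defaults listed under Configuration.

```python
# Hypothetical lookup mirroring the mode table; names are illustrative only.
MODES = {
    "2stem": {
        "stems": ("Vocals", "Instrumental"),
        "model": "model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt",
    },
    "4stem": {
        "stems": ("Vocals", "Drums", "Bass", "Other"),
        "model": "htdemucs_ft.yaml",
    },
}


def default_model_for(mode):
    """Return the default model filename for a separation mode."""
    try:
        return MODES[mode]["model"]
    except KeyError:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {sorted(MODES)}"
        ) from None
```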

	## Prerequisites

	- Python 3.10+
	- [uv](https://docs.astral.sh/uv/getting-started/installation/) for dependency management
	- FFmpeg (required by audio-separator for reading various audio formats)

	Install FFmpeg if you don't have it:

	```bash
	# macOS
	brew install ffmpeg

	# Ubuntu/Debian
	sudo apt install ffmpeg

	# Windows (via chocolatey)
	choco install ffmpeg
	```

	## Installation

	```bash
	git clone <repo-url>
	cd StemSplitter

	# Copy the example env file and adjust as needed
	cp .env.example .env

	# Install dependencies (CPU inference)
	uv sync --extra dev

	# Or, for GPU-accelerated inference (NVIDIA CUDA)
	uv sync --extra dev --extra gpu
	```

	Models are downloaded automatically on first use (~200 MB for 2-stem, ~800 MB for 4-stem).

	## Usage

	### CLI

	```bash
	# Basic 2-stem separation (vocals + instrumental), outputs WAV
	uv run stemsplitter song.mp3

	# 4-stem separation with FLAC output
	uv run stemsplitter song.wav -m 4stem -f FLAC

	# MP3 output to a custom directory
	uv run stemsplitter song.flac -m 2stem -f MP3 -o ./my_stems/

	# Override the model
	uv run stemsplitter song.mp3 --model htdemucs.yaml

	# Show all options
	uv run stemsplitter --help
	```

	Supported input formats: MP3, WAV, FLAC, OGG, M4A, and any other format FFmpeg can decode.

	Supported output formats: WAV, MP3, FLAC (set via `-f` flag or `STEMSPLITTER_OUTPUT_FORMAT` in `.env`).
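
A minimal sketch of how the output format might be resolved from the two sources named above. It assumes the `-f` flag takes precedence over `STEMSPLITTER_OUTPUT_FORMAT` and that `WAV` is the fallback; the precedence order and the `resolve_output_format` helper are assumptions for illustration, not the package's confirmed behavior.

```python
import os

SUPPORTED_OUTPUT_FORMATS = ("WAV", "MP3", "FLAC")


def resolve_output_format(cli_value=None, env=None):
    """Pick the output format: CLI flag first, then env var, then WAV.

    The precedence shown here is an assumption, not confirmed behavior.
    """
    env = os.environ if env is None else env
    chosen = cli_value or env.get("STEMSPLITTER_OUTPUT_FORMAT") or "WAV"
    chosen = chosen.strip().upper()
    if chosen not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {chosen!r}")
    return chosen
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.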

	### Web UI

	```bash
	uv run stemsplitter-web
	```

	Opens a Gradio interface (default: `http://127.0.0.1:7860`) where you can:

	1. Upload an audio file
	2. Choose separation mode (2-stem or 4-stem)
	3. Choose output format (WAV, MP3, FLAC)
	4. Click Separate and download individual stems

	A public share link is also generated automatically.

	## Project Structure

	```
	src/stemsplitter/
	    __init__.py        # Package version
	    config.py          # Settings loaded from .env with sensible defaults
	    separator.py       # Core StemSplitter class wrapping audio-separator
	    cli.py             # Click-based CLI entry point
	    web.py             # Gradio web UI

	tests/
	    conftest.py        # Shared fixtures (mock separator, synthetic audio)
	    test_config.py     # Configuration loading tests
	    test_separator.py  # Core separation logic tests
	    test_cli.py        # CLI invocation tests
	    test_web.py        # Web UI handler tests
	```

	### Components

	- `config.py`: Loads settings from a `.env` file using `python-dotenv`. All values are exposed as a frozen `Settings` dataclass. See `.env.example` for the full list of options.

	- `separator.py`: Wraps `audio-separator` with a `StemSplitter` class that handles model selection per mode, lazy initialization (so imports are fast), and model caching (the model stays loaded between calls).

	- `cli.py`: A Click command that accepts an input file and flags for mode, format, output directory, and model override.

	- `web.py`: A Gradio Blocks app with audio upload, mode/format radio buttons, and per-stem audio outputs. The 4-stem outputs (drums, bass) are hidden in 2-stem mode.
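
The lazy initialization and model caching described for `separator.py` can be illustrated with a small stand-in. This is a sketch of the general pattern, not the actual `StemSplitter` implementation: `LazySplitter`, its `loader` argument, and the method names are hypothetical.

```python
class LazySplitter:
    """Hypothetical stand-in showing lazy loading plus model caching."""

    def __init__(self, loader):
        # loader is any callable mapping a model name to a loaded model.
        # Nothing is loaded at construction time, so creating the object
        # (and importing the module) stays cheap.
        self._loader = loader
        self._model = None
        self._model_name = None

    def _get_model(self, model_name):
        # Load on first use, and reload only when a different model
        # is requested; otherwise reuse the cached instance.
        if self._model is None or self._model_name != model_name:
            self._model = self._loader(model_name)
            self._model_name = model_name
        return self._model

    def separate(self, path, model_name):
        """Run the (cached) model on one input path."""
        model = self._get_model(model_name)
        return model(path)
```

With this shape, repeated calls using the same model name pay the loading cost once, which matters when the model file is hundreds of megabytes.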

	## Configuration

	All settings are configurable via environment variables in `.env`:

	| Variable | Default | Description |
	|----------|---------|-------------|
	| `STEMSPLITTER_OUTPUT_DIR` | `./output` | Directory for separated stems |
	| `STEMSPLITTER_MODEL_DIR` | `/tmp/audio-separator-models/` | Where downloaded models are cached |
	| `STEMSPLITTER_2STEM_MODEL` | `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt` | Model for 2-stem separation |
	| `STEMSPLITTER_4STEM_MODEL` | `htdemucs_ft.yaml` | Model for 4-stem separation |
	| `STEMSPLITTER_OUTPUT_FORMAT` | `WAV` | Default output format (WAV, MP3, FLAC) |
	| `STEMSPLITTER_OUTPUT_BITRATE` | `320k` | Bitrate for MP3 output |
	| `STEMSPLITTER_SAMPLE_RATE` | `44100` | Output sample rate |
	| `STEMSPLITTER_NORMALIZATION` | `0.9` | Peak normalization threshold |
	| `STEMSPLITTER_LOG_LEVEL` | `WARNING` | Logging verbosity (DEBUG, INFO, WARNING, ERROR) |
	| `STEMSPLITTER_WEB_HOST` | `127.0.0.1` | Web UI bind address |
	| `STEMSPLITTER_WEB_PORT` | `7860` | Web UI port |
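
As a rough sketch of how a subset of these variables could map onto a frozen dataclass, consider the following. The field names and the `load_settings` helper are assumptions chosen for illustration; the real `Settings` class in `config.py` may differ. In the actual project, `python-dotenv` would presumably populate the environment from `.env` before values are read; here a plain mapping is used so the example has no third-party dependencies.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Illustrative subset of the options; field names are hypothetical."""

    output_dir: str = "./output"
    output_format: str = "WAV"
    output_bitrate: str = "320k"
    sample_rate: int = 44100
    normalization: float = 0.9
    web_host: str = "127.0.0.1"
    web_port: int = 7860


def load_settings(env=None):
    """Build Settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    d = Settings()  # carries the default values
    return Settings(
        output_dir=env.get("STEMSPLITTER_OUTPUT_DIR", d.output_dir),
        output_format=env.get("STEMSPLITTER_OUTPUT_FORMAT", d.output_format),
        output_bitrate=env.get("STEMSPLITTER_OUTPUT_BITRATE", d.output_bitrate),
        # Environment values arrive as strings, so numeric fields are converted.
        sample_rate=int(env.get("STEMSPLITTER_SAMPLE_RATE", d.sample_rate)),
        normalization=float(env.get("STEMSPLITTER_NORMALIZATION", d.normalization)),
        web_host=env.get("STEMSPLITTER_WEB_HOST", d.web_host),
        web_port=int(env.get("STEMSPLITTER_WEB_PORT", d.web_port)),
    )
```

Because the dataclass is frozen, attempting to assign to a field after construction raises an error, which keeps configuration read-only once loaded.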

	## Running Tests

	```bash
	# Run all tests
	uv run pytest

	# Verbose output
	uv run pytest -v

	# With coverage report
	uv run pytest -v --cov=stemsplitter --cov-report=term-missing

	# Run a specific test file
	uv run pytest tests/test_separator.py
	```

	Tests use mocked models so no GPU or model downloads are required.
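
To show what a mocked-model test can look like, here is a minimal sketch using `unittest.mock` from the standard library. The `split_stems` wrapper, the test name, and the output filenames are hypothetical and are not taken from the project's test suite; the point is only that the separator is replaced by a mock, so no model is loaded or downloaded.

```python
from unittest.mock import MagicMock


def split_stems(separator, input_path):
    """Hypothetical wrapper: delegate to a separator and return its output files."""
    return separator.separate(input_path)


def test_split_stems_uses_mocked_separator():
    # The mock stands in for the real model-backed separator.
    fake = MagicMock()
    fake.separate.return_value = ["song_vocals.wav", "song_instrumental.wav"]

    result = split_stems(fake, "song.mp3")

    # The wrapper should call the separator exactly once with the input path
    # and pass its result through unchanged.
    fake.separate.assert_called_once_with("song.mp3")
    assert result == ["song_vocals.wav", "song_instrumental.wav"]
```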

	## License

	MIT