---
title: StemSplitter
app_file: src/stemsplitter/web.py
sdk: gradio
sdk_version: 6.6.0
---
# StemSplitter

Audio stem separation tool that splits songs into individual components (vocals, drums, bass, instruments). Provides both a command-line interface and a Gradio web UI.

Powered by open-source models via [audio-separator](https://github.com/nomadkaraoke/python-audio-separator):

| Mode | Stems | Default Model |
|------|-------|---------------|
| 2-stem | Vocals, Instrumental | MelBand-RoFormer |
| 4-stem | Vocals, Drums, Bass, Other | Demucs htdemucs_ft |

## Prerequisites

- Python 3.10+
- [uv](https://docs.astral.sh/uv/getting-started/installation/) for dependency management
- FFmpeg (required by audio-separator for reading various audio formats)

Install FFmpeg if you don't have it:

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows (via chocolatey)
choco install ffmpeg
```

## Installation

```bash
git clone <repo-url>
cd StemSplitter

# Copy the example env file and adjust as needed
cp .env.example .env

# Install dependencies (CPU inference)
uv sync --extra dev

# Or, for GPU-accelerated inference (NVIDIA CUDA)
uv sync --extra dev --extra gpu
```

Models are downloaded automatically on first use (~200 MB for 2-stem, ~800 MB for 4-stem).

## Usage

### CLI

```bash
# Basic 2-stem separation (vocals + instrumental), outputs WAV
uv run stemsplitter song.mp3

# 4-stem separation with FLAC output
uv run stemsplitter song.wav -m 4stem -f FLAC

# MP3 output to a custom directory
uv run stemsplitter song.flac -m 2stem -f MP3 -o ./my_stems/

# Override the model
uv run stemsplitter song.mp3 --model htdemucs.yaml

# Show all options
uv run stemsplitter --help
```

**Supported input formats:** MP3, WAV, FLAC, OGG, M4A, and anything else FFmpeg can decode.

**Supported output formats:** WAV, MP3, FLAC (set via the `-f` flag or `STEMSPLITTER_OUTPUT_FORMAT` in `.env`).

### Web UI

```bash
uv run stemsplitter-web
```

Opens a Gradio interface (default: `http://127.0.0.1:7860`) where you can:

1. Upload an audio file
2. Choose separation mode (2-stem or 4-stem)
3. Choose output format (WAV, MP3, FLAC)
4. Click **Separate** and download individual stems

A public share link is also generated automatically.

## Project Structure

```
src/stemsplitter/
    __init__.py      # Package version
    config.py        # Settings loaded from .env with sensible defaults
    separator.py     # Core StemSplitter class wrapping audio-separator
    cli.py           # Click-based CLI entry point
    web.py           # Gradio web UI

tests/
    conftest.py        # Shared fixtures (mock separator, synthetic audio)
    test_config.py     # Configuration loading tests
    test_separator.py  # Core separation logic tests
    test_cli.py        # CLI invocation tests
    test_web.py        # Web UI handler tests
```

### Components

- **config.py** -- Loads settings from a `.env` file using `python-dotenv`. All values are exposed as a frozen `Settings` dataclass. See `.env.example` for the full list of options.

- **separator.py** -- Wraps `audio-separator` with a `StemSplitter` class that handles model selection per mode, lazy initialization (so imports are fast), and model caching (the model stays loaded between calls).

- **cli.py** -- A Click command that accepts an input file and flags for mode, format, output directory, and model override.

- **web.py** -- A Gradio Blocks app with audio upload, mode/format radio buttons, and per-stem audio outputs. The 4-stem outputs (drums, bass) are hidden in 2-stem mode.
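
The lazy initialization and model caching described for `separator.py` can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: `LazySplitter`, `Backend`, and `FakeBackend` are hypothetical names, and the backend is injected so the example runs without `audio-separator` or any model download. In the real package the backend would presumably be an `audio-separator` `Separator` instance.

```python
from typing import Callable, Optional, Protocol


class Backend(Protocol):
    """Minimal interface this sketch assumes a separation backend offers."""

    def load_model(self, model_filename: str) -> None: ...
    def separate(self, audio_path: str) -> list[str]: ...


# Default model per mode, copied from the Configuration section.
DEFAULT_MODELS = {
    "2stem": "model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt",
    "4stem": "htdemucs_ft.yaml",
}


class LazySplitter:
    """Hypothetical stand-in for StemSplitter: lazy backend, cached model."""

    def __init__(self, backend_factory: Callable[[], Backend]) -> None:
        self._backend_factory = backend_factory
        self._backend: Optional[Backend] = None    # created on first use
        self._loaded_model: Optional[str] = None   # model currently in memory

    def separate(self, audio_path: str, mode: str = "2stem",
                 model: Optional[str] = None) -> list[str]:
        model_name = model or DEFAULT_MODELS[mode]
        if self._backend is None:
            # Lazy initialization: the heavy backend is built only when needed.
            self._backend = self._backend_factory()
        if self._loaded_model != model_name:
            # Model caching: reload only when a different model is requested.
            self._backend.load_model(model_name)
            self._loaded_model = model_name
        return self._backend.separate(audio_path)


class FakeBackend:
    """Records load_model calls so the caching behaviour is observable."""

    def __init__(self) -> None:
        self.loads: list[str] = []

    def load_model(self, model_filename: str) -> None:
        self.loads.append(model_filename)

    def separate(self, audio_path: str) -> list[str]:
        return [f"{audio_path}.vocals.wav", f"{audio_path}.instrumental.wav"]


fake = FakeBackend()
splitter = LazySplitter(lambda: fake)
splitter.separate("a.mp3")
splitter.separate("b.mp3")                 # same model: no reload
splitter.separate("c.mp3", mode="4stem")   # different model: one reload
print(fake.loads)                          # one entry per distinct model loaded
```

Injecting the backend factory is a design choice of this sketch; it also makes the class easy to test with a mock, in the spirit of the fixtures in `tests/conftest.py`.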

## Configuration

All settings are configurable via environment variables in `.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `STEMSPLITTER_OUTPUT_DIR` | `./output` | Directory for separated stems |
| `STEMSPLITTER_MODEL_DIR` | `/tmp/audio-separator-models/` | Where downloaded models are cached |
| `STEMSPLITTER_2STEM_MODEL` | `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt` | Model for 2-stem separation |
| `STEMSPLITTER_4STEM_MODEL` | `htdemucs_ft.yaml` | Model for 4-stem separation |
| `STEMSPLITTER_OUTPUT_FORMAT` | `WAV` | Default output format (WAV, MP3, FLAC) |
| `STEMSPLITTER_OUTPUT_BITRATE` | `320k` | Bitrate for MP3 output |
| `STEMSPLITTER_SAMPLE_RATE` | `44100` | Output sample rate |
| `STEMSPLITTER_NORMALIZATION` | `0.9` | Peak normalization threshold |
| `STEMSPLITTER_LOG_LEVEL` | `WARNING` | Logging verbosity (DEBUG, INFO, WARNING, ERROR) |
| `STEMSPLITTER_WEB_HOST` | `127.0.0.1` | Web UI bind address |
| `STEMSPLITTER_WEB_PORT` | `7860` | Web UI port |
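
To illustrate how such variables might map onto a frozen `Settings` dataclass, here is a minimal sketch covering a subset of the table. The field names and the `load_settings` helper are assumptions made for illustration, not the contents of `config.py`. The sketch reads from a plain mapping using only the standard library; the real module uses `python-dotenv`, whose `load_dotenv()` would populate `os.environ` from `.env` before the values are read.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Illustrative subset of the settings table; field names are assumed."""

    output_dir: str
    output_format: str
    output_bitrate: str
    sample_rate: int
    normalization: float
    web_host: str
    web_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from STEMSPLITTER_* variables with the documented defaults."""

    def get(name: str, default: str) -> str:
        return env.get(f"STEMSPLITTER_{name}", default)

    return Settings(
        output_dir=get("OUTPUT_DIR", "./output"),
        output_format=get("OUTPUT_FORMAT", "WAV"),
        output_bitrate=get("OUTPUT_BITRATE", "320k"),
        sample_rate=int(get("SAMPLE_RATE", "44100")),       # stored as int
        normalization=float(get("NORMALIZATION", "0.9")),   # stored as float
        web_host=get("WEB_HOST", "127.0.0.1"),
        web_port=int(get("WEB_PORT", "7860")),
    )


# Override two values; everything else falls back to the defaults.
settings = load_settings({
    "STEMSPLITTER_OUTPUT_FORMAT": "FLAC",
    "STEMSPLITTER_WEB_PORT": "8080",
})
print(settings)
```

Because the dataclass is frozen, assigning to a field such as `settings.web_port` after construction raises `dataclasses.FrozenInstanceError`, which keeps configuration read-only once loaded.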

## Running Tests

```bash
# Run all tests
uv run pytest

# Verbose output
uv run pytest -v

# With coverage report
uv run pytest -v --cov=stemsplitter --cov-report=term-missing

# Run a specific test file
uv run pytest tests/test_separator.py
```

Tests use mocked models so no GPU or model downloads are required.
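
As a rough illustration of that approach, a test can replace the separation backend with a `unittest.mock.MagicMock`. The function `separate_vocals` and the stem filenames below are hypothetical and exist only for this sketch; the project's real fixtures live in `tests/conftest.py`.

```python
from unittest.mock import MagicMock


def separate_vocals(backend, audio_path: str) -> list[str]:
    """Hypothetical code under test: load the 2-stem model, then separate."""
    backend.load_model("model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt")
    return backend.separate(audio_path)


def test_separate_vocals_uses_mocked_backend() -> None:
    # The mock stands in for the real backend, so nothing is downloaded or run.
    backend = MagicMock()
    backend.separate.return_value = ["song_vocals.wav", "song_instrumental.wav"]

    stems = separate_vocals(backend, "song.mp3")

    backend.load_model.assert_called_once()
    backend.separate.assert_called_once_with("song.mp3")
    assert stems == ["song_vocals.wav", "song_instrumental.wav"]


# pytest would collect the test automatically; it is called directly here
# so the sketch also runs as a plain script.
test_separate_vocals_uses_mocked_backend()
```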

## License

MIT