---
title: StemSplitter
app_file: src/stemsplitter/web.py
sdk: gradio
sdk_version: 6.6.0
---
# StemSplitter

Audio stem separation tool that splits songs into individual components (vocals, drums, bass, instruments). Provides both a command-line interface and a Gradio web UI.

Powered by open-source models via [audio-separator](https://github.com/nomadkaraoke/python-audio-separator):

| Mode | Stems | Default Model |
|------|-------|---------------|
| 2-stem | Vocals, Instrumental | MelBand-RoFormer |
| 4-stem | Vocals, Drums, Bass, Other | Demucs htdemucs_ft |

## Prerequisites

- Python 3.10+
- [uv](https://docs.astral.sh/uv/getting-started/installation/) for dependency management
- FFmpeg (required by audio-separator for reading various audio formats)

Install FFmpeg if you don't have it:

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows (via chocolatey)
choco install ffmpeg
```

## Installation

```bash
git clone <repo-url>
cd StemSplitter

# Copy the example env file and adjust as needed
cp .env.example .env

# Install dependencies (CPU inference)
uv sync --extra dev

# Or, for GPU-accelerated inference (NVIDIA CUDA)
uv sync --extra dev --extra gpu
```

Models are downloaded automatically on first use (~200 MB for 2-stem, ~800 MB for 4-stem).

## Usage

### CLI

```bash
# Basic 2-stem separation (vocals + instrumental), outputs WAV
uv run stemsplitter song.mp3

# 4-stem separation with FLAC output
uv run stemsplitter song.wav -m 4stem -f FLAC

# MP3 output to a custom directory
uv run stemsplitter song.flac -m 2stem -f MP3 -o ./my_stems/

# Override the model
uv run stemsplitter song.mp3 --model htdemucs.yaml

# Show all options
uv run stemsplitter --help
```

**Supported input formats:** MP3, WAV, FLAC, OGG, M4A, and any other format FFmpeg can decode.

**Supported output formats:** WAV, MP3, FLAC (set via `-f` flag or `STEMSPLITTER_OUTPUT_FORMAT` in `.env`).

### Web UI

```bash
uv run stemsplitter-web
```

Opens a Gradio interface (default: `http://127.0.0.1:7860`) where you can:

1. Upload an audio file
2. Choose separation mode (2-stem or 4-stem)
3. Choose output format (WAV, MP3, FLAC)
4. Click **Separate** and download individual stems

A public share link is also generated automatically.

## Project Structure

```
src/stemsplitter/
    __init__.py      # Package version
    config.py        # Settings loaded from .env with sensible defaults
    separator.py     # Core StemSplitter class wrapping audio-separator
    cli.py           # Click-based CLI entry point
    web.py           # Gradio web UI

tests/
    conftest.py      # Shared fixtures (mock separator, synthetic audio)
    test_config.py   # Configuration loading tests
    test_separator.py  # Core separation logic tests
    test_cli.py      # CLI invocation tests
    test_web.py      # Web UI handler tests
```

### Components

- **config.py** -- Loads settings from a `.env` file using `python-dotenv`. All values are exposed as a frozen `Settings` dataclass. See `.env.example` for the full list of options.

- **separator.py** -- Wraps `audio-separator` with a `StemSplitter` class that handles model selection per mode, lazy initialization (so imports are fast), and model caching (the model stays loaded between calls).

- **cli.py** -- A Click command that accepts an input file and flags for mode, format, output directory, and model override.

- **web.py** -- A Gradio Blocks app with audio upload, mode/format radio buttons, and per-stem audio outputs. The 4-stem outputs (drums, bass) are hidden in 2-stem mode.
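The lazy initialization and model caching described for `separator.py` can be sketched as follows. This is a minimal illustration, not the project's actual code: `LazySplitter`, `Backend`, and `backend_factory` are hypothetical names, and the backend is injected so the example runs without `audio-separator` installed. Only the default model names come from this README.

```python
from typing import Callable, Optional, Protocol


class Backend(Protocol):
    """Minimal interface this sketch expects from the separation backend."""

    def load_model(self, model_filename: str) -> None: ...

    def separate(self, audio_path: str) -> list[str]: ...


# Default model per mode, taken from the configuration table in this README.
DEFAULT_MODELS = {
    "2stem": "model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt",
    "4stem": "htdemucs_ft.yaml",
}


class LazySplitter:
    """Hypothetical stand-in for StemSplitter (names are assumptions)."""

    def __init__(self, backend_factory: Callable[[], Backend]) -> None:
        # Nothing heavy happens here, so constructing the object stays cheap.
        self._factory = backend_factory
        self._backend: Optional[Backend] = None
        self._loaded_model: Optional[str] = None

    def separate(
        self, audio_path: str, mode: str = "2stem", model: Optional[str] = None
    ) -> list[str]:
        model_name = model or DEFAULT_MODELS[mode]
        if self._backend is None:
            # Lazy initialization: build the backend on first use only.
            self._backend = self._factory()
        if self._loaded_model != model_name:
            # Model caching: reload only when a different model is requested.
            self._backend.load_model(model_name)
            self._loaded_model = model_name
        return self._backend.separate(audio_path)
```

With the real library, the factory would presumably construct `audio-separator`'s `Separator`; passing it in as a callable also makes it easy to substitute a fake backend in tests.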

## Configuration

All settings are configurable via environment variables in `.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `STEMSPLITTER_OUTPUT_DIR` | `./output` | Directory for separated stems |
| `STEMSPLITTER_MODEL_DIR` | `/tmp/audio-separator-models/` | Where downloaded models are cached |
| `STEMSPLITTER_2STEM_MODEL` | `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt` | Model for 2-stem separation |
| `STEMSPLITTER_4STEM_MODEL` | `htdemucs_ft.yaml` | Model for 4-stem separation |
| `STEMSPLITTER_OUTPUT_FORMAT` | `WAV` | Default output format (WAV, MP3, FLAC) |
| `STEMSPLITTER_OUTPUT_BITRATE` | `320k` | Bitrate for MP3 output |
| `STEMSPLITTER_SAMPLE_RATE` | `44100` | Output sample rate |
| `STEMSPLITTER_NORMALIZATION` | `0.9` | Peak normalization threshold |
| `STEMSPLITTER_LOG_LEVEL` | `WARNING` | Logging verbosity (DEBUG, INFO, WARNING, ERROR) |
| `STEMSPLITTER_WEB_HOST` | `127.0.0.1` | Web UI bind address |
| `STEMSPLITTER_WEB_PORT` | `7860` | Web UI port |
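
As a rough illustration of how such variables could map onto a frozen settings object, the sketch below reads a subset of them from a mapping and falls back to the defaults in the table. The field names and the `load_settings` helper are assumptions, not the project's actual API. It reads `os.environ` directly, whereas the project loads `.env` through `python-dotenv` first.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of the table above; field names are assumptions."""

    output_dir: str = "./output"
    output_format: str = "WAV"
    sample_rate: int = 44100
    normalization: float = 0.9
    web_port: int = 7860


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from STEMSPLITTER_* variables, falling back to defaults."""
    defaults = Settings()

    def get(name: str, fallback: str) -> str:
        return env.get(f"STEMSPLITTER_{name}", fallback)

    return Settings(
        output_dir=get("OUTPUT_DIR", defaults.output_dir),
        output_format=get("OUTPUT_FORMAT", defaults.output_format),
        # Environment values are strings, so numeric fields are converted here.
        sample_rate=int(get("SAMPLE_RATE", str(defaults.sample_rate))),
        normalization=float(get("NORMALIZATION", str(defaults.normalization))),
        web_port=int(get("WEB_PORT", str(defaults.web_port))),
    )
```

Because the dataclass is frozen, assigning to a field after construction raises `dataclasses.FrozenInstanceError`, which keeps settings from being changed mid-run.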

## Running Tests

```bash
# Run all tests
uv run pytest

# Verbose output
uv run pytest -v

# With coverage report
uv run pytest -v --cov=stemsplitter --cov-report=term-missing

# Run a specific test file
uv run pytest tests/test_separator.py
```

Tests use mocked models so no GPU or model downloads are required.
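
A mocked test in this spirit might look like the sketch below. It is not copied from the test suite: `split_with` is a hypothetical helper, and the returned file names are placeholders. The point is that a `MagicMock` replaces the separator, so no model is loaded or downloaded.

```python
from unittest.mock import MagicMock


def split_with(separator, audio_path: str, model: str) -> list[str]:
    """Hypothetical helper: load a model, then separate one file."""
    separator.load_model(model_filename=model)
    return separator.separate(audio_path)


def test_split_uses_mocked_separator() -> None:
    # The mock stands in for the real separator, so nothing is downloaded.
    fake = MagicMock()
    fake.separate.return_value = ["vocals.wav", "instrumental.wav"]  # placeholders

    stems = split_with(fake, "song.mp3", "htdemucs_ft.yaml")

    # Verify the helper drove the separator as expected.
    fake.load_model.assert_called_once_with(model_filename="htdemucs_ft.yaml")
    fake.separate.assert_called_once_with("song.mp3")
    assert stems == ["vocals.wav", "instrumental.wav"]
```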

## License

MIT