---
title: SYNTHIA
emoji: 🎹
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Browser-based MIDI keyboard with recording and synthesis
---
# SYNTHIA
Play, record, and let AI continue your musical phrases in real time. 🎹
## 🚀 Quick Start
```bash
# Install dependencies
uv sync
# Run the app
uv run python app.py
```
Open **http://127.0.0.1:7860**
---
## 🏗️ Architecture Overview
**SYNTHIA** is a browser-based MIDI keyboard with three main layers:
1. **Backend** (Python/Gradio): Configuration, MIDI engines, model loading
2. **Frontend** (JavaScript/Tone.js): Audio synthesis, keyboard rendering, event handling
3. **Communication**: Gradio bridge for sending recorded MIDI to backend for processing
### Data Flow
```
User plays keyboard
↓
JavaScript captures MIDI events → records to array
↓
User clicks "Play/Process"
↓
Backend engine processes recorded events
↓
Result returned as MIDI events
↓
JavaScript plays result through Tone.js synth
```
---
## 📁 File Responsibilities
### Backend Files
| File | Purpose |
|------|---------|
| **app.py** | Gradio app setup, UI layout, instrument definitions, API endpoints |
| **config.py** | Global settings (audio parameters, model paths, inference defaults) |
| **engines.py** | Three MIDI processing engines: `parrot` (repeat), `reverse_parrot` (reverse), `godzilla_continue` (AI generation) |
| **midi_model.py** | Godzilla model loading, tokenization, inference |
| **midi.py** | MIDI file utilities (encode/decode, cleanup) |
### Frontend Files
| File | Purpose |
|------|---------|
| **keyboard.html** | DOM structure (keyboard grid, controls, terminal) |
| **keyboard.js** | Main application logic: keyboard rendering, audio synthesis (Tone.js), recording, UI event binding, engine communication |
| **styles.css** | Styling and animations |
### Configuration & Dependencies
| File | Purpose |
|------|---------|
| **requirements.txt** | Python dependencies |
| **pyproject.toml** | Project metadata |
---
## 🎹 Core Functionality
### Keyboard Controls
- **Click keys** or **press computer keys** to play notes
- **Record button**: Capture MIDI events from keyboard
- **Play button**: Play back recorded events
- **Save button**: Download recording as .mid file
- **Game mode**: Take turns with AI completing phrases
### MIDI Engines
1. **Parrot**: Repeats your exact melody
2. **Reverse Parrot**: Plays melody backward
3. **Godzilla**: AI generates musical continuations using transformer model
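
The two rule-based engines can be sketched as follows. This is a minimal illustration, not the code in `engines.py`: the event schema (dicts with `note`, `velocity`, and `time` keys) is an assumption, and so is the choice to reverse the note order while keeping the original time grid.

```python
from typing import Dict, List

# Hypothetical event schema: {"note": 60, "velocity": 100, "time": 0.0}
Event = Dict[str, float]


def parrot(events: List[Event]) -> List[Event]:
    """Return a copy of the melody exactly as it was played."""
    return [dict(e) for e in events]


def reverse_parrot(events: List[Event]) -> List[Event]:
    """Play the notes in reverse order on the original time grid."""
    times = [e["time"] for e in events]
    reversed_notes = list(reversed(events))
    return [{**src, "time": t} for src, t in zip(reversed_notes, times)]


melody = [
    {"note": 60, "velocity": 100, "time": 0.0},
    {"note": 64, "velocity": 90, "time": 0.5},
    {"note": 67, "velocity": 80, "time": 1.0},
]
print([e["note"] for e in reverse_parrot(melody)])  # [67, 64, 60]
```

Both functions return new dicts, so the recorded events are never mutated.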
### UI Features
- **Engine selector**: Choose processing method
- **Style selector**: AI style (melodic, energetic, ornate, etc.)
- **Response mode**: Control AI generation behavior
- **Runtime selector**: GPU (fast) vs CPU (reliable)
- **Instrument selector**: Change synth sound
- **AI voice selector**: Change AI synth sound
- **Terminal**: Real-time event logging
---
## 🔧 How to Add New Functionality
### Adding a New MIDI Engine
1. **In `engines.py`**, add a new function:
```python
def my_new_engine(events, options):
    # Process MIDI events; this placeholder returns an unchanged copy
    processed_events = list(events)
    return processed_events
```
2. **In `app.py`**, register the engine in `process_events()`:
```python
elif engine == 'my_engine':
    result_events = my_new_engine(events, options)
```
3. **In `app.py`**, add to engine dropdown:
```python
with gr.Group():
    engine = gr.Dropdown(
        choices=['parrot', 'reverse_parrot', 'godzilla_continue', 'my_engine'],
        label="Engine",
        # ...
    )
```
4. **In `keyboard.js`**, add tooltip (line ~215 in `populateEngineSelect()`):
```javascript
const engineTooltips = {
    'my_engine': 'Description of what your engine does'
};
```
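
The backend half of these steps can be tried in isolation with a small sketch. Everything here is illustrative: `transpose_engine`, the `semitones` option, and the dispatch dict are hypothetical, and the dict only stands in for the `if`/`elif` chain that `process_events()` is described as using.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]  # assumed schema: note, velocity, time
Engine = Callable[[List[Event], dict], List[Event]]


def transpose_engine(events: List[Event], options: dict) -> List[Event]:
    """Hypothetical engine: shift each note by options['semitones'] (default 12)."""
    shift = int(options.get("semitones", 12))
    # Clamp to the valid MIDI note range 0-127
    return [{**e, "note": max(0, min(127, e["note"] + shift))} for e in events]


# Dispatch table standing in for the if/elif chain in process_events()
ENGINES: Dict[str, Engine] = {"my_engine": transpose_engine}


def process_events(engine: str, events: List[Event], options: dict = None) -> List[Event]:
    if engine not in ENGINES:
        raise ValueError(f"Unknown engine: {engine!r}")
    return ENGINES[engine](events, options or {})


result = process_events(
    "my_engine",
    [{"note": 60, "velocity": 100, "time": 0.0}],
    {"semitones": 7},
)
print(result[0]["note"])  # 67
```

A dispatch table keeps registration to a single line per engine, at the cost of diverging from the existing branch-based layout.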
### Adding a New Control Selector
1. **In `app.py`**, create the selector in the UI:
```python
my_control = gr.Dropdown(
    choices=['option1', 'option2'],
    label="My Control",
    value='option1'
)
```
2. **In `keyboard.js`** (line ~1510), add to `selectControls` array:
```javascript
{
    element: myControlSelect,
    getter: () => ({ label: myControlSelect.value }),
    message: (result) => `Control switched to: ${result.label}`
}
```
3. **In `keyboard.js`**, pass control to engine via `processEventsThroughEngine()`:
```javascript
const engineOptions = {
    my_control: document.getElementById('myControl').value,
    // ... other options
};
```
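
On the Python side, an engine would then read the new value from its `options` argument. The helper below is a hedged sketch: `read_my_control` is a hypothetical name, and it assumes `options` arrives as a plain dict keyed by `my_control`, which may not match how the app actually passes options.

```python
from typing import Optional, Sequence


def read_my_control(
    options: Optional[dict],
    allowed: Sequence[str] = ("option1", "option2"),
    default: str = "option1",
) -> str:
    """Return the selected control value, falling back to the default
    when the key is missing or the value is not a known choice."""
    value = (options or {}).get("my_control", default)
    return value if value in allowed else default


print(read_my_control({"my_control": "option2"}))  # option2
print(read_my_control({}))                         # option1
print(read_my_control({"my_control": "bogus"}))    # option1
```

Falling back to a default keeps an engine usable if the frontend and backend choice lists drift apart.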
### Adding a New Response Mode
1. **In `keyboard.js`** (line ~175), add preset definition:
```javascript
const RESPONSE_MODES = {
    'my_mode': {
        label: 'My Mode',
        processFunction: (events) => {
            // Processing logic goes here; this placeholder returns a copy
            const processedEvents = [...events];
            return processedEvents;
        }
    }
};
```
2. **In `app.py`**, add `'my_mode'` to the response mode dropdown's `choices`
3. **Use in engine logic** via `getSelectedResponseMode()`
---
## 🔄 Recent Refactoring (Feb 2026)
Code consolidation to improve maintainability:
- **Consolidated getter functions**: Single `getSelectedPreset()` replaces 3 similar functions
- **Unified event listeners**: Loop-based pattern for select controls (runtime, style, mode, length)
- **Extracted helper functions**: `resetAllNotesAndVisuals()` replaces 3 duplicated blocks
- **Result**: Reduced redundancy, easier to modify preset logic, consistent patterns
---
## ⚡ Benchmarking
`benchmark.py` measures Godzilla model generation speed across all combinations of input length and generation length, with CPU and GPU compared side by side.
### What it tests
| Axis | Values |
|------|--------|
| Input length | Short (8 notes, ~4 s) · Long (90 notes, ~18 s) |
| Generation length | 32 · 64 · 96 · 128 tokens (matches the four UI presets) |
| Devices | CPU always · CUDA if available |
Each combination runs a warm-up pass (model load, timing discarded) followed by `--runs` timed passes. The summary tables report mean, std, min, max in both ms and seconds, plus tokens/sec and GPU speedup.
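
The measurement pattern described above can be sketched as below. This is not the code in `benchmark.py`: `time_generation` is a hypothetical helper, `generate` stands in for one model call, and using the sample standard deviation is an assumption.

```python
import statistics
import time
from typing import Callable, Dict


def time_generation(generate: Callable[[], object], gen_tokens: int, runs: int = 5) -> Dict[str, float]:
    """Run one discarded warm-up pass, then `runs` timed passes."""
    generate()  # warm-up: result and timing are thrown away
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "std_ms": statistics.stdev(samples_ms) if runs > 1 else 0.0,
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
        "tok_per_s": gen_tokens / (mean_ms / 1000.0),
    }


# Stand-in workload: sleep 10 ms instead of calling the model
stats = time_generation(lambda: time.sleep(0.01), gen_tokens=32, runs=3)
print(f"mean={stats['mean_ms']:.1f} ms, tok/s={stats['tok_per_s']:.1f}")
```

Discarding the first pass keeps one-time costs such as model loading out of the reported statistics.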
### Usage
```bash
# Full sweep: CPU + GPU (if available), 5 runs per combination
uv run python benchmark.py
# CPU only (useful for verifying the script or on CPU-only machines)
uv run python benchmark.py --cpu-only
# Increase runs for tighter statistics
uv run python benchmark.py --runs 10
# Multi-candidate generation (higher quality, slower)
uv run python benchmark.py --candidates 3
```
Results are printed to stdout and saved to `benchmark_results.txt` (override with `--output`).
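
For orientation, the flags above could be declared with `argparse` roughly as follows. This is a sketch, not the script's actual parser: the flag names, the 5-run default, and the default output path come from this README, while the `--candidates` default of 1 is inferred from the example output and the help strings are invented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Benchmark Godzilla generation speed")
    parser.add_argument("--cpu-only", action="store_true",
                        help="Skip CUDA even if it is available")
    parser.add_argument("--runs", type=int, default=5,
                        help="Timed passes per combination")
    parser.add_argument("--candidates", type=int, default=1,
                        help="Candidates per generation (default is an assumption)")
    parser.add_argument("--output", default="benchmark_results.txt",
                        help="File to save the report to")
    return parser


args = build_parser().parse_args(["--cpu-only", "--runs", "10"])
print(args.cpu_only, args.runs, args.candidates, args.output)
# True 10 1 benchmark_results.txt
```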
### Example output
```
============================================================
Device: CUDA | candidates=1
============================================================
[warm-up] loading model + first inference...
input=short (8 notes, ~4s) gen= 32 tokens [1:85ms] [2:82ms] ...
...
================================================================================
SUMMARY - CUDA | candidates=1
================================================================================
Input                     Gen tok  Mean ms  Mean s  Std ms  Min ms  Max ms   tok/s
-----------------------------------------------------------------------------------------
short (8 notes, ~4s)           32       85    0.09     2.1      82      89   376.5
short (8 notes, ~4s)          128      290    0.29     4.3     284     297   441.4
long (90 notes, ~18s)          32       91    0.09     1.8      88      94   351.6
long (90 notes, ~18s)         128      305    0.31     3.9     299     312   419.7
```
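
The `tok/s` column in this example is consistent with dividing the generated token count by the mean time in seconds. The check below recomputes it from the example rows; `tokens_per_second` is a hypothetical helper name, and the formula is inferred from the numbers rather than taken from `benchmark.py`.

```python
def tokens_per_second(gen_tokens: int, mean_ms: float) -> float:
    """Throughput = generated tokens / mean wall time in seconds."""
    return gen_tokens / (mean_ms / 1000.0)


# (input label, generated tokens, mean ms) taken from the example output
rows = [("short", 32, 85), ("short", 128, 290), ("long", 32, 91), ("long", 128, 305)]
for name, tokens, mean_ms in rows:
    print(f"{name:5s} {tokens:4d} tok  {tokens_per_second(tokens, mean_ms):6.1f} tok/s")
# short   32 tok   376.5 tok/s
# short  128 tok   441.4 tok/s
# long    32 tok   351.6 tok/s
# long   128 tok   419.7 tok/s
```

In these example rows, throughput is higher for longer generations because fixed per-call overhead is spread over more tokens.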
---
## 🛠️ Development Tips
### Debugging
- **Terminal in UI**: Shows all MIDI events and engine responses
- **Browser console**: `F12` for JavaScript errors
- **Python terminal**: Check server-side logs for model loading, inference errors
### Testing New Engines
1. Record a simple 3-5 note progression
2. Play back with different engines
3. Check terminal for processing details
4. Verify output notes are in valid range (0-127)
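
The range check in step 4 can be automated with a small helper. This is a sketch under the assumption that each event is a dict with an integer `note` key; `find_invalid_notes` is a hypothetical name, not a function in the repository.

```python
from typing import Dict, List


def find_invalid_notes(events: List[Dict]) -> List[Dict]:
    """Return events whose note is missing, not an int, or outside 0-127."""
    bad = []
    for event in events:
        note = event.get("note")
        is_int = isinstance(note, int) and not isinstance(note, bool)
        if not is_int or not 0 <= note <= 127:
            bad.append(event)
    return bad


output = [
    {"note": 60, "velocity": 100, "time": 0.0},
    {"note": 128, "velocity": 100, "time": 0.5},  # one above the MIDI maximum
    {"note": -1, "velocity": 100, "time": 1.0},   # below the MIDI minimum
]
print([e["note"] for e in find_invalid_notes(output)])  # [128, -1]
```

An empty result means every note falls inside the valid MIDI range.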
### Performance
- **Recording**: Event capture happens in JavaScript (fast, local)
- **Processing**: May take 2-5 seconds depending on engine and model
- **Playback**: Tone.js synthesis is real-time (instant)
---
## 🔧 Technology Stack
- **Frontend**: Tone.js v6+ (Web Audio API)
- **Backend**: Gradio 5.49.1 + Python 3.10+
- **MIDI**: mido library
- **Model**: Godzilla Piano Transformer (via Hugging Face)
---
## 📝 License
Open source - free to use and modify.