---
title: SYNTHIA
emoji: 🎹
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Browser-based MIDI keyboard with recording and synthesis
---
# SYNTHIA

Play, record, and let AI continue your musical phrases in real time. 🎹
## 🚀 Quick Start

```bash
# Install dependencies
uv sync

# Run the app
uv run python app.py
```
## 🏗️ Architecture Overview

SYNTHIA is a browser-based MIDI keyboard with three main layers:

- Backend (Python/Gradio): configuration, MIDI engines, model loading
- Frontend (JavaScript/Tone.js): audio synthesis, keyboard rendering, event handling
- Communication: Gradio bridge for sending recorded MIDI to the backend for processing
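The bridge works because recorded events are plain data that serializes cleanly to JSON. A minimal sketch of that round trip, assuming a hypothetical event shape (`note`/`velocity`/`time`/`duration` keys; the actual field names in the codebase may differ):

```python
import json

# Hypothetical shape of a recorded MIDI event as it crosses the
# JS -> Python bridge: plain dicts that serialize cleanly to JSON.
recorded_events = [
    {"note": 60, "velocity": 100, "time": 0.0, "duration": 0.5},  # C4
    {"note": 64, "velocity": 96, "time": 0.5, "duration": 0.5},   # E4
]

# The frontend sends a JSON string; the backend parses it back into
# a list of dicts before handing it to an engine.
payload = json.dumps(recorded_events)
decoded = json.loads(payload)
assert decoded == recorded_events
print(decoded[0]["note"])  # first recorded pitch: 60
```

Keeping the payload as flat dicts (no custom classes) is what lets the same structure flow unchanged through Gradio, the engines, and back to Tone.js.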
### Data Flow

```
User plays keyboard
        ↓
JavaScript captures MIDI events → records to array
        ↓
User clicks "Play/Process"
        ↓
Backend engine processes recorded events
        ↓
Result returned as MIDI events
        ↓
JavaScript plays result through Tone.js synth
```
## 📁 File Responsibilities
### Backend Files

| File | Purpose |
|---|---|
| `app.py` | Gradio app setup, UI layout, instrument definitions, API endpoints |
| `config.py` | Global settings (audio parameters, model paths, inference defaults) |
| `engines.py` | Three MIDI processing engines: `parrot` (repeat), `reverse_parrot` (reverse), `godzilla_continue` (AI generation) |
| `midi_model.py` | Godzilla model loading, tokenization, inference |
| `midi.py` | MIDI file utilities (encode/decode, cleanup) |
### Frontend Files

| File | Purpose |
|---|---|
| `keyboard.html` | DOM structure (keyboard grid, controls, terminal) |
| `keyboard.js` | Main application logic: keyboard rendering, audio synthesis (Tone.js), recording, UI event binding, engine communication |
| `styles.css` | Styling and animations |
### Configuration & Dependencies

| File | Purpose |
|---|---|
| `requirements.txt` | Python dependencies |
| `pyproject.toml` | Project metadata |
## 🎹 Core Functionality

### Keyboard Controls
- Click keys or press computer keys to play notes
- Record button: Capture MIDI events from keyboard
- Play button: Play back recorded events
- Save button: Download recording as .mid file
- Game mode: Take turns with AI completing phrases
### MIDI Engines
- Parrot: Repeats your exact melody
- Reverse Parrot: Plays melody backward
- Godzilla: AI generates musical continuations using transformer model
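The two non-AI engines are simple event transforms. A rough sketch, assuming events are dicts with `note` and `time` keys (an illustrative assumption, not the confirmed `engines.py` signatures):

```python
def parrot(events, options=None):
    """Repeat the recorded phrase unchanged."""
    return list(events)


def reverse_parrot(events, options=None):
    """Play the phrase backward: reverse note order, keep the timing grid."""
    if not events:
        return []
    onsets = [e["time"] for e in events]      # original onset times, ascending
    reversed_notes = list(reversed(events))   # notes in reverse order
    return [
        {**event, "time": onset}              # reassign onsets in original order
        for event, onset in zip(reversed_notes, onsets)
    ]


melody = [
    {"note": 60, "time": 0.0},
    {"note": 64, "time": 0.5},
    {"note": 67, "time": 1.0},
]
print([e["note"] for e in reverse_parrot(melody)])  # [67, 64, 60]
```

Reusing the original onset times (rather than naively negating them) keeps the reversed phrase starting at t=0 with the same rhythm.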
### UI Features
- Engine selector: Choose processing method
- Style selector: AI style (melodic, energetic, ornate, etc.)
- Response mode: Control AI generation behavior
- Runtime selector: GPU (fast) vs CPU (reliable)
- Instrument selector: Change synth sound
- AI voice selector: Change AI synth sound
- Terminal: Real-time event logging
## 🔧 How to Add New Functionality

### Adding a New MIDI Engine

1. In `engines.py`, add a new function:

   ```python
   def my_new_engine(events, options):
       # Process MIDI events
       return processed_events
   ```

2. In `app.py`, register the engine in `process_events()`:

   ```python
   elif engine == 'my_engine':
       result_events = my_new_engine(events, options)
   ```

3. In `app.py`, add it to the engine dropdown:

   ```python
   with gr.Group(label="Engine"):
       engine = gr.Dropdown(
           choices=['parrot', 'reverse_parrot', 'godzilla_continue', 'my_engine'],
           # ...
       )
   ```

4. In `keyboard.js`, add a tooltip (line ~215, in `populateEngineSelect()`):

   ```javascript
   const engineTooltips = {
       'my_engine': 'Description of what your engine does'
   };
   ```
### Adding a New Control Selector

1. In `app.py`, create the selector in the UI:

   ```python
   my_control = gr.Dropdown(
       choices=['option1', 'option2'],
       label="My Control",
       value='option1'
   )
   ```

2. In `keyboard.js` (line ~1510), add it to the `selectControls` array:

   ```javascript
   {
       element: myControlSelect,
       getter: () => ({ label: myControlSelect.value }),
       message: (result) => `Control switched to: ${result.label}`
   }
   ```

3. In `keyboard.js`, pass the control to the engine via `processEventsThroughEngine()`:

   ```javascript
   const engineOptions = {
       my_control: document.getElementById('myControl').value,
       // ... other options
   };
   ```
### Adding a New Response Mode

1. In `keyboard.js` (line ~175), add the preset definition:

   ```javascript
   const RESPONSE_MODES = {
       'my_mode': {
           label: 'My Mode',
           processFunction: (events) => {
               // Processing logic
               return processedEvents;
           }
       }
   };
   ```

2. In `app.py`, add the mode to the response mode dropdown.
3. Use it in engine logic via `getSelectedResponseMode()`.
## 📝 Recent Refactoring (Feb 2026)

Code consolidation to improve maintainability:

- Consolidated getter functions: a single `getSelectedPreset()` replaces 3 similar functions
- Unified event listeners: loop-based pattern for select controls (runtime, style, mode, length)
- Extracted helper functions: `resetAllNotesAndVisuals()` replaces 3 duplicated blocks
- Result: reduced redundancy, easier-to-modify preset logic, consistent patterns
## ⚡ Benchmarking

`benchmark.py` measures Godzilla model generation speed across all combinations of input length and generation length, with CPU and GPU compared side by side.

### What it tests

| Axis | Values |
|---|---|
| Input length | Short (8 notes, ~4 s) · Long (90 notes, ~18 s) |
| Generation length | 32 · 64 · 96 · 128 tokens (matches the four UI presets) |
| Devices | CPU always · CUDA if available |

Each combination runs a warm-up pass (model load, timing discarded) followed by `--runs` timed passes. The summary tables report mean, std, min, and max in both ms and seconds, plus tokens/sec and GPU speedup.
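The warm-up-then-measure procedure can be sketched in a few lines. This is an illustrative stand-alone version (the real `benchmark.py` wraps model inference, not this toy workload, and its internals may differ):

```python
import statistics
import time


def benchmark(fn, runs=5):
    """Time fn: one discarded warm-up pass, then `runs` timed passes."""
    fn()  # warm-up: absorbs one-time costs (model load, caches, JIT)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "std_ms": statistics.stdev(timings_ms) if runs > 1 else 0.0,
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }


stats = benchmark(lambda: sum(i * i for i in range(50_000)), runs=5)
print(f"mean={stats['mean_ms']:.2f}ms  min={stats['min_ms']:.2f}ms")
```

Discarding the first pass matters most on GPU, where the initial call pays for model transfer and kernel compilation that would otherwise skew the mean.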
### Usage

```bash
# Full sweep: CPU + GPU (if available), 5 runs per combination
uv run python benchmark.py

# CPU only (useful for verifying the script or on CPU-only machines)
uv run python benchmark.py --cpu-only

# Increase runs for tighter statistics
uv run python benchmark.py --runs 10

# Multi-candidate generation (higher quality, slower)
uv run python benchmark.py --candidates 3
```

Results are printed to stdout and saved to `benchmark_results.txt` (override with `--output`).
### Example output

```
============================================================
Device: CUDA | candidates=1
============================================================
[warm-up] loading model + first inference...
input=short (8 notes, ~4s)  gen= 32 tokens  [1:85ms] [2:82ms] ...
...
================================================================================
SUMMARY - CUDA | candidates=1
================================================================================
Input                   Gen tok   Mean ms   Mean s   Std ms   Min ms   Max ms    tok/s
-----------------------------------------------------------------------------------------
short (8 notes, ~4s)         32        85     0.09      2.1       82       89    376.5
short (8 notes, ~4s)        128       290     0.29      4.3      284      297    441.4
long (90 notes, ~18s)        32        91     0.09      1.8       88       94    351.6
long (90 notes, ~18s)       128       305     0.31      3.9      299      312    419.7
```
## 🛠️ Development Tips

### Debugging

- Terminal in UI: shows all MIDI events and engine responses
- Browser console: press `F12` to check for JavaScript errors
- Python terminal: check server-side logs for model-loading and inference errors
### Testing New Engines

1. Record a simple 3-5 note progression
2. Play it back with different engines
3. Check the terminal for processing details
4. Verify output notes are in the valid MIDI range (0-127)
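The note-range check above is easy to automate. A hypothetical helper (`validate_events` is not part of the codebase; it assumes the dict-shaped events used elsewhere in this README):

```python
def validate_events(events):
    """Check engine output: every note must be a valid MIDI pitch (0-127).

    Returns (ok, bad_events) so the caller can log exactly what failed.
    Events missing a "note" key are treated as invalid.
    """
    bad = [e for e in events if not 0 <= e.get("note", -1) <= 127]
    return len(bad) == 0, bad


ok, bad = validate_events([{"note": 60}, {"note": 200}])
print(ok, bad)  # False [{'note': 200}]
```

Running a check like this on every engine's output during development catches out-of-range pitches before they reach Tone.js, where they would fail silently or sound wrong.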
### Performance

- Recording: event capture happens in JavaScript (fast, local)
- Processing: may take 2-5 seconds depending on engine and model
- Playback: Tone.js synthesis is real-time (effectively instant)
## 🔧 Technology Stack
- Frontend: Tone.js v6+ (Web Audio API)
- Backend: Gradio 5.49.1 + Python 3.10+
- MIDI: mido library
- Model: Godzilla Piano Transformer (via Hugging Face)
## 📄 License

Open source: free to use and modify.