---
title: SYNTHIA
emoji: 🎹
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Browser-based MIDI keyboard with recording and synthesis
---
# SYNTHIA
Play, record, and let AI continue your musical phrases in real-time. 🎹
## 🚀 Quick Start
```bash
# Install dependencies
uv sync
# Run the app
uv run python app.py
```
Open **http://127.0.0.1:7860**
---
## πŸ—οΈ Architecture Overview
**SYNTHIA** is a browser-based MIDI keyboard with three main layers:
1. **Backend** (Python/Gradio): Configuration, MIDI engines, model loading
2. **Frontend** (JavaScript/Tone.js): Audio synthesis, keyboard rendering, event handling
3. **Communication**: Gradio bridge for sending recorded MIDI to backend for processing
### Data Flow
```
User plays keyboard
↓
JavaScript captures MIDI events → records to array
↓
User clicks "Play/Process"
↓
Backend engine processes recorded events
↓
Result returned as MIDI events
↓
JavaScript plays result through Tone.js synth
```
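Throughout this flow, recorded MIDI events travel as plain note records. The exact field names live in `keyboard.js` and `engines.py`; the shape below is illustrative, not the actual schema:

```python
# Hypothetical event record -- field names are assumptions for illustration.
event = {
    "note": 60,       # MIDI note number (0-127), 60 = middle C
    "velocity": 100,  # key press strength (0-127)
    "time": 0.0,      # onset in seconds since recording started
    "duration": 0.5,  # note length in seconds
}
```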
---
## 📂 File Responsibilities
### Backend Files
| File | Purpose |
|------|---------|
| **app.py** | Gradio app setup, UI layout, instrument definitions, API endpoints |
| **config.py** | Global settings (audio parameters, model paths, inference defaults) |
| **engines.py** | Three MIDI processing engines: `parrot` (repeat), `reverse_parrot` (reverse), `godzilla_continue` (AI generation) |
| **midi_model.py** | Godzilla model loading, tokenization, inference |
| **midi.py** | MIDI file utilities (encode/decode, cleanup) |
### Frontend Files
| File | Purpose |
|------|---------|
| **keyboard.html** | DOM structure (keyboard grid, controls, terminal) |
| **keyboard.js** | Main application logic: keyboard rendering, audio synthesis (Tone.js), recording, UI event binding, engine communication |
| **styles.css** | Styling and animations |
### Configuration & Dependencies
| File | Purpose |
|------|---------|
| **requirements.txt** | Python dependencies |
| **pyproject.toml** | Project metadata |
---
## 🎹 Core Functionality
### Keyboard Controls
- **Click keys** or **press computer keys** to play notes
- **Record button**: Capture MIDI events from keyboard
- **Play button**: Play back recorded events
- **Save button**: Download recording as .mid file
- **Game mode**: Take turns with AI completing phrases
### MIDI Engines
1. **Parrot**: Repeats your exact melody
2. **Reverse Parrot**: Plays melody backward
3. **Godzilla**: AI generates musical continuations using transformer model
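The two parrot engines are simple enough to sketch. Assuming events are dicts with `note` and `time` fields (illustrative, not the exact schema in `engines.py`), they look roughly like:

```python
def parrot(events, options=None):
    """Echo the recorded phrase exactly as played."""
    return list(events)

def reverse_parrot(events, options=None):
    """Play the phrase backward: mirror onset times around the last note."""
    if not events:
        return []
    end = max(e["time"] for e in events)
    flipped = [dict(e, time=end - e["time"]) for e in events]
    # Re-sort so playback still walks the events in chronological order.
    return sorted(flipped, key=lambda e: e["time"])
```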
### UI Features
- **Engine selector**: Choose processing method
- **Style selector**: AI style (melodic, energetic, ornate, etc.)
- **Response mode**: Control AI generation behavior
- **Runtime selector**: GPU (fast) vs CPU (reliable)
- **Instrument selector**: Change synth sound
- **AI voice selector**: Change AI synth sound
- **Terminal**: Real-time event logging
---
## 🔧 How to Add New Functionality
### Adding a New MIDI Engine
1. **In `engines.py`**, add a new function:
```python
def my_new_engine(events, options):
# Process MIDI events
return processed_events
```
2. **In `app.py`**, register the engine in `process_events()`:
```python
elif engine == 'my_engine':
result_events = my_new_engine(events, options)
```
3. **In `app.py`**, add to engine dropdown:
```python
with gr.Group(label="Engine"):
engine = gr.Dropdown(
choices=['parrot', 'reverse_parrot', 'godzilla_continue', 'my_engine'],
# ...
)
```
4. **In `keyboard.js`**, add tooltip (line ~215 in `populateEngineSelect()`):
```javascript
const engineTooltips = {
'my_engine': 'Description of what your engine does'
};
```
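Putting the steps together, a complete engine body for step 1 might look like this hypothetical `octave_up` engine (the name and event fields are illustrative):

```python
def octave_up(events, options=None):
    """Example engine: replay the phrase one octave higher,
    clamping to the top of the MIDI note range (127)."""
    return [dict(e, note=min(e["note"] + 12, 127)) for e in events]
```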
### Adding a New Control Selector
1. **In `app.py`**, create the selector in the UI:
```python
my_control = gr.Dropdown(
choices=['option1', 'option2'],
label="My Control",
value='option1'
)
```
2. **In `keyboard.js`** (line ~1510), add to `selectControls` array:
```javascript
{
element: myControlSelect,
getter: () => ({ label: myControlSelect.value }),
message: (result) => `Control switched to: ${result.label}`
}
```
3. **In `keyboard.js`**, pass control to engine via `processEventsThroughEngine()`:
```javascript
const engineOptions = {
my_control: document.getElementById('myControl').value,
// ... other options
};
```
### Adding a New Response Mode
1. **In `keyboard.js`** (line ~175), add preset definition:
```javascript
const RESPONSE_MODES = {
'my_mode': {
label: 'My Mode',
processFunction: (events) => {
// Processing logic
return processedEvents;
}
}
};
```
2. **In `app.py`**, add to response mode dropdown
3. **Use in engine logic** via `getSelectedResponseMode()`
---
## 🔄 Recent Refactoring (Feb 2026)
Code consolidation to improve maintainability:
- **Consolidated getter functions**: Single `getSelectedPreset()` replaces 3 similar functions
- **Unified event listeners**: Loop-based pattern for select controls (runtime, style, mode, length)
- **Extracted helper functions**: `resetAllNotesAndVisuals()` replaces 3 duplicated blocks
- **Result**: Reduced redundancy, easier to modify preset logic, consistent patterns
---
## ⚑ Benchmarking
`benchmark.py` measures Godzilla model generation speed across all combinations of input length and generation length, with CPU and GPU compared side by side.
### What it tests
| Axis | Values |
|------|--------|
| Input length | Short (8 notes, ~4 s) · Long (90 notes, ~18 s) |
| Generation length | 32 · 64 · 96 · 128 tokens (matches the four UI presets) |
| Devices | CPU always · CUDA if available |
Each combination runs a warm-up pass (model load, timing discarded) followed by `--runs` timed passes. The summary tables report mean, std, min, max in both ms and seconds, plus tokens/sec and GPU speedup.
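The warm-up-then-measure pattern can be sketched as follows (a simplified stand-in for `benchmark.py`, not its actual code):

```python
import statistics
import time

def benchmark(generate, runs=5):
    """Run one discarded warm-up pass, then `runs` timed passes.
    Returns summary statistics in milliseconds."""
    generate()  # warm-up: model load / first inference, timing discarded
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        times.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(times),
        "std_ms": statistics.stdev(times) if len(times) > 1 else 0.0,
        "min_ms": min(times),
        "max_ms": max(times),
    }
```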
### Usage
```bash
# Full sweep - CPU + GPU (if available), 5 runs per combination
uv run python benchmark.py
# CPU only (useful for verifying the script or on CPU-only machines)
uv run python benchmark.py --cpu-only
# Increase runs for tighter statistics
uv run python benchmark.py --runs 10
# Multi-candidate generation (higher quality, slower)
uv run python benchmark.py --candidates 3
```
Results are printed to stdout and saved to `benchmark_results.txt` (override with `--output`).
### Example output
```
============================================================
Device: CUDA | candidates=1
============================================================
[warm-up] loading model + first inference...
input=short (8 notes, ~4s) gen= 32 tokens [1:85ms] [2:82ms] ...
...
================================================================================
SUMMARY - CUDA | candidates=1
================================================================================
Input Gen tok Mean ms Mean s Std ms Min ms Max ms tok/s
-----------------------------------------------------------------------------------------
short (8 notes, ~4s) 32 85 0.09 2.1 82 89 376.5
short (8 notes, ~4s) 128 290 0.29 4.3 284 297 441.4
long (90 notes, ~18s) 32 91 0.09 1.8 88 94 351.6
long (90 notes, ~18s) 128 305 0.31 3.9 299 312 419.7
```
---
## πŸ› οΈ Development Tips
### Debugging
- **Terminal in UI**: Shows all MIDI events and engine responses
- **Browser console**: `F12` for JavaScript errors
- **Python terminal**: Check server-side logs for model loading, inference errors
### Testing New Engines
1. Record a simple 3-5 note progression
2. Play back with different engines
3. Check terminal for processing details
4. Verify output notes are in valid range (0-127)
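Step 4 can be automated with a small sanity check before sending events back to the browser (an illustrative helper, not part of the codebase):

```python
def validate_engine_output(events):
    """Sanity-check engine output: every note must fit the MIDI range."""
    for e in events:
        assert 0 <= e["note"] <= 127, f"note out of MIDI range: {e['note']}"
    return True
```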
### Performance
- **Recording**: Event capture happens in JavaScript (fast, local)
- **Processing**: May take 2-5 seconds depending on engine and model
- **Playback**: Tone.js synthesis is real-time (instant)
---
## 🔧 Technology Stack
- **Frontend**: Tone.js v6+ (Web Audio API)
- **Backend**: Gradio 5.49.1 + Python 3.10+
- **MIDI**: mido library
- **Model**: Godzilla Piano Transformer (via Hugging Face)
---
## πŸ“ License
Open source - free to use and modify.