Spaces:

FJFehr
/

virtual_keyboard

Running

App Files Files Community

virtual_keyboard / README.md

FJFehr

feat: add CPU/GPU generation benchmark script

29dbf34 8 days ago

preview code

raw

history blame contribute delete

8.55 kB

A newer version of the Gradio SDK is available: 6.8.0

Upgrade

metadata

title: SYNTHIA
emoji: 🎹
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Browser-based MIDI keyboard with recording and synthesis

SYNTHIA

Play, record, and let AI continue your musical phrases in real-time. 🎹

� Quick Start

# Install dependencies
uv sync

# Run the app
uv run python app.py

Open http://127.0.0.1:7860

🏗️ Architecture Overview

SYNTHIA is a browser-based MIDI keyboard with three main layers:

Backend (Python/Gradio): Configuration, MIDI engines, model loading
Frontend (JavaScript/Tone.js): Audio synthesis, keyboard rendering, event handling
Communication: Gradio bridge for sending recorded MIDI to backend for processing

Data Flow

User plays keyboard
    ↓
JavaScript captures MIDI events → records to array
    ↓
User clicks "Play/Process"
    ↓
Backend engine processes recorded events
    ↓
Result returned as MIDI events
    ↓
JavaScript plays result through Tone.js synth

📂 File Responsibilities

Backend Files

File	Purpose
app.py	Gradio app setup, UI layout, instrument definitions, API endpoints
config.py	Global settings (audio parameters, model paths, inference defaults)
engines.py	Three MIDI processing engines: `parrot` (repeat), `reverse_parrot` (reverse), `godzilla_continue` (AI generation)
midi_model.py	Godzilla model loading, tokenization, inference
midi.py	MIDI file utilities (encode/decode, cleanup, utilities)

Frontend Files

File	Purpose
keyboard.html	DOM structure (keyboard grid, controls, terminal)
keyboard.js	Main application logic: keyboard rendering, audio synthesis (Tone.js), recording, UI event binding, engine communication
styles.css	Styling and animations

Configuration & Dependencies

File	Purpose
requirements.txt	Python dependencies
pyproject.toml	Project metadata

🎹 Core Functionality

Keyboard Controls

Click keys or press computer keys to play notes
Record button: Capture MIDI events from keyboard
Play button: Play back recorded events
Save button: Download recording as .mid file
Game mode: Take turns with AI completing phrases

MIDI Engines

Parrot: Repeats your exact melody
Reverse Parrot: Plays melody backward
Godzilla: AI generates musical continuations using transformer model

UI Features

Engine selector: Choose processing method
Style selector: AI style (melodic, energetic, ornate, etc.)
Response mode: Control AI generation behavior
Runtime selector: GPU (fast) vs CPU (reliable)
Instrument selector: Change synth sound
AI voice selector: Change AI synth sound
Terminal: Real-time event logging

🔧 How to Add New Functionality

Adding a New MIDI Engine

In engines.py, add a new function:

def my_new_engine(events, options):
    # Process MIDI events
    return processed_events

In app.py, register the engine in process_events():

elif engine == 'my_engine':
    result_events = my_new_engine(events, options)

In app.py, add to engine dropdown:

with gr.Group(label="Engine"):
    engine = gr.Dropdown(
        choices=['parrot', 'reverse_parrot', 'godzilla_continue', 'my_engine'],
        # ...
    )

In keyboard.js, add tooltip (line ~215 in populateEngineSelect()):

const engineTooltips = {
    'my_engine': 'Description of what your engine does'
};

Adding a New Control Selector

In app.py, create the selector in the UI:

my_control = gr.Dropdown(
    choices=['option1', 'option2'],
    label="My Control",
    value='option1'
)

In keyboard.js (line ~1510), add to selectControls array:

{
    element: myControlSelect,
    getter: () => ({ label: myControlSelect.value }),
    message: (result) => `Control switched to: ${result.label}`
}

In keyboard.js, pass control to engine via processEventsThroughEngine():

const engineOptions = {
    my_control: document.getElementById('myControl').value,
    // ... other options
};

Adding a New Response Mode

In keyboard.js (line ~175), add preset definition:

const RESPONSE_MODES = {
    'my_mode': {
        label: 'My Mode',
        processFunction: (events) => {
            // Processing logic
            return processedEvents;
        }
    }
};

In app.py, add to response mode dropdown
Use in engine logic via getSelectedResponseMode()

🔄 Recent Refactoring (Feb 2026)

Code consolidation to improve maintainability:

Consolidated getter functions: Single getSelectedPreset() replaces 3 similar functions
Unified event listeners: Loop-based pattern for select controls (runtime, style, mode, length)
Extracted helper functions: resetAllNotesAndVisuals() replaces 3 duplicated blocks
Result: Reduced redundancy, easier to modify preset logic, consistent patterns

⚡ Benchmarking

benchmark.py measures Godzilla model generation speed across all combinations of input length and generation length, with CPU and GPU compared side by side.

What it tests

Axis	Values
Input length	Short (8 notes, ~4 s) · Long (90 notes, ~18 s)
Generation length	32 · 64 · 96 · 128 tokens (matches the four UI presets)
Devices	CPU always · CUDA if available

Each combination runs a warm-up pass (model load, timing discarded) followed by --runs timed passes. The summary tables report mean, std, min, max in both ms and seconds, plus tokens/sec and GPU speedup.

Usage

# Full sweep — CPU + GPU (if available), 5 runs per combination
uv run python benchmark.py

# CPU only (useful for verifying the script or on CPU-only machines)
uv run python benchmark.py --cpu-only

# Increase runs for tighter statistics
uv run python benchmark.py --runs 10

# Multi-candidate generation (higher quality, slower)
uv run python benchmark.py --candidates 3

Results are printed to stdout and saved to benchmark_results.txt (override with --output).

Example output

============================================================
  Device: CUDA  |  candidates=1
============================================================
  [warm-up] loading model + first inference...
  input=short (8 notes, ~4s)   gen= 32 tokens  [1:85ms] [2:82ms] ...
  ...

================================================================================
  SUMMARY — CUDA  |  candidates=1
================================================================================
  Input                     Gen tok   Mean ms    Mean s   Std ms   Min ms   Max ms   tok/s
  -----------------------------------------------------------------------------------------
  short (8 notes, ~4s)           32        85      0.09      2.1       82       89   376.5
  short (8 notes, ~4s)          128       290      0.29      4.3      284      297   441.4
  long  (90 notes, ~18s)         32        91      0.09      1.8       88       94   351.6
  long  (90 notes, ~18s)        128       305      0.31      3.9      299      312   419.7

🛠️ Development Tips

Debugging

Terminal in UI: Shows all MIDI events and engine responses
Browser console: F12 for JavaScript errors
Python terminal: Check server-side logs for model loading, inference errors

Testing New Engines

Record a simple 3-5 note progression
Play back with different engines
Check terminal for processing details
Verify output notes are in valid range (0-127)

Performance

Recording: Event capture happens in JavaScript (fast, local)
Processing: May take 2-5 seconds depending on engine and model
Playback: Tone.js synthesis is real-time (instant)

🔧 Technology Stack

Frontend: Tone.js v6+ (Web Audio API)
Backend: Gradio 5.49.1 + Python 3.10+
MIDI: mido library
Model: Godzilla Piano Transformer (via Hugging Face)

📝 License

Open source - free to use and modify.