T-006: synesthesia-core Crate Scaffold - Shared Constants, Types, Module Structure

Type: Task
Phase: 0 - Foundation
Autonomy: agent:autonomous - Pure Rust library, no external systems, no decisions required.
Stack: stack:rust
Version: v0.1
Iteration: iter-1
Effort: XS (1 hour)


⚠️ Agent Scope: Populate crates/synesthesia-core/ with the canonical shared constants, a handful of foundational types, and stub module declarations. Do not implement any audio processing, model inference, or IPC logic here; those live in kansas/ and plugin/. This crate is a library. Its job in v0.1 is to be the single source of truth for values that both targets must agree on.


Context

T-001 created crates/synesthesia-core/src/lib.rs as a comment-only stub. T-005 added it as a dep to plugin/. T-006 gives it actual content.

The core problem this crate solves: kansas/ and plugin/ are separate compilation units. If kansas/ defines SAMPLE_RATE = 48_000 locally and plugin/ defines it separately, they will inevitably diverge. synesthesia-core is the single place these constants live β€” both crates import from here, so a change propagates everywhere.

Beyond constants, this crate will eventually hold the generator traits, shared note event types, and latent vector types that the plugin's process() loop and the Tauri app's model inference loop both need to speak. T-006 lays the module skeleton so future tasks know where to add their types.
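
The divergence problem described above can be illustrated with a minimal, self-contained sketch. The inline `synesthesia_core` module is a stand-in for the real crate, and `kansas_like` and `plugin_like` are hypothetical consumers invented for illustration. With both importing the same constants, a change in one place reaches both.

```rust
// Stand-in for the real crate so this sketch compiles on its own.
mod synesthesia_core {
    pub mod audio {
        pub const SAMPLE_RATE: u32 = 48_000;
        pub const CHANNELS: u16 = 2;
    }
    // Root re-exports, mirroring the planned lib.rs.
    pub use self::audio::{CHANNELS, SAMPLE_RATE};
}

// Hypothetical consumer standing in for kansas/ (imports via the crate root).
mod kansas_like {
    use super::synesthesia_core::{CHANNELS, SAMPLE_RATE};

    /// Interleaved samples per second this consumer expects.
    pub fn samples_per_second() -> u32 {
        SAMPLE_RATE * u32::from(CHANNELS)
    }
}

// Hypothetical consumer standing in for plugin/ (imports via the submodule).
mod plugin_like {
    use super::synesthesia_core::audio::{CHANNELS, SAMPLE_RATE};

    /// Same quantity, computed from the same shared constants.
    pub fn samples_per_second() -> u32 {
        SAMPLE_RATE * u32::from(CHANNELS)
    }
}
```

Neither consumer declares its own copy of the values, so the two functions cannot disagree.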


Prerequisites

  • T-001 merged: crates/synesthesia-core/ exists with a comment-only stub lib.rs
  • cargo check --workspace passes before starting

Acceptance Criteria

  • cargo check --workspace: zero errors
  • cargo clippy --workspace -- -D warnings: zero warnings
  • SAMPLE_RATE, CHANNELS, PPQN are publicly exported from synesthesia_core::audio
  • NoteEvent, PitchClass are publicly exported from synesthesia_core::music
  • LatentVector, MoodDescriptor are publicly exported from synesthesia_core::models
  • All types derive Debug, Clone, PartialEq; numeric newtype wrappers also derive Copy
  • pub use re-exports at crate root mean synesthesia_core::SAMPLE_RATE works without importing submodules
  • No #[allow(dead_code)] suppressions: if something is unused, its doc comment names the T-NNN task that will use it
  • kansas/src/audio/engine.rs local SAMPLE_RATE and CHANNELS constants are replaced with imports from this crate
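
One way to check the derive criterion at compile time is a generic function whose trait bound lists the required traits. This is only a sketch. The two structs are trimmed stand-ins for the real types, and `assert_core_traits` and `check_shared_types` are hypothetical names that are not part of the task.

```rust
use std::fmt::Debug;

// Stand-ins carrying the derives the acceptance criteria ask for; the real
// definitions live in synesthesia_core::music and synesthesia_core::models.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NoteEvent {
    pub pitch: u8,
    pub velocity: u8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LatentVector(pub [f32; 512]);

/// Compiles only if `T` implements every trait in the bound.
pub fn assert_core_traits<T: Debug + Clone + PartialEq>() {}

/// Hypothetical helper: instantiates the check once per shared type.
/// Removing a derive from either struct turns this into a compile error.
pub fn check_shared_types() -> bool {
    assert_core_traits::<NoteEvent>();
    assert_core_traits::<LatentVector>();
    true
}
```

In the real crate the same call could sit inside a `#[cfg(test)]` module, so it adds nothing to the public API.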

crates/synesthesia-core/Cargo.toml

[package]
name    = "synesthesia-core"
version = "0.1.0"
edition.workspace = true

[dependencies]
# T-006 has no runtime deps: pure Rust types and constants only.
# serde added here when T-046+ needs to serialize model inputs across the IPC boundary.

File Structure After This Task

crates/synesthesia-core/src/
├── lib.rs       ← module declarations + pub use re-exports
├── audio.rs     ← SAMPLE_RATE, CHANNELS, PPQN, DEFAULT_BUFFER_FRAMES, BPM_MIN, BPM_MAX, tick_interval_us()
├── music.rs     ← NoteEvent, PitchClass, PITCH_CLASS_NAMES, pitch_to_freq()
└── models.rs    ← LatentVector, MoodDescriptor, GenerationMode

Implementation

crates/synesthesia-core/src/lib.rs

//! # synesthesia-core
//!
//! Shared constants, types, and traits used by both the Tauri standalone app
//! (`kansas/`) and the nih-plug VST3/CLAP plugin (`plugin/`).
//!
//! ## Module layout
//!
//! - [`audio`]: sample rate, channel count, buffer constants
//! - [`music`]: note events, pitch classes, music theory primitives
//! - [`models`]: shared types for ML model inputs and outputs

pub mod audio;
pub mod music;
pub mod models;

// ── Convenience re-exports from crate root ────────────────────────────────────
// Allows `synesthesia_core::SAMPLE_RATE` without importing submodules.

pub use audio::{CHANNELS, PPQN, SAMPLE_RATE};
pub use models::{GenerationMode, LatentVector, MoodDescriptor};
pub use music::{PitchClass, NoteEvent, PITCH_CLASS_NAMES};

crates/synesthesia-core/src/audio.rs

//! Audio system constants shared between the standalone app and the plugin.
//!
//! These values are the single source of truth. Do not redeclare them locally
//! in `kansas/` or `plugin/` β€” import from here.

/// System sample rate in Hz. Chosen to match Magenta RT's SpectroStream codec (48 kHz stereo).
/// All CPAL streams (T-004) and nih-plug processing (T-005+) target this rate.
pub const SAMPLE_RATE: u32 = 48_000;

/// Number of audio channels. Stereo throughout the system.
pub const CHANNELS: u16 = 2;

/// Pulses per quarter note. 96 is a common MIDI sequencer resolution, used here by the
/// transport clock (T-003) and Perf RNN tick scheduling (T-063).
pub const PPQN: u32 = 96;

/// Default audio output buffer size in frames at SAMPLE_RATE.
/// 512 frames @ 48 kHz ≈ 10.67 ms. The VST plugin inherits the DAW's buffer size instead.
pub const DEFAULT_BUFFER_FRAMES: u32 = 512;

/// Minimum supported BPM for the transport clock.
pub const BPM_MIN: f32 = 20.0;
/// Maximum supported BPM for the transport clock.
pub const BPM_MAX: f32 = 300.0;

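/// Compute tick interval in microseconds for a given BPM.
///
/// `bpm` is not validated here: a value of `0.0` yields infinity, so callers
/// should keep it within `BPM_MIN..=BPM_MAX`.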
///
/// # Example
/// ```
/// use synesthesia_core::audio::tick_interval_us;
/// let us = tick_interval_us(128.0); // ≈ 4883 μs at 128 BPM
/// assert!((us - 4882.8).abs() < 1.0);
/// ```
#[inline]
pub fn tick_interval_us(bpm: f32) -> f32 {
    60_000_000.0 / bpm / PPQN as f32
}
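
The buffer and tick arithmetic in the doc comments above can be worked through in a short sketch. The constants and `tick_interval_us` are copied from audio.rs so the block compiles on its own. `buffer_duration_ms` and `ticks_per_buffer` are hypothetical helpers added for illustration, not part of the crate.

```rust
// Values copied from audio.rs so the sketch compiles on its own.
pub const SAMPLE_RATE: u32 = 48_000;
pub const PPQN: u32 = 96;
pub const DEFAULT_BUFFER_FRAMES: u32 = 512;

/// Same formula as `tick_interval_us` in audio.rs.
pub fn tick_interval_us(bpm: f32) -> f32 {
    60_000_000.0 / bpm / PPQN as f32
}

/// Hypothetical helper: duration of one buffer in milliseconds.
pub fn buffer_duration_ms(frames: u32) -> f32 {
    frames as f32 * 1_000.0 / SAMPLE_RATE as f32
}

/// Hypothetical helper: how many transport ticks elapse during one buffer.
pub fn ticks_per_buffer(frames: u32, bpm: f32) -> f32 {
    // Convert the buffer duration to microseconds, then divide by the tick length.
    buffer_duration_ms(frames) * 1_000.0 / tick_interval_us(bpm)
}
```

At 128 BPM a 512-frame buffer lasts about 10.67 ms and spans roughly 2.18 ticks, so a single audio callback may need to handle more than one transport tick.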

crates/synesthesia-core/src/music.rs

//! Music theory primitives shared between the generative engines and the UI.

/// A single MIDI-style note event produced by Performance RNN (T-062) and
/// consumed by the MIDI output (T-068) and nih-plug plugin (T-069).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NoteEvent {
    /// MIDI pitch 0–127. Middle C = 60.
    pub pitch: u8,
    /// MIDI velocity 1–127. 0 = note-off.
    pub velocity: u8,
    /// MIDI channel 0–15.
    pub channel: u8,
    /// Position in transport ticks (PPQN resolution) from session start.
    pub tick: u64,
    /// Duration in ticks. 0 = instantaneous (gate closed immediately).
    pub duration_ticks: u32,
}

impl NoteEvent {
    /// True if this is a note-off event (velocity == 0).
    #[inline]
    pub fn is_note_off(&self) -> bool {
        self.velocity == 0
    }

    /// Convert pitch to frequency in Hz using equal temperament (A4 = 440 Hz).
    #[inline]
    pub fn frequency(&self) -> f32 {
        pitch_to_freq(self.pitch)
    }
}

/// The 12 pitch classes of the chromatic scale.
/// Used by the 12-dimensional pitch intelligence radar (UI, T-058)
/// and Steam Audio spatial positioning (T-100).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
#[repr(u8)]
pub enum PitchClass {
    C = 0, Cs, D, Ds, E, F, Fs, G, Gs, A, As, B,
}

impl PitchClass {
    /// Extract pitch class from a MIDI pitch number.
    #[inline]
    pub fn from_midi(pitch: u8) -> Self {
        match pitch % 12 {
            0  => Self::C,  1  => Self::Cs, 2  => Self::D,
            3  => Self::Ds, 4  => Self::E,  5  => Self::F,
            6  => Self::Fs, 7  => Self::G,  8  => Self::Gs,
            9  => Self::A,  10 => Self::As, _  => Self::B,
        }
    }

    /// Index 0–11.
    #[inline]
    pub fn index(self) -> usize {
        self as usize
    }
}

/// Human-readable pitch class names aligned with `PitchClass` index.
/// `PITCH_CLASS_NAMES[PitchClass::Fs as usize]` == `"F#"`.
pub const PITCH_CLASS_NAMES: [&str; 12] = [
    "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B",
];

/// Convert MIDI pitch to frequency in Hz (equal temperament, A4 = 440 Hz).
#[inline]
pub fn pitch_to_freq(pitch: u8) -> f32 {
    440.0 * 2.0_f32.powf((pitch as f32 - 69.0) / 12.0)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn middle_c_freq() {
        let f = pitch_to_freq(60);
        assert!((f - 261.63).abs() < 0.1, "Middle C should be ~261.63 Hz, got {f}");
    }

    #[test]
    fn a4_freq() {
        let f = pitch_to_freq(69);
        assert!((f - 440.0).abs() < 0.01, "A4 should be 440 Hz, got {f}");
    }

    #[test]
    fn pitch_class_from_midi() {
        assert_eq!(PitchClass::from_midi(60), PitchClass::C);
        assert_eq!(PitchClass::from_midi(69), PitchClass::A);
        assert_eq!(PitchClass::from_midi(127), PitchClass::G);
    }
}
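
As a usage sketch, note events can be folded into twelve pitch-class bins, the kind of data a 12-dimensional pitch display might consume. The radar's real input format is not specified in this task, so the shape here is an assumption. `NoteEvent` is a copy of the struct above, and `pitch_class_histogram` and `note_on` are hypothetical helpers.

```rust
// Copy of the NoteEvent struct above so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NoteEvent {
    pub pitch: u8,
    pub velocity: u8,
    pub channel: u8,
    pub tick: u64,
    pub duration_ticks: u32,
}

/// Hypothetical helper: count note-ons per pitch class (index 0 = C, 11 = B).
/// Events with velocity 0 are note-offs and are skipped.
pub fn pitch_class_histogram(events: &[NoteEvent]) -> [u32; 12] {
    let mut bins = [0u32; 12];
    for ev in events.iter().filter(|ev| ev.velocity > 0) {
        bins[usize::from(ev.pitch % 12)] += 1;
    }
    bins
}

/// Hypothetical helper: a note-on with fixed velocity, channel, and a
/// one-quarter-note duration (96 ticks), to keep call sites short.
pub fn note_on(pitch: u8, tick: u64) -> NoteEvent {
    NoteEvent { pitch, velocity: 100, channel: 0, tick, duration_ticks: 96 }
}
```

The bin index matches `PitchClass::index()` and `PITCH_CLASS_NAMES`, so the same position can be used to look up a label.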

crates/synesthesia-core/src/models.rs

//! Shared types for ML model inputs and outputs.
//!
//! These are the Rust-native equivalents of the IPC types in `kansas/src/ipc/types.rs`.
//! The IPC types are serialized to JSON for the Tauri frontend.
//! These types are used directly in Rust β€” no serialization overhead.

/// MusicVAE 512-dimensional latent vector.
/// Produced by the encoder (T-054) and consumed by the decoder (T-055)
/// and Track Mixer latent navigation (T-057).
#[derive(Debug, Clone, PartialEq)]
pub struct LatentVector(pub [f32; 512]);

impl LatentVector {
    /// The zero vector, corresponding to the VAE prior mean (silence/root).
    pub fn zero() -> Self {
        Self([0.0; 512])
    }

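    /// Linear interpolation between two latent vectors.
    /// `t` is clamped to `[0.0, 1.0]`; `t = 0.0` returns `self`, `t = 1.0` returns `other`.
    pub fn lerp(&self, other: &Self, t: f32) -> Self {
        let t = t.clamp(0.0, 1.0);
        let mut out = [0.0f32; 512];
        // Iterate rather than index, so clippy's needless_range_loop lint
        // does not fail the `-D warnings` acceptance check.
        for (o, (a, b)) in out.iter_mut().zip(self.0.iter().zip(other.0.iter())) {
            *o = a * (1.0 - t) + b * t;
        }
        Self(out)
    }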
}

/// Semantic mood state output from Gemma-3N visual analysis (T-052).
/// Rust-native mirror of `MoodState` in `kansas/src/ipc/types.rs`.
#[derive(Debug, Clone, PartialEq)]
pub struct MoodDescriptor {
    /// Natural language mood label e.g. "mysterious", "energetic", "serene".
    pub mood: String,
    /// Energy level 0.0–1.0.
    pub energy: f32,
    /// Timbral texture label e.g. "dense", "sparse", "glitchy".
    pub texture: String,
}

impl Default for MoodDescriptor {
    fn default() -> Self {
        Self {
            mood: "neutral".into(),
            energy: 0.5,
            texture: "balanced".into(),
        }
    }
}

/// Which generation engine is active on a channel.
/// Used by the plugin's process() loop and the standalone app's mix bus router.
/// T-032 (Perf RNN VST), T-065 (standalone), T-089 (Magenta RT) set this per channel.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum GenerationMode {
    /// No generation: channel produces silence.
    #[default]
    Silent,
    /// Performance RNN MIDI generation (T-062).
    PerfRnn,
    /// Magenta RT continuous audio generation (T-081).
    MagentaRt,
    /// DDSP timbre transfer on Perf RNN MIDI output (T-092).
    Ddsp,
    /// Audio passthrough, used by the T-005 scaffold and test paths.
    Passthrough,
}
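
To illustrate how `lerp` might serve latent navigation, the sketch below samples evenly spaced points between two vectors. `LatentVector` here is a behaviorally equivalent copy of the type above, and `interpolation_path` is a hypothetical helper. How T-057 actually navigates the latent space is not specified in this task.

```rust
// Behaviorally equivalent copy of LatentVector so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub struct LatentVector(pub [f32; 512]);

impl LatentVector {
    /// Element-wise linear interpolation with `t` clamped to [0.0, 1.0].
    pub fn lerp(&self, other: &Self, t: f32) -> Self {
        let t = t.clamp(0.0, 1.0);
        let mut out = [0.0f32; 512];
        for (o, (a, b)) in out.iter_mut().zip(self.0.iter().zip(other.0.iter())) {
            *o = a * (1.0 - t) + b * t;
        }
        Self(out)
    }
}

/// Hypothetical helper: `steps` evenly spaced points from `from` to `to`,
/// endpoints included. Zero steps gives an empty path, one step gives `from`.
pub fn interpolation_path(
    from: &LatentVector,
    to: &LatentVector,
    steps: usize,
) -> Vec<LatentVector> {
    match steps {
        0 => Vec::new(),
        1 => vec![from.clone()],
        _ => (0..steps)
            .map(|i| from.lerp(to, i as f32 / (steps - 1) as f32))
            .collect(),
    }
}
```

Each vector is 2 KB (512 × 4 bytes), so a short path like this is cheap to allocate. A real-time caller would probably want to reuse a buffer instead of collecting into a new `Vec`.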

Update kansas/src/audio/engine.rs

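Once T-006 is merged, make sure kansas/Cargo.toml lists synesthesia-core as a dependency (T-005 added it only to plugin/), then replace the local constants in engine.rs (added by T-004) with imports: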

// In kansas/src/audio/engine.rs, REPLACE these local definitions:
// const SAMPLE_RATE: u32 = 48_000;
// const CHANNELS: u16 = 2;
// const BUFFER_FRAMES: u32 = 512;

// WITH:
use synesthesia_core::{SAMPLE_RATE, CHANNELS};
use synesthesia_core::audio::DEFAULT_BUFFER_FRAMES as BUFFER_FRAMES;

If T-004 and T-006 are merged in the same PR or T-006 merges first, apply this change directly. If T-004 merges before T-006, open a follow-up commit on main after T-006 merges.

Similarly, replace the local PPQN constant in kansas/src/commands/transport.rs (T-003):

// REPLACE:
// const PPQN: u32 = 96;

// WITH:
use synesthesia_core::PPQN;
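
After the migration, consumer code keeps using the same local names, now backed by the shared crate. The sketch below shows that with an inline stand-in for `synesthesia_core`. `StreamSettings`, `default_stream_settings`, `interleaved_len`, and `ticks_per_bar` are hypothetical, since the real engine.rs and transport.rs are not shown in this task.

```rust
// Stand-in for the real crate so this sketch compiles on its own.
mod synesthesia_core {
    pub mod audio {
        pub const SAMPLE_RATE: u32 = 48_000;
        pub const CHANNELS: u16 = 2;
        pub const PPQN: u32 = 96;
        pub const DEFAULT_BUFFER_FRAMES: u32 = 512;
    }
    pub use self::audio::{CHANNELS, PPQN, SAMPLE_RATE};
}

// The alias keeps the old local name, so existing call sites stay unchanged.
use self::synesthesia_core::audio::DEFAULT_BUFFER_FRAMES as BUFFER_FRAMES;
use self::synesthesia_core::{CHANNELS, PPQN, SAMPLE_RATE};

/// Hypothetical settings struct for an output stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamSettings {
    pub sample_rate: u32,
    pub channels: u16,
    pub buffer_frames: u32,
}

/// Builds settings purely from the shared constants.
pub fn default_stream_settings() -> StreamSettings {
    StreamSettings {
        sample_rate: SAMPLE_RATE,
        channels: CHANNELS,
        buffer_frames: BUFFER_FRAMES,
    }
}

/// Interleaved sample count for one buffer (frames x channels).
pub fn interleaved_len(settings: &StreamSettings) -> usize {
    settings.buffer_frames as usize * usize::from(settings.channels)
}

/// Ticks in one 4/4 bar at the shared resolution.
pub fn ticks_per_bar() -> u32 {
    PPQN * 4
}
```

With the defaults, one buffer holds 1024 interleaved samples and one 4/4 bar spans 384 ticks.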

Testing

Automated

  • cargo check --workspace: zero errors
  • cargo clippy --workspace -- -D warnings: zero warnings
  • cargo test -p synesthesia-core: middle_c_freq, a4_freq, pitch_class_from_midi all pass, along with the tick_interval_us doctest

Manual

# Verify both consumers compile against it
cargo check -p kansas
cargo check -p synesthesia-plugin

GitHub CLI

gh issue create \
  --title "T-006: synesthesia-core crate scaffold - shared constants, types, module structure" \
  --label "type:task,stack:rust,agent:autonomous,priority:high,status:ready,day:1" \
  --body-file T-006.md

Parent: GENESIS
Blocks: T-003 constant migration (PPQN), T-004 constant migration (SAMPLE_RATE, CHANNELS), T-057 (LatentVector lerp), T-062 (NoteEvent from Perf RNN), T-068 (NoteEvent to MIDI out), T-100 (PitchClass icosahedron positions)
Blocked By: T-001
Version: v0.1 · Iteration: iter-1 · Effort: XS (1 hour)