Puck backlog

Ideas not yet built. Each entry: what, why, and where it slots into the architecture (engine = pure policy/data, ui = presentation, modes/sim = orchestration).

Gestures: sprite dances/reactions keyed to event type

What: Puck performs a short gesture matched to what he's reporting, not just flying + speaking. A build passing → a little hop/spin of pride. Tests failing or a permission block → an urgent shake/alarm wiggle. Discord noise he ignored → a dismissive shrug. Claude finished → a satisfied stretch. Stale tab → a curious head-tilt. Sleep → a yawn. The gesture plays during the flight-to + bubble beat.

Why: On a busy real desktop the sprite is glanceable; a gesture lets you read the kind of news pre-attentively, before the words. It's the single highest charm-per-line addition left, and it makes the overlay feel like a creature rather than a notifier.

Where it slots in:

  • engine/: a pure gestureFor(event | decision) map from EventDef.id/source → gesture name. Data, not logic; pin a couple of mappings with a test. Could also key off the existing tier/mischief so high-mischief gestures are more theatrical.
  • ui/Sprite.tsx: an optional gesture prop (gesture?: GestureName) adding a CSS class (gesture-hop, gesture-shake, gesture-shrug, …); animations on .puck-bob/.puck-face only (compositor-friendly transforms, like the existing flap/bob). Clear the class on animationend so it re-triggers.
  • modes/sim/SimApp.tsx: set the gesture alongside the existing flyTo/setBubble in fireEvent; it clears when the surface resolves.
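
A minimal sketch of the engine-side map, assuming a simplified event shape. The event ids, the EventLike interface, and the gesture vocabulary are illustrative guesses, not the real EventDef.

```typescript
// Gesture names follow the examples in this entry; the exact vocabulary is an assumption.
type GestureName = "hop" | "shake" | "shrug" | "stretch" | "head-tilt" | "yawn";

// Hypothetical minimal shape; the real EventDef carries more fields.
interface EventLike {
  id: string;
  source: string;
}

// Exact event ids win over source-level defaults. All ids here are made up.
const GESTURE_BY_ID = new Map<string, GestureName>([
  ["build.passed", "hop"],
  ["tests.failed", "shake"],
  ["permission.blocked", "shake"],
  ["claude.finished", "stretch"],
  ["tab.stale", "head-tilt"],
  ["sleep", "yawn"],
]);

const GESTURE_BY_SOURCE = new Map<string, GestureName>([["discord", "shrug"]]);

// Pure data lookup: no timers, no DOM, so it can be pinned with a plain unit test.
function gestureFor(event: EventLike): GestureName | undefined {
  return GESTURE_BY_ID.get(event.id) ?? GESTURE_BY_SOURCE.get(event.source);
}
```

Returning undefined for unmapped events lets the UI fall back to plain fly + speak without a special case.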

Notes: keep gestures to transform/opacity so they stay GPU-composited (the whole sprite layer is). Reuse the per-form structure in SpriteBody: gestures should read on all four forms (mossling/wisp/gremlin/moth), so animate the wrapper, not form-specific parts. Pairs naturally with the existing mood tint + alert ring.

Color pulse (same feature, cheapest channel): transient-tint the alert ring + glow to a per-event-type hue while a surface is pending (red/urgent for failures & permission blocks, gold/success for completions, grey/dim for ignored noise). The ring and .puck-glow already read CSS vars (--accent, --glow), so this is a single --alert-hue override set from the same engine map that picks the gesture; one { gesture, hue } lookup feeds both. Even shipping the color pulse alone (before gesture animation work) is a real readability win. Don't fight the existing mood tint (--puck-body etc.): pulse the ring/glow, leave the body color to mood, so "what kind of news" (hue) and "how Puck feels lately" (mood) stay separate signals.
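
The single { gesture, hue } lookup could look roughly like this. The NewsKind categories, the hex colors, and the alertVars helper are placeholders for illustration; only the --alert-hue variable name comes from the entry above.

```typescript
type NewsKind = "failure" | "permission-block" | "completion" | "ignored-noise";
type Gesture = "shake" | "hop" | "shrug";

interface Reaction {
  gesture: Gesture;
  hue: string; // any CSS color; the hex values below are placeholders
}

// One table feeds both channels, so gesture and hue cannot drift apart.
const REACTIONS: Readonly<Record<NewsKind, Reaction>> = {
  "failure": { gesture: "shake", hue: "#e5484d" },
  "permission-block": { gesture: "shake", hue: "#e5484d" },
  "completion": { gesture: "hop", hue: "#e0b341" },
  "ignored-noise": { gesture: "shrug", hue: "#8a8f98" },
};

// Inline CSS custom properties for the ring/glow while a surface is pending.
// An empty object means "no override", so the ring falls back to --accent.
function alertVars(pending: NewsKind | null): Record<string, string> {
  return pending === null ? {} : { "--alert-hue": REACTIONS[pending].hue };
}
```

In the sprite component the result would be spread into the wrapper's style. The body color is deliberately absent from the table, which keeps the mood tint a separate signal.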

Ambient quips: Puck reacts to what you're doing

What: Occasionally Puck mutters a quip or non-sequitur riffing on your current context (the active app, a window title, the shape of what you're typing). Not help, not a summary: flavor. "Still in the auth thicket, I see." / "That's a lot of tabs for one small human." / a non-sequitur about the dock breathing. Low frequency, skippable, never twice about the same thing.

Why: This is the difference between a notifier that lives in the corner and a familiar that shares the room. It's the most "alive" feature on the list, and the most dangerous, so it ships last and most carefully.

Privacy is the feature, not a caveat (design doc §17.3–17.4 are the law here):

  • Local inference ONLY (hard gate). Context never leaves the machine. If the brain is the cloud path (Modal/ZeroGPU), this feature is disabled, full stop, not degraded. Enforce in code: the quip path checks the resolved brain is localhost and refuses otherwise. A loud guard, not a setting.
  • Opt-in, off by default. A distinct toggle from notifications; the permission copy says plainly what's read and that it stays local.
  • Ephemeral, never persisted, never a trace, never training data. Context for a quip is read, used for one generation, dropped. It must not touch the memory garden, the trace export, or localStorage.
  • Redact before the model sees it. Strip obvious secrets (password fields, token-shaped strings, anything in a field marked sensitive). Prefer abstractions ("a long terminal command", "a code file", "a messaging app") over the literal text. Quoting back verbatim is the creepy line; stay on the abstract side of it.
  • Never about people or private content. App categories and your own activity, yes; the contents of a DM, an email body, a name on screen, no.
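
A sketch of the loud guard and a first-pass redactor, assuming the resolved brain is available as a URL string. The function names, the loopback host list, and the token-shape heuristic are assumptions for illustration, not existing code.

```typescript
// Hosts treated as "this machine". WHATWG URL keeps the brackets on IPv6 hostnames.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function isLocalBrain(brainUrl: string): boolean {
  try {
    return LOOPBACK_HOSTS.has(new URL(brainUrl).hostname);
  } catch {
    return false; // unparseable URL: refuse rather than guess
  }
}

// Throws instead of returning a flag, so a caller cannot ignore the result.
function assertLocalBrain(brainUrl: string): void {
  if (!isLocalBrain(brainUrl)) {
    throw new Error("ambient quips refused: resolved brain is not local");
  }
}

// Crude heuristic (an assumption): long unbroken runs of key-like characters.
const TOKEN_SHAPED = /[A-Za-z0-9_-]{24,}/g;

function redact(text: string): string {
  return text.replace(TOKEN_SHAPED, "[redacted]");
}
```

The redactor is only a backstop. The stronger defense named above is to hand the model an abstraction such as "a long terminal command" and never the text itself.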

Where it slots in:

  • The desktop watcher (future, §9.2: NSWorkspace frontmost app, optional AX/screen) is the context source, either over the same daemon /events path as a new low-priority context event kind, or over a separate local-only endpoint that never queues.
  • engine decides whether to quip (rare; respects annoyance budget / presence, reuses the interruption-taste machinery so an annoying quip trains Puck quieter).
  • The quip generation uses the local brain with a tight "one playful aside, ≤12 words, never quote the user" prompt; surfaces through the existing bubble channel.
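
Since a prompt alone cannot guarantee the word limit or the no-quoting rule, a post-generation check is one option. This is a hypothetical sketch: acceptQuip and the three-word overlap threshold are assumptions, not part of the current design.

```typescript
const MAX_WORDS = 12;

// Lowercased word list; punctuation is dropped, apostrophes are kept.
function words(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9']+/).filter((w) => w.length > 0);
}

// Accept a quip only if it is short and never repeats `quoteRun` consecutive
// words from the context it was generated from.
function acceptQuip(quip: string, context: string, quoteRun = 3): boolean {
  const q = words(quip);
  if (q.length === 0 || q.length > MAX_WORDS) return false;
  const haystack = " " + words(context).join(" ") + " ";
  for (let i = 0; i + quoteRun <= q.length; i++) {
    const run = " " + q.slice(i, i + quoteRun).join(" ") + " ";
    if (haystack.includes(run)) return false;
  }
  return true;
}
```

A rejected quip would simply be dropped: staying silent is always a valid outcome for a low-frequency feature.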

Ship order: after the desktop watcher exists and after a privacy pass. Until then it's sim-only flavor at most (riffing on the fake desktop, where there's nothing real to leak).

Phototaxis: a fairy drawn to stimuli (whimsical wander)

What: Puck should behave like a moth/fairy, pulled toward activity rather than drifting at random. Flits toward motion, lingers near what's lively, chases the occasional shiny thing, then loses interest.

Why: The wander is the sprite's resting personality; it's on screen far more than any bubble. Uniform-random reads as a screensaver; attraction reads as alive and curious. Highest charm-per-effort of the ambient ideas.

Buildable now, zero new permissions (do this part first): Replace the uniform-random wander target in SimApp with a weighted pull toward salient points we already have:

  • the cursor (occasional gentle follow / curious approach, then retreat; never clingy; respects presence),
  • the last event location (he lingers where something just happened),
  • in the overlay, the focused window rect (future: NSWorkspace frontmost-app position via the daemon), so he hangs near where you're working and patrols where you're not.

Keep it a gradient, not a leash: weighted-random pick among attractors + noise, so it stays unpredictable. Lives in the engine as a pure pickWanderTarget(attractors, rng); the loop already exists. Tune against presence (low = aloof, high = follows the action).
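
One way pickWanderTarget could be written, as a sketch. The Attractor shape, the bounds/noise options, and the uniform fallback when nothing is salient are assumptions added for illustration; only the (attractors, rng) signature comes from the entry.

```typescript
interface Point { x: number; y: number }
interface Attractor { at: Point; weight: number }
type Rng = () => number; // returns a value in [0, 1), like Math.random

interface WanderOptions { width: number; height: number; noise: number }

function clamp(v: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, v));
}

function pickWanderTarget(
  attractors: readonly Attractor[],
  rng: Rng,
  opts: WanderOptions = { width: 1280, height: 800, noise: 80 },
): Point {
  const total = attractors.reduce((sum, a) => sum + Math.max(0, a.weight), 0);
  let chosen: Point;
  if (total <= 0) {
    // Nothing salient: fall back to the old uniform-random wander.
    chosen = { x: rng() * opts.width, y: rng() * opts.height };
  } else {
    // Roulette-wheel selection: each attractor owns a slice proportional to its weight.
    let roll = rng() * total;
    chosen = { x: opts.width / 2, y: opts.height / 2 }; // always overwritten below
    for (const a of attractors) {
      const w = Math.max(0, a.weight);
      if (w === 0) continue;
      chosen = a.at;
      roll -= w;
      if (roll < 0) break;
    }
  }
  // Jitter around the attractor so he hovers nearby rather than landing on it.
  const jitter = () => (rng() * 2 - 1) * opts.noise;
  return {
    x: clamp(chosen.x + jitter(), 0, opts.width),
    y: clamp(chosen.y + jitter(), 0, opts.height),
  };
}
```

Presence would scale the weights before the call (for example, multiplying the cursor weight by presence), which keeps this function free of policy.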

Sensor-gated (backlog), same privacy rules as ambient quips (local-only, ephemeral):

  • Screen color / motion / "flashing lights": needs ScreenCaptureKit, since the transparent overlay can't see what's beneath it. A coarse, downsampled brightness/motion map (NOT readable content) could let him drift toward an area that just changed (a video started, a notification flashed). Local-only, never stored, abstractions not pixels.
  • Audio reactivity: mic is a hard no by default; "system audio is playing / its level" is lighter but still opt-in + local-only. A bass-thump bob or a turn-toward-the-sound would be delightful but ships last, behind the same gate as quips.
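
To make "coarse motion map" concrete, here is a sketch that reduces two grayscale frames to a small grid of average change and picks the liveliest cell. The frame format (one luminance byte per pixel), the function names, and the grid size are assumptions; the real capture path does not exist yet.

```typescript
// Mean absolute luminance change per grid cell. The output has cols*rows numbers,
// far too coarse to recover any on-screen content.
function motionGrid(
  prev: Uint8Array, next: Uint8Array,
  width: number, height: number, cols: number, rows: number,
): Float32Array {
  const sums = new Float32Array(cols * rows);
  const counts = new Uint32Array(cols * rows);
  for (let y = 0; y < height; y++) {
    const row = Math.min(rows - 1, Math.floor((y * rows) / height));
    for (let x = 0; x < width; x++) {
      const col = Math.min(cols - 1, Math.floor((x * cols) / width));
      const i = y * width + x;
      const cell = row * cols + col;
      sums[cell] += Math.abs(next[i] - prev[i]);
      counts[cell] += 1;
    }
  }
  for (let k = 0; k < sums.length; k++) {
    sums[k] = counts[k] > 0 ? sums[k] / counts[k] : 0;
  }
  return sums;
}

// Center of the cell that changed most, or null if nothing exceeds the threshold.
function hottestCell(
  grid: Float32Array, cols: number, rows: number,
  width: number, height: number, threshold: number,
): { x: number; y: number } | null {
  let best = -1;
  let bestValue = threshold;
  for (let k = 0; k < grid.length; k++) {
    if (grid[k] > bestValue) { bestValue = grid[k]; best = k; }
  }
  if (best < 0) return null;
  const col = best % cols;
  const row = Math.floor(best / cols);
  return { x: (col + 0.5) * (width / cols), y: (row + 0.5) * (height / rows) };
}
```

The returned point could be fed to the wander logic as one more weighted attractor, and both frames dropped immediately afterwards.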

Smell test for all of it: attraction must stay cute, never surveillant. He reacts to the shape of activity (something moved, something's loud), never to its content.

Take-me-there: Puck knows where the activity is, and ferries you to it

What: On a real notification, Puck should point you at the actual window that needs you (fly to it if it's on this Space, beckon "follow me" if it's on another), and clicking him navigates there (focuses the app, macOS brings its Space forward).

Current behavior (the gap): wire events arrive with target: null (the sim's targets were fake windows), so in the overlay fireEvent flies Puck to a random screen point. He has no idea where the source app lives; we never gave him real-window awareness. This is the design doc's Phase 4 ("patrol the desktops you abandoned" presumes knowing where they are).

Phase 1, click-to-activate (high value, no Space geometry needed):

  • Event carries a locator: source app bundle id / pid / window title. The Claude hook already has TERM_PROGRAM, cwd, and the calling pid available; puck-run knows its terminal. Add an optional locator to the wire schema ({bundleId?, pid?, title?}).
  • Rust command activate_target(locator) -> look up the app with NSRunningApplication.runningApplications(withBundleIdentifier:) and call activate on it (or AX focus by pid/title). macOS switches to that app's Space automatically.
  • Frontend: clicking Puck while a located surface is pending calls it. This alone delivers "Puck lit up -> click -> you're where the thing is," across Spaces, without knowing which.
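
A sketch of the frontend half of Phase 1: validating the optional locator off the wire and routing a click to the native command. The Invoke type stands in for whatever bridge the overlay uses (for example Tauri's invoke); parseLocator and onPuckClick are hypothetical names.

```typescript
interface Locator {
  bundleId?: string;
  pid?: number;
  title?: string;
}

// Wire data is untrusted: keep only well-typed fields, and return null
// when nothing usable remains so callers have a single "no target" case.
function parseLocator(raw: unknown): Locator | null {
  if (typeof raw !== "object" || raw === null) return null;
  const fields = raw as Record<string, unknown>;
  const bundleId = fields["bundleId"];
  const pid = fields["pid"];
  const title = fields["title"];
  const out: Locator = {};
  if (typeof bundleId === "string" && bundleId !== "") out.bundleId = bundleId;
  if (typeof pid === "number" && Number.isInteger(pid) && pid > 0) out.pid = pid;
  if (typeof title === "string" && title !== "") out.title = title;
  return Object.keys(out).length > 0 ? out : null;
}

// Stand-in for the overlay's native bridge; injected so the sim can pass a no-op.
type Invoke = (command: string, args: Record<string, unknown>) => Promise<unknown>;

// Returns true if the click was turned into a navigation request.
function onPuckClick(pending: { locator: Locator | null } | null, invoke: Invoke): boolean {
  if (pending === null || pending.locator === null) return false;
  void invoke("activate_target", { locator: pending.locator });
  return true;
}
```

Because the bridge is injected, the sim and the overlay can share the click handler. Nothing fires unless the user clicks, which keeps activation an explicit navigation.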

Phase 2, same-Space vs other-Space (the fiddly bit):

  • Determine if the target window is on the current Space: CGWindowListCopyWindowInfo with kCGWindowListOptionOnScreenOnly lists on-screen windows; absent target -> elsewhere. Robust Space identity needs the semi-private CGSSpace APIs, which are fragile, so treat them as optional.
  • Same Space -> fly to the window's screen rect (window bounds from CGWindowList). Other Space -> a "come hither" beckon toward the screen edge (pairs with the gesture entry), and the click teleports.
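
The Phase 2 decision reduces to a pure function over whatever the native side reports. In this sketch the OnScreenWindow shape and the nearest-edge rule for the beckon are assumptions; the native command that would produce the window list is not shown.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Hypothetical shape of one entry reported by the native on-screen window list.
interface OnScreenWindow { pid: number; title: string; bounds: Rect }

type Approach =
  | { kind: "fly-to"; rect: Rect }
  | { kind: "beckon"; edge: "left" | "right" };

function planApproach(
  target: { pid?: number; title?: string },
  onScreen: readonly OnScreenWindow[],
  puckX: number,
  screenWidth: number,
): Approach {
  // With no key at all, never match: an empty locator must not mean "any window".
  const hasKey = target.pid !== undefined || target.title !== undefined;
  const match = hasKey
    ? onScreen.find((w) =>
        (target.pid === undefined || w.pid === target.pid) &&
        (target.title === undefined || w.title === target.title))
    : undefined;
  if (match !== undefined) return { kind: "fly-to", rect: match.bounds };
  // Absent from the on-screen list: assume another Space and beckon toward the
  // nearer horizontal edge (edge choice is a placeholder heuristic).
  return { kind: "beckon", edge: puckX < screenWidth / 2 ? "left" : "right" };
}
```

Either branch leaves the click behavior unchanged: the click still activates the target, so a wrong same-Space guess costs only a misplaced flight.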

Notes: the locator is metadata, not content; that means bundle id + window title, never window contents (anti-creep). Activation is an explicit user click (navigation, not an autonomous action), so it stays inside the safety tiers. Depends on: gesture vocabulary (beckon) and the future native window watcher (NSWorkspace frontmost / CGWindowList).

Camouflage: Puck adapts to what he's floating over

What: When Puck drifts over text or busy content, he reacts to his surroundings like a chameleon/glass-wisp: goes translucent, refracts, or (dream version) "mirrors" the texture behind him onto his own body. Blends, then pops back when he moves to empty space.

Why: Sells "he's really in your desktop, not pasted on top." A creature that responds to its background reads as inhabiting the space. Pairs with the wisp form (already glass-like).

Cheap / free now (no sensing):

  • Shy fade: lower sprite opacity while stationary over the busy center of the screen, restore when wandering to the margins. Pure CSS/opacity on the existing wander state. Reads as "blending in" without literally seeing anything.
  • Refraction (maybe free, needs a WKWebView test): a backdrop-filter glass body on the sprite. In the transparent overlay this might sample the real desktop behind the window (same open question as the speech-bubble blur). If it does, a refractive/distort body gives instant chameleon shimmer with zero screen-capture. Test in the Tauri overlay before committing to it.
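
The shy fade needs only sprite position and the wander state. A minimal sketch, where the 0.45 floor, the smoothstep easing, and the shyOpacity name are all placeholder choices:

```typescript
interface Viewport { width: number; height: number }

// Opacity for the sprite wrapper: full while moving or at the margins,
// fading toward `minOpacity` when parked over the center of the screen.
function shyOpacity(
  pos: { x: number; y: number },
  viewport: Viewport,
  stationary: boolean,
  minOpacity = 0.45,
): number {
  if (!stationary) return 1;
  // Normalized distance from center: 0 at dead center, 1 at any screen edge.
  const dx = Math.abs(pos.x - viewport.width / 2) / (viewport.width / 2);
  const dy = Math.abs(pos.y - viewport.height / 2) / (viewport.height / 2);
  const d = Math.min(1, Math.max(dx, dy));
  // Smoothstep so the fade eases in and out instead of ramping linearly.
  const t = d * d * (3 - 2 * d);
  return minOpacity + (1 - minOpacity) * t;
}
```

Paired with a CSS opacity transition on the wrapper, the value only needs recomputing when the wander target changes, not every frame.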

Dream version, literal mirror (screen-capture gated):

  • Sample the screen region directly under the sprite (ScreenCaptureKit), downsample, and paint it onto his body as living camouflage. Stunning, but it's the same hard gate as ambient quips / phototaxis sensors: local-only, ephemeral, opt-in, never stored, never leaves the machine. Texture/color only, never treated as readable content.

Where it slots in: ui/Sprite.tsx (a camouflage intensity prop on .puck-body), driven by modes/sim from sprite position vs. screen regions. The shy-fade is a 20-minute add; refraction is a test-then-maybe; the literal mirror waits for the screen watcher + privacy pass.

Real app icons: known apps, OS-extracted where possible

What: Replace the glyph characters (✳ ◍ ✉ …) in the sim windows/dock and feed source-markers with real app icons: Claude, ChatGPT, Chrome, Gmail, Mail, Discord, Terminal, etc. Looks dramatically more legit, especially in the overlay.

Pulling the actual OS icon (the good version):

  • macOS: NSWorkspace.shared.icon(forFile: "/Applications/Foo.app") returns the real icon for any installed .app bundle; a Rust/Tauri command can extract → PNG → hand to the webview. So Chrome, Mail, Discord, Slack, the host terminal: real icons, free, always current.
  • The limit you called: a terminal binary (claude, codex) has no bundle and no icon. Fall back to the host terminal's icon (iTerm/Terminal/Ghostty/Warp, which the Claude hook can report via TERM_PROGRAM), or a curated Puck-styled glyph.
  • Web apps with no native app (ChatGPT, Gmail as a tab): no bundle to extract from, so these need a small curated bundled icon set (a dozen SVGs/PNGs).

So: hybrid. OS extraction where a bundle exists (Rust command, overlay only), curated bundled set for web-apps + terminal-binary fallbacks (works everywhere incl. the Space/sim).

Where it slots in:

  • ui/Desktop.tsx (WIN_DEFS icons, dock glyphs) and the feed source marker: currently single glyph chars; swap for an <Icon source=…> that prefers OS-extracted, falls back to bundled, falls back to glyph.
  • Overlay-only Rust command app_icon(bundleId|path) -> png for the extraction half.
  • Engine SOURCES/EventDef already carry source identity; add an optional bundleId hint.
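
The three-step fallback can live in one pure resolver that the Icon component calls. This is a sketch: IconRef, SourceInfo, the two lookup maps, and the bundle ids and paths in the usage example are hypothetical.

```typescript
type IconRef =
  | { kind: "os"; dataUrl: string }   // PNG extracted by the overlay-only native command
  | { kind: "bundled"; path: string } // curated asset shipped with the app
  | { kind: "glyph"; char: string };  // last resort: today's single character

interface SourceInfo {
  id: string;        // engine source identity
  glyph: string;     // existing glyph, kept as the final fallback
  bundleId?: string; // optional hint; absent for web apps and bare terminal binaries
}

function resolveIcon(
  source: SourceInfo,
  osIcons: ReadonlyMap<string, string>,      // bundleId -> data URL (empty outside the overlay)
  bundledIcons: ReadonlyMap<string, string>, // source id -> asset path
): IconRef {
  const dataUrl = source.bundleId !== undefined ? osIcons.get(source.bundleId) : undefined;
  if (dataUrl !== undefined) return { kind: "os", dataUrl };
  const path = bundledIcons.get(source.id);
  if (path !== undefined) return { kind: "bundled", path };
  return { kind: "glyph", char: source.glyph };
}
```

In the Space/sim the osIcons map is simply empty, so the same resolver degrades to bundled icons and glyphs with no branching in the UI.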

Cheap first step (no native): ship the curated bundled icon set + source→icon map; use glyph only as last resort. The OS-extraction half is an overlay enhancement on top.

Local vision: the private, free path for continuous perception

What: Run Puck's eyes on-device instead of (or alongside) Modal, so real-screen perception is private and continuous vision costs ~nothing. Brain seam already supports it: point PUCK_VISION_URL at a local OpenAI-compatible server.

Capability is NOT the blocker (verified 2026-06-07): screen-reading is OCR + light "what's notable" reasoning, and small VLMs excel at it. Options:

  • Holotron-12B local via llama.cpp + mmproj: llama.cpp merged Nemotron-Nano-12B-v2-VL (PR #19547). convert_hf_to_gguf.py --mmproj → vision projector → llama-server. Same model as cloud, same capability, ~24GB on the 48GB Mac, free. NB: Ollama can't load the mmproj; must use raw llama-server. Keeps the Nemotron Quest tie.
  • Qwen2.5-VL-7B local: ~6GB, 95.7 DocVQA, fast; ideal for continuous ambient (every 45s forever on the M4 Max). Loses the Nemotron tie but is still plenty for screen-reading.
  • MiniCPM-V 2.6 (~5.5GB), Moondream2 (1.9B, CPU) as even-lighter fallbacks.

Recommendation: Holotron for the showcase + Quest (cloud now → local llama-server later as the private path); Qwen2.5-VL-7B-local as the cheap continuous engine. The visionMode "Continuous" tier (built, currently same cloud path) should switch to a local PUCK_VISION_URL when this lands.

Where it slots in: zero engine/frontend change; it is purely a runtime change: spin up llama-server --mmproj (or vLLM/mlx when fixed) and set PUCK_VISION_URL. Plus the real-screen capture (ScreenCaptureKit, overlay) to feed it actual pixels instead of the sim snapshot.
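
For reference, a sketch of what a call against a local OpenAI-compatible server might look like. It assumes PUCK_VISION_URL holds the server root (so /v1/chat/completions is appended) and that the server accepts image_url parts as base64 data URLs. The prompt text, token cap, and function names are placeholders, and the existing brain seam may already do the equivalent.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  max_tokens: number;
  messages: Array<{ role: "user"; content: ContentPart[] }>;
}

// Chat-completions body carrying one screenshot as a base64 data URL.
function buildVisionRequest(pngBase64: string, model: string): VisionRequest {
  return {
    model,
    max_tokens: 64, // placeholder cap; a short "what's notable" answer is all that's needed
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "In one short sentence, what on this screen is notable?" },
        { type: "image_url", image_url: { url: `data:image/png;base64,${pngBase64}` } },
      ],
    }],
  };
}

// Assumes baseUrl is the server root, e.g. the value of PUCK_VISION_URL.
function chatCompletionsUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/v1/chat/completions";
}

async function describeScreen(baseUrl: string, pngBase64: string, model: string): Promise<string> {
  const response = await fetch(chatCompletionsUrl(baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildVisionRequest(pngBase64, model)),
  });
  if (!response.ok) throw new Error(`vision server returned ${response.status}`);
  const data = (await response.json()) as { choices?: Array<{ message?: { content?: string } }> };
  return data.choices?.[0]?.message?.content ?? "";
}
```

Because the request shape is the same for cloud and local servers, switching the "Continuous" tier to local should only mean pointing baseUrl at the llama-server address.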