Puck backlog
Ideas not yet built. Each entry: what, why, and where it slots into the architecture (engine = pure policy/data, ui = presentation, modes/sim = orchestration).
Gestures: sprite dances/reactions keyed to event type
What: Puck performs a short gesture matched to what he's reporting, not just flying + speaking. A build passing -> a little hop/spin of pride. Tests failing or a permission block -> an urgent shake/alarm wiggle. Discord noise he ignored -> a dismissive shrug. Claude finished -> a satisfied stretch. Stale tab -> a curious head-tilt. Sleep -> a yawn. The gesture plays during the flight-to + bubble beat.
Why: On a busy real desktop the sprite is glanceable; a gesture lets you read the kind of news pre-attentively, before the words. It's the single highest charm-per-line addition left, and it makes the overlay feel like a creature rather than a notifier.
Where it slots in:
- engine/: a pure gestureFor(event | decision) map: EventDef.id/source -> gesture name. Data, not logic; pin a couple of mappings with a test. Could also key off the existing tier/mischief so high-mischief gestures are more theatrical.
- ui/Sprite.tsx: a gesture?: GestureName prop adding a CSS class (gesture-hop, gesture-shake, gesture-shrug, …); animations on .puck-bob/.puck-face only (compositor-friendly transforms, like the existing flap/bob). Clear the class on animationend so it re-triggers.
- modes/sim/SimApp.tsx: set the gesture alongside the existing flyTo/setBubble in fireEvent; clears when the surface resolves.
Notes: keep gestures to transform/opacity so they stay GPU-composited (the whole
sprite layer is). Reuse the per-form structure in SpriteBody: gestures should read
on all four forms (mossling/wisp/gremlin/moth), so animate the wrapper, not form-specific
parts. Pairs naturally with the existing mood tint + alert ring.
Color pulse (same feature, cheapest channel): transient-tint the alert ring + glow
to a per-event-type hue while a surface is pending: red/urgent for failures &
permission blocks, gold/success for completions, grey/dim for ignored noise. The ring
and .puck-glow already read CSS vars (--accent, --glow), so this is a single
--alert-hue override set from the same engine map that picks the gesture: one
{ gesture, hue } lookup feeds both. Even shipping the color pulse alone (before
gesture animation work) is a real readability win. Don't fight the existing mood tint
(--puck-body etc.): pulse the ring/glow, leave the body color to mood, so "what kind
of news" (hue) and "how Puck feels lately" (mood) stay separate signals.
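A minimal sketch of the single { gesture, hue } lookup described above, assuming events expose an id and a source. Only gestureFor, EventDef.id/source, and --alert-hue come from this entry; the event ids, the GestureName members, and the hex values are hypothetical placeholders.

```typescript
// Hypothetical vocabulary: the real gesture and hue names are not fixed yet.
type GestureName = "hop" | "shake" | "shrug" | "stretch" | "tilt" | "yawn";
type Hue = "red" | "gold" | "grey";

interface Reaction {
  gesture: GestureName;
  hue?: Hue; // omitted -> ring/glow keep their default accent
}

interface EventLike {
  id: string; // assumed to mirror EventDef.id
  source: string; // assumed to mirror EventDef.source
}

// Data, not logic. All event ids below are made up for illustration.
const BY_ID = new Map<string, Reaction>([
  ["build.passed", { gesture: "hop", hue: "gold" }],
  ["tests.failed", { gesture: "shake", hue: "red" }],
  ["permission.blocked", { gesture: "shake", hue: "red" }],
  ["claude.finished", { gesture: "stretch", hue: "gold" }],
  ["tab.stale", { gesture: "tilt" }],
  ["sleep", { gesture: "yawn" }],
]);

// Fallback when the specific event id is unknown but the source is.
const BY_SOURCE = new Map<string, Reaction>([
  ["discord", { gesture: "shrug", hue: "grey" }],
]);

function gestureFor(event: EventLike): Reaction | undefined {
  return BY_ID.get(event.id) ?? BY_SOURCE.get(event.source);
}

// Placeholder colors; the real palette would live with the other CSS vars.
const HUE_CSS: Record<Hue, string> = {
  red: "#e5484d",
  gold: "#f5c542",
  grey: "#8b8d98",
};

// Style overrides for the ring/glow only; the body color stays with mood.
function alertHueVars(reaction: Reaction | undefined): Record<string, string> {
  if (!reaction || !reaction.hue) return {};
  return { "--alert-hue": HUE_CSS[reaction.hue] };
}
```

Because the map is plain data, the color pulse can ship first by calling only alertHueVars, with the gesture field ignored until the animations exist.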
Ambient quips: Puck reacts to what you're doing
What: Occasionally Puck mutters a quip or non-sequitur riffing on your current context: the active app, a window title, the shape of what you're typing. Not help, not a summary: flavor. "Still in the auth thicket, I see." / "That's a lot of tabs for one small human." / a non-sequitur about the dock breathing. Low frequency, skippable, never twice about the same thing.
Why: This is the difference between a notifier that lives in the corner and a familiar that shares the room. It's the most "alive" feature on the list, and the most dangerous, so it ships last and most carefully.
Privacy is the feature, not a caveat (design doc §17.3-17.4 are the law here):
- Local inference ONLY: hard gate. Context never leaves the machine. If the brain is the cloud path (Modal/ZeroGPU), this feature is disabled, full stop, not degraded. Enforce in code: the quip path checks the resolved brain is localhost and refuses otherwise. A loud guard, not a setting.
- Opt-in, off by default. A distinct toggle from notifications; the permission copy says plainly what's read and that it stays local.
- Ephemeral, never persisted, never a trace, never training data. Context for a quip is read, used for one generation, dropped. It must not touch the memory garden, the trace export, or localStorage.
- Redact before the model sees it. Strip obvious secrets (password fields, token-shaped strings, anything in a field marked sensitive). Prefer abstractions ("a long terminal command", "a code file", "a messaging app") over the literal text. Quoting back verbatim is the creepy line; stay on the abstract side of it.
- Never about people or private content. App categories and your own activity, yes; the contents of a DM, an email body, a name on screen: no.
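The local-only gate and the redact-to-abstraction rule above could look roughly like this. It is a sketch under assumptions: RawContext, the loopback host list, the token regex, and all function names are hypothetical, and a real redactor would need a broader secret detector.

```typescript
// Hypothetical shape of what the watcher hands over for one quip.
interface RawContext {
  appCategory: string; // e.g. "terminal", "messaging"
  text: string; // raw content: must never reach the model
  sensitive: boolean; // field flagged as password/secure by the OS
}

// Assumption: "local" means a loopback host. Anything else is refused.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]", "::1"]);

// Crude stand-in for "token-shaped": a long unbroken run of key-like chars.
const TOKEN_SHAPED = /[A-Za-z0-9_-]{24,}/;

function isLocalBrain(brainUrl: string): boolean {
  try {
    return LOOPBACK_HOSTS.has(new URL(brainUrl).hostname);
  } catch {
    return false; // unparseable URL: fail closed
  }
}

// Returns an abstraction of the context, or null when it should be dropped.
// The literal text is never included in the result.
function abstractContext(ctx: RawContext): string | null {
  if (ctx.sensitive || TOKEN_SHAPED.test(ctx.text)) return null;
  const size = ctx.text.length > 80 ? "long" : "short";
  return `a ${size} piece of text in a ${ctx.appCategory} app`;
}

// Loud guard, not a setting: throws if the resolved brain is not local.
function buildQuipInput(brainUrl: string, ctx: RawContext): string | null {
  if (!isLocalBrain(brainUrl)) {
    throw new Error("ambient quips refused: brain is not on localhost");
  }
  return abstractContext(ctx);
}
```

The guard throws rather than returning a flag so that a misconfigured cloud brain fails visibly instead of silently degrading.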
Where it slots in:
- The desktop watcher (future, §9.2: NSWorkspace frontmost app, optional AX/screen) is the context source: same daemon /events path, a new low-priority context event kind, or a separate local-only endpoint that never queues.
- engine decides whether to quip (rare; respects annoyance budget / presence, reuses the interruption-taste machinery so an annoying quip trains Puck quieter).
- The quip generation uses the local brain with a tight "one playful aside, ≤12 words, never quote the user" prompt; surfaces through the existing bubble channel.
Ship order: after the desktop watcher exists and after a privacy pass. Until then it's sim-only flavor at most (riffing on the fake desktop, where there's nothing real to leak).
Phototaxis: a fairy drawn to stimuli (whimsical wander)
What: Puck should behave like a moth/fairy, pulled toward activity rather than drifting at random. Flits toward motion, lingers near what's lively, chases the occasional shiny thing, then loses interest.
Why: The wander is the sprite's resting personality; it's on screen far more than any bubble. Uniform-random reads as a screensaver; attraction reads as alive and curious. Highest charm-per-effort of the ambient ideas.
Buildable now, zero new permissions (do this part first):
Replace the uniform-random wander target in SimApp with a weighted pull toward salient
points we already have:
- the cursor (occasional gentle follow / curious approach, then retreat; never clingy; respects presence),
- the last event location (he lingers where something just happened),
- in the overlay, the focused window rect (future: NSWorkspace frontmost-app position via the daemon, so he hangs near where you're working and patrols where you're not).
Keep it a gradient, not a leash: weighted-random pick among attractors + noise, so it
stays unpredictable. Lives in the engine as a pure
pickWanderTarget(attractors, rng); the loop already exists. Tune against presence (low = aloof, high = follows the action).
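One way the weighted pick plus noise could be written, as a sketch only. The name pickWanderTarget and its (attractors, rng) arguments come from the note above; the Attractor shape, the jitter parameter, and the null fallback are assumptions.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical attractor shape: a screen point and a non-negative pull.
interface Attractor {
  at: Point;
  weight: number;
}

// Injected so the function stays pure and testable; uniform in [0, 1).
type Rng = () => number;

// Weighted-random pick among attractors, then jitter around the winner so
// the target is a gradient, not a leash. Returns null when nothing pulls,
// letting the caller fall back to the existing uniform wander.
function pickWanderTarget(
  attractors: Attractor[],
  rng: Rng,
  jitter = 80, // assumed pixel radius of the noise
): Point | null {
  const live = attractors.filter((a) => a.weight > 0);
  const total = live.reduce((sum, a) => sum + a.weight, 0);

  let roll = rng() * total;
  let chosen: Attractor | undefined;
  for (const a of live) {
    chosen = a; // last attractor wins if rounding leaves roll >= 0
    roll -= a.weight;
    if (roll < 0) break;
  }
  if (!chosen) return null;

  return {
    x: chosen.at.x + (rng() - 0.5) * 2 * jitter,
    y: chosen.at.y + (rng() - 0.5) * 2 * jitter,
  };
}
```

Presence tuning would then be a matter of scaling weights before the call, for example multiplying the cursor's weight by the presence value so a low-presence Puck rarely approaches it.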
Sensor-gated: backlog, same privacy rules as ambient quips (local-only, ephemeral):
- Screen color / motion / "flashing lights": needs ScreenCaptureKit, because the transparent overlay can't see what's beneath it. A coarse, downsampled brightness/motion map (NOT readable content) could let him drift toward an area that just changed (a video started, a notification flashed). Local-only, never stored, abstractions not pixels.
- Audio reactivity: mic is a hard no by default; "system audio is playing / its level" is lighter but still opt-in + local-only. A bass-thump bob or a turn-toward-the-sound would be delightful but ships last, behind the same gate as quips.
Smell test for all of it: attraction must stay cute, never surveillant. He reacts to the shape of activity (something moved, something's loud), never to its content.
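To make "shape, not content" concrete, here is a rough sketch of the coarse motion map: two grayscale frames are reduced to a small grid of average brightness change, and only the hottest cell is reported. The Frame shape and the hottestCell name are hypothetical, and the capture step (ScreenCaptureKit) is out of scope here.

```typescript
// Hypothetical frame: one luma byte per pixel, row-major.
interface Frame {
  width: number;
  height: number;
  luma: Uint8Array;
}

interface HotCell {
  col: number;
  row: number;
  score: number; // mean absolute brightness change in the cell, 0..255
}

// Reduces two frames to a cols x rows grid of mean absolute difference and
// returns the cell that changed most, or null if nothing changed or the
// frames do not match. No pixel data survives past this function.
function hottestCell(prev: Frame, next: Frame, cols: number, rows: number): HotCell | null {
  if (prev.width !== next.width || prev.height !== next.height) return null;

  const sums = new Float64Array(cols * rows);
  const counts = new Uint32Array(cols * rows);

  for (let y = 0; y < next.height; y++) {
    const row = Math.min(rows - 1, Math.floor((y * rows) / next.height));
    for (let x = 0; x < next.width; x++) {
      const col = Math.min(cols - 1, Math.floor((x * cols) / next.width));
      const i = y * next.width + x;
      const cell = row * cols + col;
      sums[cell] += Math.abs(next.luma[i] - prev.luma[i]);
      counts[cell] += 1;
    }
  }

  let best: HotCell | null = null;
  for (let cell = 0; cell < sums.length; cell++) {
    if (counts[cell] === 0) continue;
    const score = sums[cell] / counts[cell];
    if (score > 0 && (best === null || score > best.score)) {
      best = { col: cell % cols, row: Math.floor(cell / cols), score };
    }
  }
  return best;
}
```

A grid as small as 4x3 is probably enough for an attractor; the coarser it is, the less the map can reveal about what is actually on screen.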
Take-me-there: Puck knows where the activity is, and ferries you to it
What: On a real notification, Puck should point you at the actual window that needs you (fly to it if it's on this Space, beckon "follow me" if it's on another), and clicking him navigates there (focuses the app, macOS brings its Space forward).
Current behavior (the gap): wire events arrive with target: null (the sim's targets
were fake windows), so in the overlay fireEvent flies Puck to a random screen point.
He has no idea where the source app lives, because we never gave him real-window awareness.
This is the design doc's Phase 4 ("patrol the desktops you abandoned" presumes knowing
where they are).
Phase 1: click-to-activate (high value, no Space geometry needed):
- Event carries a locator: source app bundle id / pid / window title. The Claude hook already has TERM_PROGRAM, cwd, and the calling pid available; puck-run knows its terminal. Add an optional locator to the wire schema ({bundleId?, pid?, title?}).
- Rust command activate_target(locator) -> look the app up with NSRunningApplication.runningApplications(withBundleIdentifier:) and call activate on the match (or AX focus by pid/title). macOS switches to that app's Space automatically.
- Frontend: clicking Puck while a located surface is pending calls it. This alone delivers "Puck lit up -> click -> you're where the thing is," across Spaces, without knowing which.
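A sketch of the frontend half of Phase 1, assuming the locator arrives as untrusted JSON on the wire. The {bundleId?, pid?, title?} shape is from the list above; parseLocator, PendingSurface, and onPuckClick are hypothetical names, and the activate callback stands in for whatever wraps the Rust activate_target command (for instance a Tauri invoke call).

```typescript
interface Locator {
  bundleId?: string;
  pid?: number;
  title?: string;
}

// Validates the optional wire field. Returns null when nothing usable is
// present, which keeps today's behavior (no click-to-activate).
function parseLocator(raw: unknown): Locator | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  const out: Locator = {};

  const bundleId = r["bundleId"];
  if (typeof bundleId === "string" && bundleId !== "") out.bundleId = bundleId;

  const pid = r["pid"];
  if (typeof pid === "number" && Number.isInteger(pid) && pid > 0) out.pid = pid;

  const title = r["title"];
  if (typeof title === "string" && title !== "") out.title = title;

  return Object.keys(out).length > 0 ? out : null;
}

// Hypothetical pending-surface shape; only the locator matters here.
interface PendingSurface {
  locator: Locator | null;
}

// Injected so the handler is testable without the native side.
type Activate = (locator: Locator) => void | Promise<void>;

// Click handler: activates only when a located surface is pending.
// Returns true if an activation was requested.
function onPuckClick(pending: PendingSurface | null, activate: Activate): boolean {
  if (!pending || !pending.locator) return false;
  void activate(pending.locator);
  return true;
}
```

Keeping activation behind the click means the handler never fires on its own, which matches the "navigation, not an autonomous action" constraint.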
Phase 2: same-Space vs other-Space (the fiddly bit):
- Determine if the target window is on the current Space: CGWindowListCopyWindowInfo with kCGWindowListOptionOnScreenOnly lists on-screen windows; absent target -> elsewhere. Robust Space identity needs the semi-private CGSSpace APIs, which are fragile and optional.
- Same Space -> fly to the window's screen rect (window bounds from CGWindowList). Other Space -> a "come hither" beckon toward the screen edge (pairs with the gesture entry), and the click teleports.
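The fly-versus-beckon decision could be a small pure function over whatever the native side reports as on-screen. This sketch assumes the daemon forwards a list of on-screen windows with owner pid, title, and bounds (roughly what CGWindowList exposes); OnScreenWindow, Approach, and planApproach are hypothetical names.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumed projection of one on-screen window entry from the native side.
interface OnScreenWindow {
  ownerPid: number;
  title: string;
  bounds: Rect;
}

type Approach = { kind: "fly"; rect: Rect } | { kind: "beckon" };

// Same Space -> fly to the window's rect. Absent from the on-screen list
// (or nothing to match on) -> treat it as elsewhere and beckon.
function planApproach(
  target: { pid?: number; title?: string },
  onScreen: OnScreenWindow[],
): Approach {
  if (target.pid === undefined && target.title === undefined) {
    return { kind: "beckon" };
  }
  const hit = onScreen.find(
    (w) =>
      (target.pid === undefined || w.ownerPid === target.pid) &&
      (target.title === undefined || w.title === target.title),
  );
  return hit ? { kind: "fly", rect: hit.bounds } : { kind: "beckon" };
}
```

Note that "not on screen" also covers minimized or hidden windows, so a beckon followed by click-to-activate is a safe default for every miss.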
Notes: locator is metadata, not content: bundle id + window title, never window contents (anti-creep). Activation is an explicit user click (navigation, not an autonomous action), so it stays inside the safety tiers. Depends on: gesture vocabulary (beckon) and the future native window watcher (NSWorkspace frontmost / CGWindowList).
Camouflage: Puck adapts to what he's floating over
What: When Puck drifts over text or busy content, he reacts to his surroundings like a chameleon/glass-wisp: goes translucent, refracts, or (dream version) "mirrors" the texture behind him onto his own body. Blends, then pops back when he moves to empty space.
Why: Sells "he's really in your desktop, not pasted on top." A creature that responds to its background reads as inhabiting the space. Pairs with the wisp form (already glass-like).
Cheap / free now (no sensing):
- Shy fade: lower sprite opacity while stationary over the busy center of the screen, restore when wandering to the margins. Pure CSS/opacity on the existing wander state. Reads as "blending in" without literally seeing anything.
- Refraction (maybe free, needs a WKWebView test): a backdrop-filter glass body on the sprite. In the transparent overlay this might sample the real desktop behind the window (same open question as the speech-bubble blur). If it does, a refractive/distort body gives instant chameleon shimmer with zero screen-capture. Test in the Tauri overlay before committing to it.
Dream version: literal mirror (screen-capture gated):
- Sample the screen region directly under the sprite (ScreenCaptureKit), downsample, and paint it onto his body as living camouflage. Stunning, but it's the same hard gate as ambient quips / phototaxis sensors: local-only, ephemeral, opt-in, never stored, never leaves the machine. Texture/color only, never treated as readable content.
Where it slots in: ui/Sprite.tsx (a camouflage intensity prop on .puck-body),
driven by modes/sim from sprite position vs. screen regions. The shy-fade is a
20-minute add; refraction is a test-then-maybe; the literal mirror waits for the screen
watcher + privacy pass.
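The shy-fade can be computed from position alone, with no sensing. A minimal sketch, assuming "busy center" is approximated by distance from the screen midpoint; shyFadeOpacity and the 0.45 floor are hypothetical.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Size {
  width: number;
  height: number;
}

// Opacity for the sprite body: 1 while moving or at the margins, fading
// toward minOpacity when he sits still near the middle of the screen.
function shyFadeOpacity(
  pos: Point,
  screen: Size,
  stationary: boolean,
  minOpacity = 0.45, // assumed floor; tune by eye
): number {
  if (!stationary) return 1;
  const dx = Math.abs(pos.x - screen.width / 2) / (screen.width / 2);
  const dy = Math.abs(pos.y - screen.height / 2) / (screen.height / 2);
  const edge = Math.min(1, Math.max(dx, dy)); // 0 at center, 1 at any margin
  return 1 - (1 - minOpacity) * (1 - edge);
}
```

The result can feed the camouflage intensity prop directly (intensity = 1 - opacity), with a CSS transition on opacity smoothing the change.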
Real app icons: known apps, OS-extracted where possible
What: Replace the single glyph characters in the sim windows/dock and feed source-markers with real app icons: Claude, ChatGPT, Chrome, Gmail, Mail, Discord, Terminal, etc. Looks dramatically more legit, especially in the overlay.
Pulling the actual OS icon (the good version):
- macOS: NSWorkspace.shared.icon(forFile: "/Applications/Foo.app") returns the real icon for any installed .app bundle; a Rust/Tauri command can extract -> PNG -> hand to the webview. So Chrome, Mail, Discord, Slack, the host terminal: real icons, free, always current.
- The limit you called out: a terminal binary (claude, codex) has no bundle and no icon. Fall back to the host terminal's icon (iTerm/Terminal/Ghostty/Warp, which the Claude hook can report via TERM_PROGRAM), or a curated Puck-styled glyph.
- Web apps with no native app (ChatGPT, Gmail as a tab): no bundle to extract from, so these need a small curated bundled icon set (a dozen SVGs/PNGs).
So: hybrid. OS extraction where a bundle exists (Rust command, overlay only), curated bundled set for web-apps + terminal-binary fallbacks (works everywhere incl. the Space/sim).
Where it slots in:
- ui/Desktop.tsx (WIN_DEFS icons, dock glyphs) and the feed source marker: currently single glyph chars; swap for an <Icon source=…> that prefers OS-extracted, falls back to bundled, falls back to glyph.
- Overlay-only Rust command app_icon(bundleId|path) -> png for the extraction half.
- Engine SOURCES/EventDef already carry source identity; add an optional bundleId hint.
Cheap first step (no native): ship the curated bundled icon set + source->icon map; use glyph only as last resort. The OS-extraction half is an overlay enhancement on top.
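The three-step fallback behind the proposed Icon component might reduce to a lookup like this. It is a sketch: IconRef, resolveIcon, and the two maps are hypothetical, and the OS map would simply be empty in the Space/sim where no Rust command exists.

```typescript
// What the <Icon> component would render, in order of preference.
type IconRef =
  | { kind: "os"; dataUrl: string } // PNG extracted by the overlay
  | { kind: "bundled"; path: string } // curated asset shipped with the app
  | { kind: "glyph"; char: string }; // last resort: today's behavior

// osIcons: source -> PNG data URL, filled by the overlay-only extraction.
// bundled: source -> asset path from the curated set.
function resolveIcon(
  source: string,
  osIcons: ReadonlyMap<string, string>,
  bundled: ReadonlyMap<string, string>,
  fallbackGlyph: string,
): IconRef {
  const extracted = osIcons.get(source);
  if (extracted !== undefined) return { kind: "os", dataUrl: extracted };

  const asset = bundled.get(source);
  if (asset !== undefined) return { kind: "bundled", path: asset };

  return { kind: "glyph", char: fallbackGlyph };
}
```

Shipping only the bundled map first gives the cheap step described above; the OS map can be populated later without touching callers.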
Local vision: the private, free path for continuous perception
What: Run Puck's eyes on-device instead of (or alongside) Modal, so real-screen perception is private and continuous vision costs ~nothing. Brain seam already supports it: point PUCK_VISION_URL at a local OpenAI-compatible server.
Capability is NOT the blocker (verified 2026-06-07): screen-reading is OCR + light "what's notable" reasoning, and small VLMs excel at it. Options:
- Holotron-12B local via llama.cpp + mmproj: llama.cpp merged Nemotron-Nano-12B-v2-VL (PR #19547). convert_hf_to_gguf.py --mmproj -> vision projector -> llama-server. Same model as cloud, same capability, ~24GB on the 48GB Mac, free. NB: Ollama can't load the mmproj, so this must use raw llama-server. Keeps the Nemotron Quest tie.
- Qwen2.5-VL-7B local: ~6GB, 95.7 DocVQA, fast; ideal for continuous ambient (every 45s forever on the M4 Max). Loses the Nemotron tie, plenty for screen-reading.
- MiniCPM-V 2.6 (~5.5GB), Moondream2 (1.9B, CPU) as even-lighter fallbacks.
Recommendation: Holotron for the showcase + Quest (cloud now -> local llama-server later as the private path); Qwen2.5-VL-7B-local as the cheap continuous engine. The visionMode "Continuous" tier (built, currently same cloud path) should switch to a local PUCK_VISION_URL when this lands.
Where it slots in: zero engine/frontend change. It's a runtime: spin up
llama-server --mmproj (or vLLM/mlx when fixed) and set PUCK_VISION_URL. Plus the
real-screen capture (ScreenCaptureKit, overlay) to feed it actual pixels instead of the
sim snapshot.
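For orientation, this is roughly what a request to a local OpenAI-compatible vision server looks like. It is a sketch under assumptions: buildVisionRequest is a hypothetical helper, PUCK_VISION_URL is assumed to be a base URL without a path, the model name is whatever the local server was started with, and the existing brain seam may already build this payload differently.

```typescript
interface TextPart {
  type: "text";
  text: string;
}

interface ImagePart {
  type: "image_url";
  image_url: { url: string };
}

interface VisionRequest {
  url: string;
  body: {
    model: string;
    max_tokens: number;
    messages: Array<{ role: "user"; content: Array<TextPart | ImagePart> }>;
  };
}

// Builds (but does not send) a chat-completions request carrying one
// screenshot as a base64 data URL, in the OpenAI-compatible format.
function buildVisionRequest(
  baseUrl: string, // assumed: value of PUCK_VISION_URL, e.g. http://localhost:8080
  model: string,
  pngBase64: string,
  prompt: string,
): VisionRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  return {
    url: `${root}/v1/chat/completions`,
    body: {
      model,
      max_tokens: 64, // assumed cap; notable-item summaries are short
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            { type: "image_url", image_url: { url: `data:image/png;base64,${pngBase64}` } },
          ],
        },
      ],
    },
  };
}
```

Sending it is then a plain POST of JSON.stringify(body) to url, which is why swapping cloud for local should only require changing the environment variable.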