# Puck backlog
Ideas not yet built. Each entry: what, why, and where it slots into the architecture
(engine = pure policy/data, ui = presentation, modes/sim = orchestration).
## Gestures: sprite dances/reactions keyed to event type
**What:** Puck performs a short gesture matched to *what* he's reporting, not just
flying + speaking. A build passing → a little hop/spin of pride. Tests failing or a
permission block → an urgent shake/alarm wiggle. Discord noise he ignored → a dismissive
shrug. Claude finished → a satisfied stretch. Stale tab → a curious head-tilt. Sleep →
a yawn. The gesture plays during the flight-to + bubble beat.
**Why:** On a busy real desktop the sprite is glanceable; a gesture lets you read the
*kind* of news pre-attentively, before the words. It's the single highest charm-per-line
addition left, and it makes the overlay feel like a creature rather than a notifier.
**Where it slots in:**
- `engine/`: a pure `gestureFor(event | decision)` map, `EventDef.id`/`source` →
gesture name. Data, not logic; pin a couple of mappings with a test. Could also key
off the existing tier/mischief so high-mischief gestures are more theatrical.
- `ui/Sprite.tsx`: a `gesture?: GestureName` prop adding a CSS class (`gesture-hop`,
`gesture-shake`, `gesture-shrug`, …); animations on `.puck-bob`/`.puck-face` only
(compositor-friendly transforms, like the existing flap/bob). Clear the class on
`animationend` so it re-triggers.
- `modes/sim/SimApp.tsx`: set the gesture alongside the existing `flyTo`/`setBubble`
in `fireEvent`; clears when the surface resolves.
**Notes:** keep gestures to transform/opacity so they stay GPU-composited (the whole
sprite layer is). Reuse the per-form structure in `SpriteBody` β€” gestures should read
on all four forms (mossling/wisp/gremlin/moth), so animate the wrapper, not form-specific
parts. Pairs naturally with the existing mood tint + alert ring.
**Color pulse (same feature, cheapest channel):** transient-tint the alert ring + glow
to a per-event-type hue while a surface is pending: red/urgent for failures &
permission blocks, gold/success for completions, grey/dim for ignored noise. The ring
and `.puck-glow` already read CSS vars (`--accent`, `--glow`), so this is a single
`--alert-hue` override set from the same `engine` map that picks the gesture; one
`{ gesture, hue }` lookup feeds both. Even shipping the color pulse *alone* (before
gesture animation work) is a real readability win. Don't fight the existing mood tint
(`--puck-body` etc.): pulse the ring/glow, leave the body color to mood, so "what kind
of news" (hue) and "how Puck feels lately" (mood) stay separate signals.
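
A minimal sketch of the single `{ gesture, hue }` lookup described above. It assumes a hypothetical `EventKind` union in place of the real `EventDef.id`/`source` values; the type names, `cueFor`, `alertStyle`, and the concrete hue strings are illustrative and not taken from the Puck codebase. Hues for stale-tab and sleep are not specified above, so the sketch leaves them unset.

```typescript
// Hypothetical event kinds standing in for the real EventDef.id / source values.
type EventKind =
  | "build-passed"
  | "tests-failed"
  | "permission-block"
  | "ignored-noise"
  | "claude-finished"
  | "stale-tab"
  | "sleep";

type GestureName = "hop" | "shake" | "shrug" | "stretch" | "head-tilt" | "yawn";

interface SurfaceCue {
  gesture: GestureName;
  // CSS color for --alert-hue; null means "no override, keep the default ring color".
  hue: string | null;
}

// Pure data: one entry per event kind. Hue values are placeholders.
const CUES: Record<EventKind, SurfaceCue> = {
  "build-passed": { gesture: "hop", hue: "gold" },
  "tests-failed": { gesture: "shake", hue: "red" },
  "permission-block": { gesture: "shake", hue: "red" },
  "ignored-noise": { gesture: "shrug", hue: "grey" },
  "claude-finished": { gesture: "stretch", hue: "gold" },
  "stale-tab": { gesture: "head-tilt", hue: null },
  sleep: { gesture: "yawn", hue: null },
};

function cueFor(kind: EventKind): SurfaceCue {
  return CUES[kind];
}

// Inline custom properties for the sprite wrapper. Only the ring/glow hue is set;
// the mood-driven body color is left alone.
function alertStyle(cue: SurfaceCue): Record<string, string> {
  const style: Record<string, string> = {};
  if (cue.hue !== null) style["--alert-hue"] = cue.hue;
  return style;
}
```

Because `CUES` is typed as `Record<EventKind, SurfaceCue>`, adding a new event kind without a cue becomes a compile error, which is one cheap way to "pin" the mapping.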
## Ambient quips: Puck reacts to what you're doing
**What:** Occasionally Puck mutters a quip or non-sequitur riffing on your current
context: the active app, a window title, the shape of what you're typing. Not help,
not a summary: flavor. "Still in the auth thicket, I see." / "That's a lot of tabs for
one small human." / a non-sequitur about the dock breathing. Low frequency, skippable,
never twice about the same thing.
**Why:** This is the difference between a notifier that lives in the corner and a
familiar that *shares the room*. It's the most "alive" feature on the list, and the
most dangerous, so it ships last and most carefully.
**Privacy is the feature, not a caveat** (design doc §17.3–17.4 are the law here):
- **Local inference ONLY, hard gate.** Context never leaves the machine. If the brain
is the cloud path (Modal/ZeroGPU), this feature is *disabled*, full stop, not degraded.
Enforce in code: the quip path checks that the resolved brain is on localhost and refuses
otherwise. A loud guard, not a setting.
- **Opt-in, off by default.** A distinct toggle from notifications; the permission copy
says plainly what's read and that it stays local.
- **Ephemeral, never persisted, never a trace, never training data.** Context for a quip
is read, used for one generation, dropped. It must not touch the memory garden, the
trace export, or localStorage.
- **Redact before the model sees it.** Strip obvious secrets (password fields, token-shaped
strings, anything in a field marked sensitive). Prefer *abstractions* ("a long terminal
command", "a code file", "a messaging app") over the literal text.
Quoting back verbatim is the creepy line; stay on the abstract side of it.
- **Never about people or private content.** App categories and your own activity, yes;
the contents of a DM, an email body, a name on screen: no.
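
The "loud guard" in the first rule could look roughly like the sketch below. `isLocalBrain`, `assertLocalBrain`, and the loopback host list are hypothetical names, and the sketch assumes the resolved brain is available as a URL string; how the real brain seam exposes it is not specified here.

```typescript
// Hypothetical guard: names and the host list are illustrative, not Puck's API.
// WHATWG URL reports IPv6 hosts with brackets, hence "[::1]".
const LOOPBACK_HOSTS = ["localhost", "127.0.0.1", "[::1]"];

function isLocalBrain(brainUrl: string): boolean {
  let host: string;
  try {
    host = new URL(brainUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URL: treat as non-local and refuse
  }
  return LOOPBACK_HOSTS.indexOf(host) !== -1;
}

// Throws instead of returning a flag, so a cloud brain cannot be ignored by accident.
function assertLocalBrain(brainUrl: string): void {
  if (!isLocalBrain(brainUrl)) {
    throw new Error("Ambient quips refused: resolved brain is not on localhost");
  }
}
```

The quip path would call `assertLocalBrain` before reading any context, so a failed check means nothing was read at all. Matching on the exact hostname (not a prefix or substring) keeps lookalikes such as `localhost.example.com` out.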
**Where it slots in:**
- The desktop watcher (future, §9.2: NSWorkspace frontmost app, optional AX/screen) is the
context source: same daemon `/events` path, a new low-priority `context` event kind, or
a separate local-only endpoint that never queues.
- `engine` decides *whether* to quip (rare; respects annoyance budget / presence, reuses
the interruption-taste machinery so an annoying quip trains Puck quieter).
- The quip generation uses the local brain with a tight "one playful aside, ≤12 words,
never quote the user" prompt; surfaces through the existing bubble channel.
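
As a sketch of how the redaction rule and the tight prompt might fit together: the model only ever sees an abstraction, never the literal text. The `RawContext` shape, the category labels, and both function names are assumptions made for illustration; the real watcher payload does not exist yet.

```typescript
// Hypothetical context shape; the real watcher's payload is not defined yet.
type AppCategory = "terminal" | "editor" | "browser" | "messaging" | "other";

interface RawContext {
  appCategory: AppCategory;
  windowTitle: string; // read locally, never forwarded
  inSensitiveField: boolean;
}

const CATEGORY_LABELS: Record<AppCategory, string> = {
  terminal: "a terminal",
  editor: "a code file",
  browser: "a browser",
  messaging: "a messaging app",
  other: "some app",
};

// Returns an abstraction only; the literal title never leaves this function.
function abstractContext(ctx: RawContext): string | null {
  if (ctx.inSensitiveField) return null; // no quip at all near sensitive input
  return CATEGORY_LABELS[ctx.appCategory];
}

// Builds the one-shot prompt from the abstraction alone.
function buildQuipPrompt(abstraction: string): string {
  return (
    "Write one playful aside, at most 12 words. Never quote the user. " +
    "They are currently in " + abstraction + "."
  );
}
```

Dropping the title entirely is stricter than pattern-based secret stripping, which seems the safer default for a first version; richer abstractions can be added later without ever passing raw strings through.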
**Ship order:** after the desktop watcher exists and after a privacy pass. Until then it's
sim-only flavor at most (riffing on the fake desktop, where there's nothing real to leak).
## Phototaxis: a fairy drawn to stimuli (whimsical wander)
**What:** Puck should behave like a moth/fairy, *pulled* toward activity rather than
drifting at random. Flits toward motion, lingers near what's lively, chases the
occasional shiny thing, then loses interest.
**Why:** The wander is the sprite's resting personality; it's on screen far more than
any bubble. Uniform-random reads as a screensaver; attraction reads as *alive and
curious*. Highest charm-per-effort of the ambient ideas.
**Buildable now, zero new permissions (do this part first):**
Replace the uniform-random wander target in `SimApp` with a weighted pull toward salient
points we already have:
- the **cursor** (occasional gentle follow / curious approach, then retreat; never
clingy; respects `presence`),
- the **last event location** (he lingers where something just happened),
- in the overlay, the **focused window** rect (future: NSWorkspace frontmost-app
position via the daemon; he hangs near where you're working, patrols where you're not).
Keep it a *gradient*, not a leash: weighted-random pick among attractors + noise, so it
stays unpredictable. Lives in the engine as a pure `pickWanderTarget(attractors, rng)`;
the loop already exists. Tune against `presence` (low = aloof, high = follows the action).
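
One possible shape for the pure `pickWanderTarget(attractors, rng)` described above: a weighted-random pick among attractors, plus positional noise. The `Attractor` shape, the `jitter` parameter, and the `null` return (meaning "fall back to the existing uniform wander") are assumptions of this sketch.

```typescript
interface Point { x: number; y: number }

// Hypothetical attractor shape: a salient point plus a non-negative pull weight.
interface Attractor { at: Point; weight: number }

type Rng = () => number; // uniform in [0, 1), injected so the function stays pure

function pickWanderTarget(attractors: Attractor[], rng: Rng, jitter = 80): Point | null {
  const total = attractors.reduce((sum, a) => sum + Math.max(0, a.weight), 0);
  if (total <= 0) return null; // nothing salient: caller keeps its uniform wander

  // Roulette-wheel selection: each attractor owns a slice proportional to its weight.
  let roll = rng() * total;
  let chosen: Attractor | null = null;
  for (const a of attractors) {
    const w = Math.max(0, a.weight);
    if (w === 0) continue;
    chosen = a; // also serves as the fallback if rounding leaves roll >= 0
    roll -= w;
    if (roll < 0) break;
  }
  if (chosen === null) return null;

  // Noise keeps it a gradient, not a leash: land near the attractor, not on it.
  return {
    x: chosen.at.x + (rng() - 0.5) * 2 * jitter,
    y: chosen.at.y + (rng() - 0.5) * 2 * jitter,
  };
}
```

Tuning against `presence` can then live entirely in the caller, for example by scaling the cursor attractor's weight by `presence` before passing the list in, which leaves this function unchanged.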
**Sensor-gated (backlog; same privacy rules as ambient quips: local-only, ephemeral):**
- **Screen color / motion / "flashing lights":** needs ScreenCaptureKit, since the transparent
overlay can't see what's beneath it. A coarse, downsampled brightness/motion map (NOT
readable content) could let him drift toward an area that just changed (a video started,
a notification flashed). Local-only, never stored, abstractions not pixels.
- **Audio reactivity:** mic is a hard no by default; "system audio is playing / its level"
is lighter but still opt-in + local-only. A bass-thump bob or a turn-toward-the-sound
would be delightful but ships last, behind the same gate as quips.
**Smell test for all of it:** attraction must stay *cute*, never *surveillant*. He reacts
to the shape of activity (something moved, something's loud), never to its content.
## Take-me-there: Puck knows where the activity is, and ferries you to it
**What:** On a real notification, Puck should point you at the *actual* window that needs
you (fly to it if it's on this Space, beckon "follow me" if it's on another), and
**clicking him navigates there** (focuses the app, macOS brings its Space forward).
**Current behavior (the gap):** wire events arrive with `target: null` (the sim's targets
were fake windows), so in the overlay `fireEvent` flies Puck to a *random* screen point.
He has no idea where the source app lives; we never gave him real-window awareness.
This is the design doc's Phase 4 ("patrol the desktops you abandoned" presumes knowing
where they are).
**Phase 1: click-to-activate (high value, no Space geometry needed):**
- Event carries a **locator**: source app bundle id / pid / window title. The Claude hook
already has `TERM_PROGRAM`, `cwd`, and the calling pid available; `puck-run` knows its
terminal. Add an optional `locator` to the wire schema (`{bundleId?, pid?, title?}`).
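- Rust command `activate_target(locator)` -> look up the app with
`NSRunningApplication.runningApplications(withBundleIdentifier:)` and call `activate(options:)`
(or AX focus by pid/title). macOS switches to that app's Space automatically, provided the
Mission Control "switch to a Space with open windows" setting is on.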
- Frontend: clicking Puck while a located surface is pending calls it. This alone delivers
"Puck lit up -> click -> you're where the thing is," across Spaces, without knowing which.
**Phase 2: same-Space vs other-Space (the fiddly bit):**
- Determine if the target window is on the *current* Space: `CGWindowListCopyWindowInfo`
with `kCGWindowListOptionOnScreenOnly` lists on-screen windows; absent target -> elsewhere.
Robust Space identity needs the semi-private CGSSpace APIs (fragile, optional).
- Same Space -> fly to the window's screen rect (window bounds from CGWindowList).
Other Space -> a "come hither" beckon toward the screen edge (pairs with the gesture
entry), and the click teleports.
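
The same-Space decision above reduces to a small pure function once the native side hands over the on-screen window list. In this sketch `WindowInfo` is a hypothetical shape for whatever a Rust command would return from `CGWindowListCopyWindowInfo`, and `planApproach` and `Approach` are illustrative names.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Hypothetical: one entry per on-screen window, as reported by the native side.
interface WindowInfo { pid: number; title: string; bounds: Rect }

// The subset of the locator that can identify a specific window.
interface WindowTarget { pid?: number; title?: string }

type Approach = { kind: "fly-to"; rect: Rect } | { kind: "beckon" };

function planApproach(target: WindowTarget, onScreen: WindowInfo[]): Approach {
  // With nothing to match on, we cannot claim the window is on this Space.
  if (target.pid === undefined && target.title === undefined) {
    return { kind: "beckon" };
  }
  const matches = onScreen.filter(
    (w) =>
      (target.pid === undefined || w.pid === target.pid) &&
      (target.title === undefined || w.title === target.title)
  );
  // Absent from the on-screen list -> treat it as living on another Space.
  if (matches.length === 0) return { kind: "beckon" };
  return { kind: "fly-to", rect: matches[0].bounds };
}
```

Either result leaves the Phase 1 click behavior untouched: the click still activates the app, and the plan only decides what Puck does before the click.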
**Notes:** locator is metadata, not content: bundle id + window title, never window
*contents* (anti-creep). Activation is an explicit user click (navigation, not an
autonomous action), so it stays inside the safety tiers. Depends on: gesture vocabulary
(beckon) and the future native window watcher (NSWorkspace frontmost / CGWindowList).
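
A sketch of the optional `locator` on the wire schema, with a parser that enforces the metadata-only rule by whitelisting fields. The `Locator` fields come from the Phase 1 description; `parseLocator` and its validation choices (positive integer pid, non-empty strings) are assumptions.

```typescript
// Wire shape from Phase 1: every field optional, metadata only.
interface Locator { bundleId?: string; pid?: number; title?: string }

// Hypothetical parser: copies only the known metadata fields and drops anything else,
// so window contents cannot ride along even if a sender includes them.
function parseLocator(raw: unknown): Locator | null {
  if (typeof raw !== "object" || raw === null) return null;
  const record = raw as Record<string, unknown>;
  const bundleId = record["bundleId"];
  const pid = record["pid"];
  const title = record["title"];

  const out: Locator = {};
  if (typeof bundleId === "string" && bundleId.length > 0) out.bundleId = bundleId;
  if (typeof pid === "number" && isFinite(pid) && pid > 0 && Math.floor(pid) === pid) {
    out.pid = pid;
  }
  if (typeof title === "string" && title.length > 0) out.title = title;

  // An empty locator is the same as no locator: the surface is not clickable.
  const hasAny = out.bundleId !== undefined || out.pid !== undefined || out.title !== undefined;
  return hasAny ? out : null;
}
```

A `null` result maps onto today's behavior (`target: null`, random screen point), so events from senders that have not been updated keep working.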
## Camouflage: Puck adapts to what he's floating over
**What:** When Puck drifts over text or busy content, he reacts to his surroundings like
a chameleon/glass-wisp: goes translucent, refracts, or (dream version) "mirrors" the
texture behind him onto his own body. Blends, then pops back when he moves to empty space.
**Why:** Sells "he's really *in* your desktop, not pasted on top." A creature that
responds to its background reads as inhabiting the space. Pairs with the wisp form
(already glass-like).
**Cheap / free now (no sensing):**
- **Shy fade:** lower sprite opacity while stationary over the busy center of the screen,
restore when wandering to the margins; pure CSS/opacity on the existing wander state.
Reads as "blending in" without literally seeing anything.
- **Refraction (maybe free, needs a WKWebView test):** a `backdrop-filter` glass body on
the sprite. In the transparent overlay this *might* sample the real desktop behind the
window (same open question as the speech-bubble blur). If it does, a refractive/distort
body gives instant chameleon shimmer with zero screen-capture. Test in the Tauri overlay
before committing to it.
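
The shy fade needs no sensing, only the sprite's position and whether he is stationary. A minimal sketch, assuming a linear falloff from the screen center and a hypothetical `minOpacity` floor of 0.45 (both are tuning choices, not values from the design):

```typescript
interface Point { x: number; y: number }
interface Size { width: number; height: number }

// Returns the sprite opacity: 1 while moving or at the margins,
// fading toward minOpacity as a stationary sprite sits nearer the center.
function shyFadeOpacity(
  pos: Point,
  viewport: Size,
  stationary: boolean,
  minOpacity = 0.45
): number {
  if (!stationary) return 1;
  // Normalized distance from center per axis: 0 at center, 1 at the edge.
  const dx = Math.abs(pos.x - viewport.width / 2) / (viewport.width / 2);
  const dy = Math.abs(pos.y - viewport.height / 2) / (viewport.height / 2);
  const edgeness = Math.min(1, Math.max(dx, dy));
  return minOpacity + (1 - minOpacity) * edgeness;
}
```

The result can be applied as a plain `opacity` style with a CSS transition, which keeps the effect on the compositor like the other sprite animations.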
**Dream version: literal mirror (screen-capture gated):**
- Sample the screen region directly under the sprite (ScreenCaptureKit), downsample, and
paint it onto his body as living camouflage. Stunning, but it's the same hard gate as
ambient quips / phototaxis sensors: local-only, ephemeral, opt-in, never stored, never
leaves the machine. Texture/color only, never treated as readable content.
**Where it slots in:** `ui/Sprite.tsx` (a `camouflage` intensity prop on `.puck-body`),
driven by `modes/sim` from sprite position vs. screen regions. The shy-fade is a
20-minute add; refraction is a test-then-maybe; the literal mirror waits for the screen
watcher + privacy pass.
## Real app icons: known apps, OS-extracted where possible
**What:** Replace the glyph characters (✳ ◍ ✉ …) in the sim windows/dock and feed
source-markers with real app icons: Claude, ChatGPT, Chrome, Gmail, Mail, Discord,
Terminal, etc. Looks dramatically more legit, especially in the overlay.
**Pulling the *actual* OS icon (the good version):**
- macOS: `NSWorkspace.shared.icon(forFile: "/Applications/Foo.app")` returns the real
icon for any installed `.app` bundle; a Rust/Tauri command can extract → PNG → hand to
the webview. So Chrome, Mail, Discord, Slack, the host terminal: real icons, free, always
current.
- **The limit:** a terminal *binary* (claude, codex) has no bundle and no icon.
Fall back to the **host terminal's** icon (iTerm/Terminal/Ghostty/Warp, which the Claude
hook can report via `TERM_PROGRAM`), or a curated Puck-styled glyph.
- **Web apps with no native app** (ChatGPT, Gmail as a tab): no bundle to extract from, so
these need a small **curated bundled icon set** (a dozen SVGs/PNGs).
**So: hybrid.** OS extraction where a bundle exists (Rust command, overlay only), curated
bundled set for web-apps + terminal-binary fallbacks (works everywhere incl. the Space/sim).
**Where it slots in:**
- `ui/Desktop.tsx` (`WIN_DEFS` icons, dock glyphs) and the feed `source` marker, currently
single glyph chars; swap for an `<Icon source=…>` that prefers OS-extracted, falls back to
bundled, falls back to glyph.
- Overlay-only Rust command `app_icon(bundleId|path) -> png` for the extraction half.
- Engine `SOURCES`/`EventDef` already carry source identity; add an optional `bundleId` hint.
**Cheap first step (no native):** ship the curated bundled icon set + source→icon map; use
glyph only as last resort. The OS-extraction half is an overlay enhancement on top.
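
The three-step fallback (OS-extracted, then bundled, then glyph) can be sketched as a pure resolver that an `<Icon source=…>` component would call. The table shapes, `resolveIcon`, and the `"?"` last-resort glyph are assumptions; in the sim the `osIcons` table would simply be empty.

```typescript
type IconChoice =
  | { kind: "os"; pngDataUrl: string }
  | { kind: "bundled"; assetPath: string }
  | { kind: "glyph"; char: string };

// Hypothetical lookup tables; osIcons would be filled by the overlay-only Rust command.
interface IconTables {
  osIcons: Record<string, string>;      // bundleId -> PNG data URL
  bundledIcons: Record<string, string>; // source id -> bundled asset path
  glyphs: Record<string, string>;       // source id -> single glyph char
}

function lookup(table: Record<string, string>, key: string | undefined): string | undefined {
  if (key === undefined) return undefined;
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

function resolveIcon(
  source: { id: string; bundleId?: string },
  tables: IconTables
): IconChoice {
  const os = lookup(tables.osIcons, source.bundleId);
  if (os !== undefined) return { kind: "os", pngDataUrl: os };

  const bundled = lookup(tables.bundledIcons, source.id);
  if (bundled !== undefined) return { kind: "bundled", assetPath: bundled };

  const glyph = lookup(tables.glyphs, source.id);
  return { kind: "glyph", char: glyph !== undefined ? glyph : "?" };
}
```

Shipping the cheap first step means populating only `bundledIcons` and `glyphs`; the OS half later fills `osIcons` without touching the resolver or its callers.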
## Local vision: the private, free path for continuous perception
**What:** Run Puck's eyes on-device instead of (or alongside) Modal, so real-screen
perception is private and continuous vision costs ~nothing. Brain seam already supports
it: point PUCK_VISION_URL at a local OpenAI-compatible server.
**Capability is NOT the blocker** (verified 2026-06-07): screen-reading is OCR + light
"what's notable" reasoning, and small VLMs excel at it. Options:
- **Holotron-12B local** via llama.cpp + mmproj: llama.cpp merged Nemotron-Nano-12B-v2-VL
(PR #19547). `convert_hf_to_gguf.py --mmproj` → vision projector → `llama-server`. Same
model as cloud, same capability, ~24GB on the 48GB Mac, free. NB: Ollama can't load the
mmproj; must use raw llama-server. Keeps the Nemotron Quest tie.
- **Qwen2.5-VL-7B local**: ~6GB, 95.7 DocVQA, fast; ideal for *continuous ambient* (every
45s forever on the M4 Max). Loses the Nemotron tie, plenty for screen-reading.
- MiniCPM-V 2.6 (~5.5GB), Moondream2 (1.9B, CPU) as even-lighter fallbacks.
**Recommendation:** Holotron for the showcase + Quest (cloud now → local llama-server
later as the private path); Qwen2.5-VL-7B-local as the cheap continuous engine. The
visionMode "Continuous" tier (built, currently same cloud path) should switch to a local
PUCK_VISION_URL when this lands.
**Where it slots in:** zero engine/frontend change. It's a runtime change: spin up
`llama-server --mmproj` (or vLLM/mlx when fixed) and set PUCK_VISION_URL. Plus the
real-screen capture (ScreenCaptureKit, overlay) to feed it actual pixels instead of the
sim snapshot.
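
For orientation, this is roughly what a request to a local OpenAI-compatible vision server looks like. The sketch assumes `PUCK_VISION_URL` holds a base URL without a path (for example `http://127.0.0.1:8080`) and that the server exposes the standard `/v1/chat/completions` route; `buildVisionRequest`, the prompt text, the `max_tokens` value, and the model name passed in are all placeholders, not Puck's actual brain seam.

```typescript
interface VisionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a chat-completions request carrying one PNG frame.
// Kept pure so it can be tested without a running server.
function buildVisionRequest(baseUrl: string, pngBase64: string, model: string): VisionRequest {
  const body = {
    model,
    max_tokens: 64, // placeholder budget: a short "what's notable" answer
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "In one sentence, what on this screen is notable?" },
          // OpenAI-style image part: the frame travels inline as a data URL.
          { type: "image_url", image_url: { url: "data:image/png;base64," + pngBase64 } },
        ],
      },
    ],
  };
  return {
    url: baseUrl.replace(/\/+$/, "") + "/v1/chat/completions",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

Sending it is then `fetch(req.url, req.init)`. Because the frame is inlined in the request body, pointing the base URL at localhost keeps the pixels on the machine, which is the whole point of this path.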