# Puck backlog
Ideas not yet built. Each entry: what, why, and where it slots into the architecture
(engine = pure policy/data, ui = presentation, modes/sim = orchestration).
## Gestures: sprite dances/reactions keyed to event type
**What:** Puck performs a short gesture matched to *what* he's reporting, not just
flying + speaking. A build passing -> a little hop/spin of pride. Tests failing or a
permission block -> an urgent shake/alarm wiggle. Discord noise he ignored -> a dismissive
shrug. Claude finished -> a satisfied stretch. Stale tab -> a curious head-tilt. Sleep ->
a yawn. The gesture plays during the flight-to + bubble beat.
**Why:** On a busy real desktop the sprite is glanceable; a gesture lets you read the
*kind* of news pre-attentively, before the words. It's the single highest charm-per-line
addition left, and it makes the overlay feel like a creature rather than a notifier.
**Where it slots in:**
- `engine/`: a pure `gestureFor(event | decision)` map, `EventDef.id`/`source` ->
gesture name. Data, not logic; pin a couple of mappings with a test. Could also key
off the existing tier/mischief so high-mischief gestures are more theatrical.
- `ui/Sprite.tsx`: a `gesture?: GestureName` prop adding a CSS class (`gesture-hop`,
`gesture-shake`, `gesture-shrug`, …); animations on `.puck-bob`/`.puck-face` only
(compositor-friendly transforms, like the existing flap/bob). Clear the class on
`animationend` so it re-triggers.
- `modes/sim/SimApp.tsx`: set the gesture alongside the existing `flyTo`/`setBubble`
in `fireEvent`; clears when the surface resolves.
**Notes:** keep gestures to transform/opacity so they stay GPU-composited (the whole
sprite layer is). Reuse the per-form structure in `SpriteBody`: gestures should read
on all four forms (mossling/wisp/gremlin/moth), so animate the wrapper, not form-specific
parts. Pairs naturally with the existing mood tint + alert ring.
**Color pulse (same feature, cheapest channel):** transient-tint the alert ring + glow
to a per-event-type hue while a surface is pending: red/urgent for failures &
permission blocks, gold/success for completions, grey/dim for ignored noise. The ring
and `.puck-glow` already read CSS vars (`--accent`, `--glow`), so this is a single
`--alert-hue` override set from the same `engine` map that picks the gesture; one
`{ gesture, hue }` lookup feeds both. Even shipping the color pulse *alone* (before
gesture animation work) is a real readability win. Don't fight the existing mood tint
(`--puck-body` etc.): pulse the ring/glow, leave the body color to mood, so "what kind
of news" (hue) and "how Puck feels lately" (mood) stay separate signals.
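A minimal sketch of the shared `{ gesture, hue }` lookup described above. The event ids, gesture names, and color values are illustrative assumptions, not the real `EventDef` ids; `hue` is modelled as a CSS color string because "grey/dim" is not a point on the hue wheel.

```typescript
// Sketch only: event ids, gesture names and colors are hypothetical.
type GestureName = "hop" | "shake" | "shrug" | "stretch" | "tilt" | "yawn";

interface Reaction {
  gesture: GestureName;
  hue: string; // CSS color assigned to the --alert-hue variable
}

// Data, not logic: one table feeds both the gesture class and the ring/glow tint.
const REACTIONS: { [eventId: string]: Reaction } = {
  "build-passed": { gesture: "hop", hue: "hsl(45 90% 55%)" },       // gold / success
  "tests-failed": { gesture: "shake", hue: "hsl(0 85% 55%)" },      // red / urgent
  "permission-blocked": { gesture: "shake", hue: "hsl(0 85% 55%)" },
  "claude-finished": { gesture: "stretch", hue: "hsl(45 90% 55%)" },
  "discord-ignored": { gesture: "shrug", hue: "hsl(0 0% 60%)" },    // grey / dim
  "stale-tab": { gesture: "tilt", hue: "hsl(0 0% 60%)" },
};

// Pure lookup; unknown events get no gesture rather than a guessed one.
function gestureFor(eventId: string): Reaction | null {
  if (!Object.prototype.hasOwnProperty.call(REACTIONS, eventId)) return null;
  return REACTIONS[eventId] ?? null;
}

// What the sprite could consume: a class name plus a CSS custom property.
function spriteProps(r: Reaction): { className: string; style: { [cssVar: string]: string } } {
  return { className: "gesture-" + r.gesture, style: { "--alert-hue": r.hue } };
}
```

Because the body color is never touched here, the mood tint stays an independent signal; shipping the color pulse alone would mean using only the `style` half of `spriteProps`.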
## Ambient quips: Puck reacts to what you're doing
**What:** Occasionally Puck mutters a quip or non-sequitur riffing on your current
context: the active app, a window title, the shape of what you're typing. Not help,
not a summary: flavor. "Still in the auth thicket, I see." / "That's a lot of tabs for
one small human." / a non-sequitur about the dock breathing. Low frequency, skippable,
never twice about the same thing.
**Why:** This is the difference between a notifier that lives in the corner and a
familiar that *shares the room*. It's the most "alive" feature on the list, and the
most dangerous, so it ships last and most carefully.
**Privacy is the feature, not a caveat** (design doc §17.3-17.4 are the law here):
- **Local inference ONLY, a hard gate.** Context never leaves the machine. If the brain
is the cloud path (Modal/ZeroGPU), this feature is *disabled*, full stop, not degraded.
Enforce in code: the quip path checks the resolved brain is localhost and refuses
otherwise (a loud guard, not a setting).
- **Opt-in, off by default.** A distinct toggle from notifications; the permission copy
says plainly what's read and that it stays local.
- **Ephemeral, never persisted, never a trace, never training data.** Context for a quip
is read, used for one generation, dropped. It must not touch the memory garden, the
trace export, or localStorage.
- **Redact before the model sees it.** Strip obvious secrets (password fields, token-shaped
strings, anything in a field marked sensitive). Prefer *abstractions* over content:
"a long terminal command", "a code file", "a messaging app", rather than the literal text.
Quoting back verbatim is the creepy line; stay on the abstract side of it.
- **Never about people or private content.** App categories and your own activity, yes;
the contents of a DM, an email body, a name on screen: no.
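The hard gate and the "abstractions over content" rule could look roughly like this. It is a sketch under assumptions: `assertLocalBrain`, `abstractContext`, the `RawContext` shape, and the 80-character threshold are all hypothetical names and values, and the loopback list is deliberately strict.

```typescript
// Loopback hosts only; anything else (Modal, ZeroGPU, a LAN box) is refused.
const LOCAL_HOSTS = ["localhost", "127.0.0.1", "[::1]"];

function hostnameOf(url: string): string | null {
  try {
    return new URL(url).hostname;
  } catch (e) {
    return null; // unparseable URL is treated as non-local
  }
}

// Loud guard: throws instead of silently degrading when the brain is not local.
function assertLocalBrain(brainUrl: string): void {
  const host = hostnameOf(brainUrl);
  if (host === null || LOCAL_HOSTS.indexOf(host) === -1) {
    throw new Error("ambient quips refused: brain is not local (" + brainUrl + ")");
  }
}

type AppCategory = "terminal" | "editor" | "messaging" | "browser";

interface RawContext {
  category: AppCategory;
  text: string;       // never forwarded; only its length is inspected
  sensitive: boolean; // e.g. a password field or a field marked sensitive
}

const LABELS: { [K in AppCategory]: string } = {
  terminal: "a terminal command",
  editor: "a code file",
  messaging: "a messaging app",
  browser: "a browser tab",
};

// Returns an abstract description for the prompt, or null to skip the quip.
// The literal text never appears in the result.
function abstractContext(ctx: RawContext): string | null {
  if (ctx.sensitive) return null;
  if (ctx.category === "terminal" && ctx.text.length > 80) {
    return "a long terminal command";
  }
  return LABELS[ctx.category];
}
```

The quip path would call `assertLocalBrain` before reading any context at all, so a cloud brain fails before anything sensitive is touched.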
**Where it slots in:**
- The desktop watcher (future, §9.2: NSWorkspace frontmost app, optional AX/screen) is the
context source: same daemon `/events` path, a new low-priority `context` event kind, or
a separate local-only endpoint that never queues.
- `engine` decides *whether* to quip (rare; respects annoyance budget / presence, reuses
the interruption-taste machinery so an annoying quip trains Puck quieter).
- The quip generation uses the local brain with a tight "one playful aside, ≤12 words,
never quote the user" prompt; surfaces through the existing bubble channel.
**Ship order:** after the desktop watcher exists and after a privacy pass. Until then it's
sim-only flavor at most (riffing on the fake desktop, where there's nothing real to leak).
## Phototaxis: a fairy drawn to stimuli (whimsical wander)
**What:** Puck should behave like a moth/fairy, *pulled* toward activity rather than
drifting at random. Flits toward motion, lingers near what's lively, chases the
occasional shiny thing, then loses interest.
**Why:** The wander is the sprite's resting personality; it's on screen far more than
any bubble. Uniform-random reads as a screensaver; attraction reads as *alive and
curious*. Highest charm-per-effort of the ambient ideas.
**Buildable now, zero new permissions (do this part first):**
Replace the uniform-random wander target in `SimApp` with a weighted pull toward salient
points we already have:
- the **cursor** (occasional gentle follow / curious approach, then retreat; never
clingy; respects `presence`),
- the **last event location** (he lingers where something just happened),
- in the overlay, the **focused window** rect (future: NSWorkspace frontmost-app
position via the daemon, so he hangs near where you're working, patrols where you're not).
Keep it a *gradient*, not a leash: weighted-random pick among attractors + noise, so it
stays unpredictable. Lives in the engine as a pure `pickWanderTarget(attractors, rng)`;
the loop already exists. Tune against `presence` (low = aloof, high = follows the action).
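One possible shape for `pickWanderTarget(attractors, rng)`: a weighted-random pick among attractors, then uniform jitter so the target is a pull rather than a leash. The `Attractor` type, the third `jitterPx` parameter, and the `null` return for "no attractors, fall back to uniform-random" are assumptions added for the sketch.

```typescript
interface Point { x: number; y: number }

interface Attractor {
  at: Point;
  weight: number; // caller scales this, e.g. cursor weight multiplied by presence
}

type Rng = () => number; // returns a value in [0, 1)

function pickWanderTarget(
  attractors: Attractor[],
  rng: Rng,
  jitterPx: number = 60
): Point | null {
  // Negative weights are treated as zero.
  let total = 0;
  for (const a of attractors) total += Math.max(0, a.weight);
  if (total <= 0) return null; // nothing salient: caller keeps the old random wander

  // Roulette-wheel selection: walk the weights until the draw is used up.
  let r = rng() * total;
  let chosen: Attractor | null = null;
  for (const a of attractors) {
    const w = Math.max(0, a.weight);
    if (w === 0) continue;
    chosen = a; // last positive attractor doubles as the rounding fallback
    r -= w;
    if (r < 0) break;
  }
  if (chosen === null) return null;

  // Noise around the attractor keeps the motion unpredictable.
  return {
    x: chosen.at.x + (rng() - 0.5) * 2 * jitterPx,
    y: chosen.at.y + (rng() - 0.5) * 2 * jitterPx,
  };
}
```

Keeping the RNG injectable is what makes the function pure and testable; tuning against `presence` then reduces to how the caller sets each weight.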
**Sensor-gated (backlog), same privacy rules as ambient quips (local-only, ephemeral):**
- **Screen color / motion / "flashing lights":** needs ScreenCaptureKit, because the transparent
overlay can't see what's beneath it. A coarse, downsampled brightness/motion map (NOT
readable content) could let him drift toward an area that just changed (a video started,
a notification flashed). Local-only, never stored, abstractions not pixels.
- **Audio reactivity:** mic is a hard no by default; "system audio is playing / its level"
is lighter but still opt-in + local-only. A bass-thump bob or a turn-toward-the-sound
would be delightful but ships last, behind the same gate as quips.
**Smell test for all of it:** attraction must stay *cute*, never *surveillant*. He reacts
to the shape of activity (something moved, something's loud), never to its content.
## Take-me-there: Puck knows where the activity is, and ferries you to it
**What:** On a real notification, Puck should point you at the *actual* window that needs
you (fly to it if it's on this Space, beckon "follow me" if it's on another), and
**clicking him navigates there** (focuses the app, macOS brings its Space forward).
**Current behavior (the gap):** wire events arrive with `target: null` (the sim's targets
were fake windows), so in the overlay `fireEvent` flies Puck to a *random* screen point.
He has no idea where the source app lives; we never gave him real-window awareness.
This is the design doc's Phase 4 ("patrol the desktops you abandoned" presumes knowing
where they are).
**Phase 1: click-to-activate (high value, no Space geometry needed):**
- Event carries a **locator**: source app bundle id / pid / window title. The Claude hook
already has `TERM_PROGRAM`, `cwd`, and the calling pid available; `puck-run` knows its
terminal. Add an optional `locator` to the wire schema (`{bundleId?, pid?, title?}`).
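- Rust command `activate_target(locator)` -> `NSRunningApplication.runningApplications(withBundleIdentifier:)`, then `activate(options:)` on the match
(or AX focus by pid/title). macOS then switches to that app's Space automatically (with the default Mission Control setting).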
- Frontend: clicking Puck while a located surface is pending calls it. This alone delivers
"Puck lit up -> click -> you're where the thing is," across Spaces, without knowing which.
**Phase 2: same-Space vs other-Space (the fiddly bit):**
- Determine if the target window is on the *current* Space: `CGWindowListCopyWindowInfo`
with `kCGWindowListOptionOnScreenOnly` lists on-screen windows; absent target -> elsewhere.
Robust Space identity needs the semi-private CGSSpace APIs, which are fragile and optional.
- Same Space -> fly to the window's screen rect (window bounds from CGWindowList).
Other Space -> a "come hither" beckon toward the screen edge (pairs with the gesture
entry), and the click teleports.
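The Phase 2 decision can be kept as a pure function on the frontend, fed by whatever window list the native side reports. This sketch assumes a hypothetical `OnScreenWindow` shape (pid, title, bounds) derived from the on-screen window list, and picks the beckon edge nearest to Puck, which is a guess at the intended behavior rather than something the backlog specifies.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Assumed projection of one on-screen window entry.
interface OnScreenWindow { pid: number; title: string; bounds: Rect }

interface Target { pid: number; title?: string }

type Approach =
  | { kind: "fly"; x: number; y: number }           // same Space: go to the window
  | { kind: "beckon"; edge: "left" | "right" };     // elsewhere: "come hither"

function planApproach(
  target: Target,
  onScreen: OnScreenWindow[],
  puckX: number,
  screenWidth: number
): Approach {
  for (const w of onScreen) {
    // Title is only used to disambiguate when the locator carries one.
    const titleOk = target.title === undefined || w.title === target.title;
    if (w.pid === target.pid && titleOk) {
      return {
        kind: "fly",
        x: w.bounds.x + w.bounds.width / 2,
        y: w.bounds.y + w.bounds.height / 2,
      };
    }
  }
  // Absent from the on-screen list -> treat as "on another Space".
  return { kind: "beckon", edge: puckX < screenWidth / 2 ? "left" : "right" };
}
```

A minimized or hidden window would also be absent from an on-screen-only list, so "beckon" here really means "not visible right now"; the click-to-activate path covers both cases.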
**Notes:** locator is metadata, not content: bundle id + window title, never window
*contents* (anti-creep). Activation is an explicit user click (navigation, not an
autonomous action), so it stays inside the safety tiers. Depends on: gesture vocabulary
(beckon) and the future native window watcher (NSWorkspace frontmost / CGWindowList).
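For the Phase 1 wire schema, a sketch of validating the optional `{bundleId?, pid?, title?}` locator and routing an explicit click to the activation command. `parseLocator` and `onPuckClick` are hypothetical names; the `activate` callback stands in for however the frontend invokes the Rust `activate_target` command, which is injected here so the logic stays testable.

```typescript
// Metadata only: no field here may ever carry window contents.
interface Locator { bundleId?: string; pid?: number; title?: string }

// Accepts untrusted wire data; returns null when no usable field is present.
function parseLocator(raw: unknown): Locator | null {
  if (typeof raw !== "object" || raw === null) return null;
  const fields = raw as { [key: string]: unknown };
  const bundleId = fields["bundleId"];
  const pid = fields["pid"];
  const title = fields["title"];

  const out: Locator = {};
  if (typeof bundleId === "string" && bundleId !== "") out.bundleId = bundleId;
  if (typeof pid === "number" && isFinite(pid) && pid > 0) out.pid = pid;
  if (typeof title === "string" && title !== "") out.title = title;

  const usable =
    out.bundleId !== undefined || out.pid !== undefined || out.title !== undefined;
  return usable ? out : null;
}

// Stand-in for the call into the native activate_target command.
type Activate = (locator: Locator) => void;

// Runs only from a click handler: navigation, never an autonomous action.
// Returns true when the click was consumed as "take me there".
function onPuckClick(pending: Locator | null, activate: Activate): boolean {
  if (pending === null) return false;
  activate(pending);
  return true;
}
```

Events that arrive without a locator simply keep today's behavior, since `parseLocator` returns `null` and the click falls through.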
## Camouflage: Puck adapts to what he's floating over
**What:** When Puck drifts over text or busy content, he reacts to his surroundings like
a chameleon/glass-wisp: goes translucent, refracts, or (dream version) "mirrors" the
texture behind him onto his own body. Blends, then pops back when he moves to empty space.
**Why:** Sells "he's really *in* your desktop, not pasted on top." A creature that
responds to its background reads as inhabiting the space. Pairs with the wisp form
(already glass-like).
**Cheap / free now (no sensing):**
- **Shy fade:** lower sprite opacity while stationary over the busy center of the screen,
restore when wandering to the margins; pure CSS/opacity on the existing wander state.
Reads as "blending in" without literally seeing anything.
- **Refraction (maybe free, needs a WKWebView test):** a `backdrop-filter` glass body on
the sprite. In the transparent overlay this *might* sample the real desktop behind the
window (same open question as the speech-bubble blur); if it does, a refractive/distort
body gives instant chameleon shimmer with zero screen-capture. Test in the Tauri overlay
before committing to it.
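The shy fade needs no sensing, only the sprite position and the screen size. A sketch, assuming a hypothetical `shyFadeOpacity` helper and a floor of 0.45 (a placeholder value to tune by eye):

```typescript
interface Point { x: number; y: number }
interface Size { width: number; height: number }

// Opacity for the sprite wrapper: dimmest when parked at the busy center,
// fully opaque at the margins or whenever he is moving.
function shyFadeOpacity(
  pos: Point,
  screen: Size,
  stationary: boolean,
  minOpacity: number = 0.45
): number {
  if (!stationary) return 1;
  // Normalized distance from center along each axis: 0 at center, 1 at the edge.
  const dx = Math.abs(pos.x - screen.width / 2) / (screen.width / 2);
  const dy = Math.abs(pos.y - screen.height / 2) / (screen.height / 2);
  const edge = Math.min(1, Math.max(dx, dy));
  // Linear blend from minOpacity (center) to 1 (margin).
  return minOpacity * (1 - edge) + edge;
}
```

The result can be applied as an inline `opacity` on the wrapper with a CSS transition, which keeps the effect compositor-only like the other sprite animations.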
**Dream version, literal mirror (screen-capture gated):**
- Sample the screen region directly under the sprite (ScreenCaptureKit), downsample, and
paint it onto his body as living camouflage. Stunning, but it's the same hard gate as
ambient quips / phototaxis sensors: local-only, ephemeral, opt-in, never stored, never
leaves the machine. Texture/color only, never treated as readable content.
**Where it slots in:** `ui/Sprite.tsx` (a `camouflage` intensity prop on `.puck-body`),
driven by `modes/sim` from sprite position vs. screen regions. The shy-fade is a
20-minute add; refraction is a test-then-maybe; the literal mirror waits for the screen
watcher + privacy pass.
## Real app icons: known apps, OS-extracted where possible
**What:** Replace the glyph characters in the sim windows/dock and feed
source-markers with real app icons: Claude, ChatGPT, Chrome, Gmail, Mail, Discord,
Terminal, etc. Looks dramatically more legit, especially in the overlay.
**Pulling the *actual* OS icon (the good version):**
- macOS: `NSWorkspace.shared.icon(forFile: "/Applications/Foo.app")` returns the real
icon for any installed `.app` bundle; a Rust/Tauri command can extract -> PNG -> hand to
the webview. So Chrome, Mail, Discord, Slack, the host terminal: real icons, free, always
current.
- **The limit you called:** a terminal *binary* (claude, codex) has no bundle and no icon.
Fall back to the **host terminal's** icon (iTerm/Terminal/Ghostty/Warp, which the Claude
hook can report via `TERM_PROGRAM`), or a curated Puck-styled glyph.
- **Web apps with no native app** (ChatGPT, Gmail as a tab): no bundle to extract from;
these need a small **curated bundled icon set** (a dozen SVGs/PNGs).
**So: hybrid.** OS extraction where a bundle exists (Rust command, overlay only), curated
bundled set for web-apps + terminal-binary fallbacks (works everywhere incl. the Space/sim).
**Where it slots in:**
- `ui/Desktop.tsx` (`WIN_DEFS` icons, dock glyphs) and the feed `source` marker: currently
single glyph chars; swap for an `<Icon source=…>` that prefers OS-extracted, falls back to
bundled, falls back to glyph.
- Overlay-only Rust command `app_icon(bundleId|path) -> png` for the extraction half.
- Engine `SOURCES`/`EventDef` already carry source identity; add an optional `bundleId` hint.
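The three-step fallback behind `<Icon>` could be resolved by a small pure function. In this sketch `resolveIcon`, `SourceInfo`, and the two lookup maps (a cache of OS-extracted icons keyed by bundle id, and the curated bundled set keyed by source id) are hypothetical; only the preference order comes from the list above.

```typescript
type IconRef =
  | { kind: "os"; pngDataUrl: string }   // extracted by the overlay-only native command
  | { kind: "bundled"; path: string }    // curated set shipped with the app
  | { kind: "glyph"; char: string };     // last resort

interface SourceInfo { id: string; glyph: string; bundleId?: string }

type StringMap = { [key: string]: string };

function lookup(map: StringMap, key: string): string | null {
  if (!Object.prototype.hasOwnProperty.call(map, key)) return null;
  return map[key] ?? null;
}

// Prefers OS-extracted, falls back to bundled, falls back to glyph.
function resolveIcon(source: SourceInfo, osIcons: StringMap, bundled: StringMap): IconRef {
  if (source.bundleId !== undefined) {
    const png = lookup(osIcons, source.bundleId);
    if (png !== null) return { kind: "os", pngDataUrl: png };
  }
  const path = lookup(bundled, source.id);
  if (path !== null) return { kind: "bundled", path: path };
  return { kind: "glyph", char: source.glyph };
}
```

In the sim or the hosted Space the OS cache is simply empty, so the same component degrades to the bundled set without a separate code path.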
**Cheap first step (no native):** ship the curated bundled icon set + source->icon map; use
glyph only as last resort. The OS-extraction half is an overlay enhancement on top.
## Local vision: the private, free path for continuous perception
**What:** Run Puck's eyes on-device instead of (or alongside) Modal, so real-screen
perception is private and continuous vision costs ~nothing. Brain seam already supports
it: point PUCK_VISION_URL at a local OpenAI-compatible server.
**Capability is NOT the blocker** (verified 2026-06-07): screen-reading is OCR + light
"what's notable" reasoning, and small VLMs excel at it. Options:
- **Holotron-12B local** via llama.cpp + mmproj: llama.cpp merged Nemotron-Nano-12B-v2-VL
(PR #19547). `convert_hf_to_gguf.py --mmproj` -> vision projector -> `llama-server`. Same
model as cloud, same capability, ~24GB on the 48GB Mac, free. NB: Ollama can't load the
mmproj, so it must use raw llama-server. Keeps the Nemotron Quest tie.
- **Qwen2.5-VL-7B local**: ~6GB, 95.7 DocVQA, fast; ideal for *continuous ambient* (every
45s forever on the M4 Max). Loses the Nemotron tie, plenty for screen-reading.
- MiniCPM-V 2.6 (~5.5GB), Moondream2 (1.9B, CPU) as even-lighter fallbacks.
**Recommendation:** Holotron for the showcase + Quest (cloud now -> local llama-server
later as the private path); Qwen2.5-VL-7B-local as the cheap continuous engine. The
visionMode "Continuous" tier (built, currently same cloud path) should switch to a local
PUCK_VISION_URL when this lands.
**Where it slots in:** zero engine/frontend change; it's a runtime: spin up
`llama-server --mmproj` (or vLLM/mlx when fixed) and set PUCK_VISION_URL. Plus the
real-screen capture (ScreenCaptureKit, overlay) to feed it actual pixels instead of the
sim snapshot.
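Since the seam is "an OpenAI-compatible server", the request a local server would receive can be sketched as a plain chat-completions body with an inline image. Assumptions to flag: `buildVisionRequest` and `chatCompletionsUrl` are hypothetical helpers, the prompt text and `max_tokens` value are placeholders, and PUCK_VISION_URL is treated as a base URL ending in `/v1` (it may in fact hold the full endpoint).

```typescript
interface TextPart { type: "text"; text: string }
interface ImagePart { type: "image_url"; image_url: { url: string } }

interface ChatRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: (TextPart | ImagePart)[] }[];
}

// Builds an OpenAI-style chat body carrying one screenshot as a data URL.
function buildVisionRequest(pngBase64: string, model: string): ChatRequest {
  return {
    model: model,
    max_tokens: 64, // short answers keep continuous polling cheap
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "In one sentence, what on this screen is notable?" },
          { type: "image_url", image_url: { url: "data:image/png;base64," + pngBase64 } },
        ],
      },
    ],
  };
}

// Joins the assumed base URL with the chat-completions path.
function chatCompletionsUrl(visionUrl: string): string {
  return visionUrl.replace(/\/+$/, "") + "/chat/completions";
}
```

Posting `JSON.stringify(buildVisionRequest(...))` to `chatCompletionsUrl(...)` with a JSON content type is all the client side should need; switching between cloud and local is then only a change of URL, which matches the "zero engine/frontend change" claim.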