# Puck backlog

Ideas not yet built. Each entry: what, why, and where it slots into the architecture
(engine = pure policy/data, ui = presentation, modes/sim = orchestration).

## Gestures: sprite dances/reactions keyed to event type

**What:** Puck performs a short gesture matched to *what* he's reporting, not just
flying + speaking. A build passing → a little hop/spin of pride. Tests failing or a
permission block → an urgent shake/alarm wiggle. Discord noise he ignored → a dismissive
shrug. Claude finished → a satisfied stretch. Stale tab → a curious head-tilt. Sleep →
a yawn. The gesture plays during the flight-to + bubble beat.

**Why:** On a busy real desktop the sprite is glanceable; a gesture lets you read the
*kind* of news pre-attentively, before the words. It's the single highest charm-per-line
addition left, and it makes the overlay feel like a creature rather than a notifier.

**Where it slots in:**
- `engine/`: a pure `gestureFor(event | decision)` map, `EventDef.id`/`source` →
  gesture name. Data, not logic; pin a couple of mappings with a test. Could also key
  off the existing tier/mischief so high-mischief gestures are more theatrical.
- `ui/Sprite.tsx`: a `gesture?: GestureName` prop adding a CSS class (`gesture-hop`,
  `gesture-shake`, `gesture-shrug`, …); animations on `.puck-bob`/`.puck-face` only
  (compositor-friendly transforms, like the existing flap/bob). Clear the class on
  `animationend` so it re-triggers.
- `modes/sim/SimApp.tsx`: set the gesture alongside the existing `flyTo`/`setBubble`
  in `fireEvent`; clears when the surface resolves.

**Notes:** keep gestures to transform/opacity so they stay GPU-composited (the whole
sprite layer is). Reuse the per-form structure in `SpriteBody`: gestures should read
on all four forms (mossling/wisp/gremlin/moth), so animate the wrapper, not form-specific
parts. Pairs naturally with the existing mood tint + alert ring.

**Color pulse (same feature, cheapest channel):** transient-tint the alert ring + glow
to a per-event-type hue while a surface is pending: red/urgent for failures &
permission blocks, gold/success for completions, grey/dim for ignored noise. The ring
and `.puck-glow` already read CSS vars (`--accent`, `--glow`), so this is a single
`--alert-hue` override set from the same `engine` map that picks the gesture; one
`{ gesture, hue }` lookup feeds both. Even shipping the color pulse *alone* (before
gesture animation work) is a real readability win. Don't fight the existing mood tint
(`--puck-body` etc.): pulse the ring/glow, leave the body color to mood, so "what kind
of news" (hue) and "how Puck feels lately" (mood) stay separate signals.
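
A minimal sketch of the single `{ gesture, hue }` lookup, assuming event ids are plain strings. The ids, the gesture names beyond hop/shake/shrug, and the `null` hue for events the text gives no color for are all hypothetical; only the gesture-to-event pairings and the three hues come from the notes above.

```typescript
// Hypothetical names: the real EventDef ids and GestureName union may differ.
type GestureName = "hop" | "shake" | "shrug" | "stretch" | "tilt" | "yawn";
// null means "no --alert-hue override"; the ring keeps its default accent.
type AlertHue = "urgent" | "success" | "dim" | null;

interface Reaction {
  gesture: GestureName;
  hue: AlertHue;
}

// Data, not logic: one table feeds both the CSS gesture class and the hue override.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const REACTIONS = new Map<string, Reaction>([
  ["build.passed", { gesture: "hop", hue: "success" }],
  ["tests.failed", { gesture: "shake", hue: "urgent" }],
  ["permission.blocked", { gesture: "shake", hue: "urgent" }],
  ["discord.ignored", { gesture: "shrug", hue: "dim" }],
  ["claude.finished", { gesture: "stretch", hue: "success" }],
  ["tab.stale", { gesture: "tilt", hue: null }],
  ["sleep", { gesture: "yawn", hue: null }],
]);

// Assumed fallback for unmapped events: a neutral head-tilt with no hue.
const FALLBACK: Reaction = { gesture: "tilt", hue: null };

function gestureFor(eventId: string): Reaction {
  return REACTIONS.get(eventId) ?? FALLBACK;
}

// Class name the Sprite would add, matching the `gesture-hop` style above.
function gestureClass(gesture: GestureName): string {
  return `gesture-${gesture}`;
}
```

Keeping the table in `engine` means the "pin a couple of mappings with a test" step is a plain equality check, with no DOM involved.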

## Ambient quips: Puck reacts to what you're doing

**What:** Occasionally Puck mutters a quip or non-sequitur riffing on your current
context: the active app, a window title, the shape of what you're typing. Not help,
not a summary: flavor. "Still in the auth thicket, I see." / "That's a lot of tabs for
one small human." / a non-sequitur about the dock breathing. Low frequency, skippable,
never twice about the same thing.

**Why:** This is the difference between a notifier that lives in the corner and a
familiar that *shares the room*. It's the most "alive" feature on the list, and the
most dangerous, so it ships last and most carefully.

**Privacy is the feature, not a caveat** (design doc §17.3–17.4 are the law here):
- **Local inference ONLY, hard gate.** Context never leaves the machine. If the brain
  is the cloud path (Modal/ZeroGPU), this feature is *disabled*, full stop, not degraded.
  Enforce in code: the quip path checks the resolved brain is localhost and refuses
  otherwise. A loud guard, not a setting.
- **Opt-in, off by default.** A distinct toggle from notifications; the permission copy
  says plainly what's read and that it stays local.
- **Ephemeral, never persisted, never a trace, never training data.** Context for a quip
  is read, used for one generation, dropped. It must not touch the memory garden, the
  trace export, or localStorage.
- **Redact before the model sees it.** Strip obvious secrets (password fields, token-shaped
  strings, anything in a field marked sensitive). Prefer *abstractions* ("a long terminal
  command", "a code file", "a messaging app") over the literal text.
  Quoting back verbatim is the creepy line; stay on the abstract side of it.
- **Never about people or private content.** App categories and your own activity, yes;
  the contents of a DM, an email body, a name on screen: no.
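
A sketch of the two mechanical rules above: the localhost gate and abstraction-before-model. Every name here (`isLocalBrain`, `assertLocalBrain`, `abstractContext`, the `RawContext` shape, the 80-character threshold) is hypothetical. The host check is hand-rolled to keep the example dependency-free; a real guard might lean on the platform `URL` parser instead.

```typescript
// Hosts treated as "this machine". Anything else is refused.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function isLocalBrain(brainUrl: string): boolean {
  // Authority = everything between "scheme://" and the first "/", "?" or "#".
  const authority = /^https?:\/\/([^\/?#]*)/i.exec(brainUrl.trim())?.[1];
  // Fail closed on unparseable URLs and on any userinfo ("user:pw@host") trick.
  if (authority === undefined || authority.includes("@")) return false;
  const host = authority.replace(/:\d*$/, "").toLowerCase(); // drop ":port"
  return LOCAL_HOSTS.has(host);
}

// The loud guard: call this at the top of the quip path, before reading any context.
function assertLocalBrain(brainUrl: string): void {
  if (!isLocalBrain(brainUrl)) {
    throw new Error("ambient quips refused: resolved brain is not on localhost");
  }
}

type AppCategory = "terminal" | "editor" | "messaging" | "browser" | "other";

interface RawContext {
  appCategory: AppCategory;
  typedText: string; // never forwarded; only its shape is used
  sensitiveField: boolean; // e.g. a password input
}

const CATEGORY_LABEL: Record<AppCategory, string> = {
  terminal: "a terminal",
  editor: "a code file",
  messaging: "a messaging app",
  browser: "a browser",
  other: "some app",
};

// Turn raw context into abstract hints. No literal user text reaches the prompt.
function abstractContext(raw: RawContext): string[] {
  const hints = [CATEGORY_LABEL[raw.appCategory]];
  if (!raw.sensitiveField && raw.typedText.length > 80) {
    hints.push(raw.appCategory === "terminal" ? "a long terminal command" : "a long stretch of typing");
  }
  return hints;
}
```

Because `abstractContext` only ever emits fixed strings, "never quote the user" holds by construction rather than by prompt discipline.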

**Where it slots in:**
- The desktop watcher (future, §9.2: NSWorkspace frontmost app, optional AX/screen) is the
  context source: same daemon `/events` path, a new low-priority `context` event kind, or
  a separate local-only endpoint that never queues.
- `engine` decides *whether* to quip (rare; respects annoyance budget / presence, reuses
  the interruption-taste machinery so an annoying quip trains Puck quieter).
- The quip generation uses the local brain with a tight "one playful aside, ≤12 words,
  never quote the user" prompt; surfaces through the existing bubble channel.

**Ship order:** after the desktop watcher exists and after a privacy pass. Until then it's
sim-only flavor at most (riffing on the fake desktop, where there's nothing real to leak).

## Phototaxis: a fairy drawn to stimuli (whimsical wander)

**What:** Puck should behave like a moth/fairy, *pulled* toward activity rather than
drifting at random. Flits toward motion, lingers near what's lively, chases the
occasional shiny thing, then loses interest.

**Why:** The wander is the sprite's resting personality; it's on screen far more than
any bubble. Uniform-random reads as a screensaver; attraction reads as *alive and
curious*. Highest charm-per-effort of the ambient ideas.

**Buildable now, zero new permissions (do this part first):**
Replace the uniform-random wander target in `SimApp` with a weighted pull toward salient
points we already have:
- the **cursor** (occasional gentle follow / curious approach, then retreat; never
  clingy; respects `presence`),
- the **last event location** (he lingers where something just happened),
- in the overlay, the **focused window** rect (future: NSWorkspace frontmost-app
  position via the daemon; he hangs near where you're working, patrols where you're not).

Keep it a *gradient*, not a leash: weighted-random pick among attractors + noise, so it
stays unpredictable. Lives in the engine as a pure `pickWanderTarget(attractors, rng)`;
the loop already exists. Tune against `presence` (low = aloof, high = follows the action).
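
One way the pure `pickWanderTarget(attractors, rng)` could look: a weighted-random pick followed by jitter. The `Attractor` shape, the `noisePx` parameter, and the `null` return (meaning "fall back to the old uniform wander") are assumptions for illustration, not the engine's actual types.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical shape: a salient point plus how strongly it pulls.
interface Attractor {
  at: Point;
  weight: number;
}

// Injected RNG returning a uniform value in [0, 1), so tests can be deterministic.
type Rng = () => number;

function pickWanderTarget(attractors: Attractor[], rng: Rng, noisePx = 120): Point | null {
  const live = attractors.filter((a) => a.weight > 0);
  const total = live.reduce((sum, a) => sum + a.weight, 0);
  if (total <= 0) return null; // nothing salient: caller keeps its uniform-random wander

  // Weighted pick: walk the list until the roll is used up.
  let roll = rng() * total;
  let chosen: Attractor | undefined;
  for (const a of live) {
    chosen = a;
    roll -= a.weight;
    if (roll < 0) break;
  }
  if (!chosen) return null;

  // Jitter keeps it a gradient, not a leash: he lands near the attractor, not on it.
  const jitter = () => (rng() * 2 - 1) * noisePx;
  return { x: chosen.at.x + jitter(), y: chosen.at.y + jitter() };
}
```

Under this sketch, `presence` tuning would live in the caller: scale the cursor attractor's `weight` by `presence` before passing the list in, so a low value makes him aloof without touching the picker.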

**Sensor-gated (backlog; same privacy rules as ambient quips: local-only, ephemeral):**
- **Screen color / motion / "flashing lights":** needs ScreenCaptureKit, since the transparent
  overlay can't see what's beneath it. A coarse, downsampled brightness/motion map (NOT
  readable content) could let him drift toward an area that just changed (a video started,
  a notification flashed). Local-only, never stored, abstractions not pixels.
- **Audio reactivity:** mic is a hard no by default; "system audio is playing / its level"
  is lighter but still opt-in + local-only. A bass-thump bob or a turn-toward-the-sound
  would be delightful but ships last, behind the same gate as quips.

**Smell test for all of it:** attraction must stay *cute*, never *surveillant*. He reacts
to the shape of activity (something moved, something's loud), never to its content.

## Take-me-there: Puck knows where the activity is, and ferries you to it

**What:** On a real notification, Puck should point you at the *actual* window that needs
you (fly to it if it's on this Space, beckon "follow me" if it's on another) and
**clicking him navigates there** (focuses the app, macOS brings its Space forward).

**Current behavior (the gap):** wire events arrive with `target: null` (the sim's targets
were fake windows), so in the overlay `fireEvent` flies Puck to a *random* screen point.
He has no idea where the source app lives; we never gave him real-window awareness.
This is the design doc's Phase 4 ("patrol the desktops you abandoned" presumes knowing
where they are).

**Phase 1, click-to-activate (high value, no Space geometry needed):**
- Event carries a **locator**: source app bundle id / pid / window title. The Claude hook
  already has `TERM_PROGRAM`, `cwd`, and the calling pid available; `puck-run` knows its
  terminal. Add an optional `locator` to the wire schema (`{bundleId?, pid?, title?}`).
- Rust command `activate_target(locator)` -> `NSRunningApplication.runningApplications(withBundleIdentifier:)`,
  then `activate` on the match (or AX focus by pid/title). macOS switches to that app's Space automatically.
- Frontend: clicking Puck while a located surface is pending calls it. This alone delivers
  "Puck lit up -> click -> you're where the thing is," across Spaces, without knowing which.

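A sketch of the frontend half of Phase 1: the optional `locator` on the wire and the decision of whether a click should navigate. The `PendingSurface` shape and `clickTarget` name are hypothetical, and treating a title-only locator as unusable is an assumption. The actual call to the proposed `activate_target` command is left to the caller so this stays a pure function.

```typescript
// Wire shape from the notes above: metadata only, never window contents.
interface Locator {
  bundleId?: string;
  pid?: number;
  title?: string;
}

// Hypothetical: whatever the overlay tracks for the surface Puck is currently showing.
interface PendingSurface {
  id: string;
  locator: Locator | null; // null is what wire events carry today (`target: null`)
}

// Assumption: a bundle id or pid is needed to find the app; a title only refines the match.
function hasUsableLocator(locator: Locator | null | undefined): locator is Locator {
  if (!locator) return false;
  const hasBundle = typeof locator.bundleId === "string" && locator.bundleId.length > 0;
  const hasPid = typeof locator.pid === "number" && locator.pid > 0;
  return hasBundle || hasPid;
}

// Returns the locator to hand to `activate_target`, or null when a click should
// fall through to Puck's normal click behavior.
function clickTarget(pending: PendingSurface | null): Locator | null {
  const locator = pending?.locator;
  return hasUsableLocator(locator) ? locator : null;
}
```

In the overlay, the click handler would pass a non-null result to the native command; a `null` result leaves today's behavior untouched, so unlocated events degrade gracefully.
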
**Phase 2, same-Space vs other-Space (the fiddly bit):**
- Determine if the target window is on the *current* Space: `CGWindowListCopyWindowInfo`
  with `kCGWindowListOptionOnScreenOnly` lists on-screen windows; absent target -> elsewhere.
  Robust Space identity needs the semi-private CGSSpace APIs (fragile, optional).
- Same Space -> fly to the window's screen rect (window bounds from CGWindowList).
  Other Space -> a "come hither" beckon toward the screen edge (pairs with the gesture
  entry), and the click teleports.
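
The same-Space decision reduces to "is the target in the on-screen list?". A sketch, assuming a native command has already turned the window list into plain objects; the `OnScreenWindow` shape, `planApproach`, and the `Approach` union are hypothetical names.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical: one entry per on-screen window, as a native helper might report it.
interface OnScreenWindow {
  pid: number;
  title: string;
  bounds: Rect;
}

type Approach =
  | { kind: "fly"; to: { x: number; y: number } } // same Space: go to the window
  | { kind: "beckon" }; // elsewhere: "come hither", click teleports

function planApproach(
  target: { pid?: number; title?: string },
  onScreen: OnScreenWindow[],
): Approach {
  // With nothing to match on, never guess at a window.
  if (target.pid === undefined && target.title === undefined) return { kind: "beckon" };

  const match = onScreen.find(
    (w) =>
      (target.pid === undefined || w.pid === target.pid) &&
      (target.title === undefined || w.title === target.title),
  );
  // Absent from the on-screen list is read as "on another Space".
  if (!match) return { kind: "beckon" };

  const { x, y, width, height } = match.bounds;
  return { kind: "fly", to: { x: x + width / 2, y: y + height / 2 } };
}
```

This deliberately avoids any Space identity: a minimized or hidden window also lands in the beckon branch, which is acceptable because the click still activates the app.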

**Notes:** locator is metadata, not content: bundle id + window title, never window
*contents* (anti-creep). Activation is an explicit user click (navigation, not an
autonomous action), so it stays inside the safety tiers. Depends on: gesture vocabulary
(beckon) and the future native window watcher (NSWorkspace frontmost / CGWindowList).

## Camouflage: Puck adapts to what he's floating over

**What:** When Puck drifts over text or busy content, he reacts to his surroundings like
a chameleon/glass-wisp: goes translucent, refracts, or (dream version) "mirrors" the
texture behind him onto his own body. Blends, then pops back when he moves to empty space.

**Why:** Sells "he's really *in* your desktop, not pasted on top." A creature that
responds to its background reads as inhabiting the space. Pairs with the wisp form
(already glass-like).

**Cheap / free now (no sensing):**
- **Shy fade:** lower sprite opacity while stationary over the busy center of the screen,
  restore when wandering to the margins; pure CSS/opacity on the existing wander state.
  Reads as "blending in" without literally seeing anything.
- **Refraction (maybe free, needs a WKWebView test):** a `backdrop-filter` glass body on
  the sprite. In the transparent overlay this *might* sample the real desktop behind the
  window (same open question as the speech-bubble blur). If it does, a refractive/distort
  body gives instant chameleon shimmer with zero screen-capture. Test in the Tauri overlay
  before committing to it.
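
The shy fade needs only the sprite position and the viewport size. A sketch of the opacity curve, where `shyOpacity`, the linear ramp, and the `0.45` floor are all assumptions to be tuned by eye:

```typescript
interface Point {
  x: number;
  y: number;
}

interface Size {
  width: number;
  height: number;
}

// Returns the sprite opacity: 1 while moving or at the margins, fading toward
// `minOpacity` as a stationary sprite sits closer to the centre of the screen.
function shyOpacity(pos: Point, viewport: Size, stationary: boolean, minOpacity = 0.45): number {
  if (!stationary || viewport.width <= 0 || viewport.height <= 0) return 1;

  // Normalised distance from the centre on each axis: 0 at centre, 1 at the edge.
  const dx = Math.abs(pos.x - viewport.width / 2) / (viewport.width / 2);
  const dy = Math.abs(pos.y - viewport.height / 2) / (viewport.height / 2);
  const edge = Math.min(1, Math.max(dx, dy));

  // Linear ramp from minOpacity (centre) to fully opaque (margin).
  return minOpacity + (1 - minOpacity) * edge;
}
```

The result can be written straight to the sprite wrapper's `opacity`, which keeps the effect on the compositor like the other sprite animations.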

**Dream version, literal mirror (screen-capture gated):**
- Sample the screen region directly under the sprite (ScreenCaptureKit), downsample, and
  paint it onto his body as living camouflage. Stunning, but it's the same hard gate as
  ambient quips / phototaxis sensors: local-only, ephemeral, opt-in, never stored, never
  leaves the machine. Texture/color only, never treated as readable content.

**Where it slots in:** `ui/Sprite.tsx` (a `camouflage` intensity prop on `.puck-body`),
driven by `modes/sim` from sprite position vs. screen regions. The shy-fade is a
20-minute add; refraction is a test-then-maybe; the literal mirror waits for the screen
watcher + privacy pass.

## Real app icons: known apps, OS-extracted where possible

**What:** Replace the glyph characters (✳ ◍ ✉ …) in the sim windows/dock and feed
source-markers with real app icons: Claude, ChatGPT, Chrome, Gmail, Mail, Discord,
Terminal, etc. Looks dramatically more legit, especially in the overlay.

**Pulling the *actual* OS icon (the good version):**
- macOS: `NSWorkspace.shared.icon(forFile: "/Applications/Foo.app")` returns the real
  icon for any installed `.app` bundle; a Rust/Tauri command can extract → PNG → hand to
  the webview. So Chrome, Mail, Discord, Slack, the host terminal: real icons, free, always
  current.
- **The limit you called:** a terminal *binary* (claude, codex) has no bundle and no icon.
  Fall back to the **host terminal's** icon (iTerm/Terminal/Ghostty/Warp, which the Claude
  hook can report via `TERM_PROGRAM`), or a curated Puck-styled glyph.
- **Web apps with no native app** (ChatGPT, Gmail as a tab): no bundle to extract from, so
  these need a small **curated bundled icon set** (a dozen SVGs/PNGs).

**So: hybrid.** OS extraction where a bundle exists (Rust command, overlay only), curated
bundled set for web-apps + terminal-binary fallbacks (works everywhere incl. the Space/sim).

**Where it slots in:**
- `ui/Desktop.tsx` (`WIN_DEFS` icons, dock glyphs) and the feed `source` marker are currently
  single glyph chars; swap for an `<Icon source=…>` that prefers OS-extracted, falls back to
  bundled, falls back to glyph.
- Overlay-only Rust command `app_icon(bundleId|path) -> png` for the extraction half.
- Engine `SOURCES`/`EventDef` already carry source identity; add an optional `bundleId` hint.

**Cheap first step (no native):** ship the curated bundled icon set + source→icon map; use
glyph only as last resort. The OS-extraction half is an overlay enhancement on top.
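
The three-step fallback (OS-extracted, then bundled, then glyph) can be a pure resolver that the `<Icon>` component calls. A sketch with hypothetical names throughout: `IconRef`, `IconSources`, `resolveIcon`, and the `"●"` last-resort glyph are not from the codebase.

```typescript
type IconRef =
  | { kind: "os"; pngDataUrl: string } // extracted by the overlay-only native command
  | { kind: "bundled"; path: string } // from the curated icon set
  | { kind: "glyph"; char: string }; // today's single-character marker

interface IconSources {
  osIcons: Map<string, string>; // bundleId -> PNG data URL (empty outside the overlay)
  bundled: Map<string, string>; // source id -> asset path
  glyphs: Map<string, string>; // source id -> glyph character
}

function resolveIcon(source: { id: string; bundleId?: string }, sources: IconSources): IconRef {
  // 1. Real OS icon, only when a bundle id is known and extraction succeeded.
  const os = source.bundleId !== undefined ? sources.osIcons.get(source.bundleId) : undefined;
  if (os !== undefined) return { kind: "os", pngDataUrl: os };

  // 2. Curated bundled icon: covers web apps and terminal binaries, works in the sim.
  const bundled = sources.bundled.get(source.id);
  if (bundled !== undefined) return { kind: "bundled", path: bundled };

  // 3. Glyph as the last resort.
  return { kind: "glyph", char: sources.glyphs.get(source.id) ?? "●" };
}
```

Shipping the cheap first step then means populating only `bundled` and `glyphs`; the overlay can fill `osIcons` later without any change to callers.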

## Local vision: the private, free path for continuous perception

**What:** Run Puck's eyes on-device instead of (or alongside) Modal, so real-screen
perception is private and continuous vision costs ~nothing. Brain seam already supports
it: point PUCK_VISION_URL at a local OpenAI-compatible server.

**Capability is NOT the blocker** (verified 2026-06-07): screen-reading is OCR + light
"what's notable" reasoning, and small VLMs excel at it. Options:
- **Holotron-12B local** via llama.cpp + mmproj: llama.cpp merged Nemotron-Nano-12B-v2-VL
  (PR #19547). `convert_hf_to_gguf.py --mmproj` → vision projector → `llama-server`. Same
  model as cloud, same capability, ~24GB on the 48GB Mac, free. NB: Ollama can't load the
  mmproj; must use raw llama-server. Keeps the Nemotron Quest tie.
- **Qwen2.5-VL-7B local**: ~6GB, 95.7 DocVQA, fast; ideal for *continuous ambient* (every
  45s forever on the M4 Max). Loses the Nemotron tie, plenty for screen-reading.
- MiniCPM-V 2.6 (~5.5GB), Moondream2 (1.9B, CPU) as even-lighter fallbacks.

**Recommendation:** Holotron for the showcase + Quest (cloud now → local llama-server
later as the private path); Qwen2.5-VL-7B-local as the cheap continuous engine. The
visionMode "Continuous" tier (built, currently same cloud path) should switch to a local
PUCK_VISION_URL when this lands.

**Where it slots in:** zero engine/frontend change; it's a runtime change: spin up
`llama-server --mmproj` (or vLLM/mlx when fixed) and set PUCK_VISION_URL. Plus the
real-screen capture (ScreenCaptureKit, overlay) to feed it actual pixels instead of the
sim snapshot.
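
For reference, a sketch of what a request to a local OpenAI-compatible server might look like. It assumes PUCK_VISION_URL holds a base URL (for example `http://127.0.0.1:8080`) and that the server exposes the standard `/v1/chat/completions` route; `buildVisionRequest`, the `"local-vlm"` model name, the prompt, and `max_tokens` are placeholders, not the brain seam's real code.

```typescript
interface VisionRequest {
  url: string;
  body: string; // JSON, ready to POST with Content-Type: application/json
}

// Builds (but does not send) a chat-completions request carrying one screenshot.
function buildVisionRequest(visionUrl: string, pngBase64: string, model = "local-vlm"): VisionRequest {
  const base = visionUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const payload = {
    model, // placeholder: many local servers ignore or loosely match this
    max_tokens: 64,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "In one short sentence, what is notable on this screen?" },
          // OpenAI-style image part: the screenshot travels inline as a data URL.
          { type: "image_url", image_url: { url: `data:image/png;base64,${pngBase64}` } },
        ],
      },
    ],
  };
  return { url: `${base}/v1/chat/completions`, body: JSON.stringify(payload) };
}
```

Because the screenshot is inlined as a data URL, nothing is written to disk on the way to the model, which fits the ephemeral rule used elsewhere in this backlog.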