# WitGym — Field Notes (Build Small 2026)

## What I built
WitGym is a comedy coaching engine for real-life awkward moments. You paste a situation, it produces **one sharp line**, then lets you iterate with drills (sharpen / different angle / explain).

The core bet: **comedy transfers by structure, not by topic**. Instead of “RAG on jokes”, WitGym does **CBR-RAG on comedy mechanics** and uses precedent from *The Office* to ground the response.

## The small-model constraint (≤32B) changed the design
Under the Build Small constraint, the goal wasn’t “generate funnier text by scaling”, it was “get reliable *wit* by adding structure”:

- **Pass 1 (extraction)**: extract a compact schema (`ComedyMetadata`) describing the moment: archetype, tension, violation distance, subtext, behavioral observation, etc.\n
- **Retrieval (CBR-RAG)**: retrieve *structurally similar* precedent scenes from a prebuilt index.\n
- **Pass 2 (generation)**: draft 2–3 persona candidates with strict constraints.\n
- **Pass 3 (ranking)**: pick a winner with an explicit judging rubric (truth precision + strong ending + domain anchoring).\n
- **Pass 4 (compression)**: optionally tighten the winner to one crisp line.\n

## What was unexpectedly important
- **Behavioral observation > feelings**: naming the *move* (“renamed procrastination as ‘keeping options open’”) is a better generative seed than therapy-language subtext.\n
- **Ranking beats clever prompting**: the biggest quality jumps came from forcing a tournament-style selection rubric, especially “truth precision” and “final clause” quality.\n
- **Progressive disclosure UX matters**: streaming phase updates and an expandable trace makes judges trust the system (it’s not “vibes”; you can see what it did).\n

## Models used
- **LLM**: `Qwen/Qwen3.5-27B` (≤32B) via Hugging Face Inference Providers (recommended runtime path).\n
- **Embedder**: `BAAI/bge-small-en-v1.5` (33M) for retrieval.\n
- **Reranker (optional)**: `cross-encoder/ettin-reranker-32m-v1` (CPU) for pool reranking.\n

## What I’d improve next (post-hackathon)
- Make “coach mode” differentiate *response style*, not just “add an explanation panel”.\n
- Add a public, privacy-safe trace export that covers **both** evaluation runs and real usage patterns (with de-identification).\n
- Tighten the “small talk” and “low twist” path so it’s still delightful without running the full pipeline.\n