Commit 9198e06 by Eric Xu · unverified · 1 parent: 2934f67

Add goal-weighted gradient (VJP) — optimize toward objectives, not universal appeal

The semantic gradient is now a vector-Jacobian product: each evaluator
is weighted by their relevance to the user's stated goal.

Uniform: nabla_j = (1/n) sum_i J_ij (what pleases everyone)
VJP: nabla_j = sum_i v_i * J_ij (what moves toward goal)

Without a goal, behavior is unchanged (uniform weights). With a goal,
the LLM scores each evaluator's relevance (0-1) and the gradient
prioritizes changes that matter to the right audience.

- counterfactual.py: add --goal flag, compute_goal_weights(), VJP in analyze_gradient()
- web: add goal input field, pass through to counterfactual endpoint
- README: update math section with VJP formulation
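
The two weighting schemes above can be sketched in a few lines of plain Python; the Jacobian entries and relevance weights below are illustrative numbers, not values from the repo:

```python
# A 3-evaluator x 2-change Jacobian of score deltas J[i][j] (made-up numbers).
J = [
    [+2.0, -1.0],   # evaluator 0, e.g. an enterprise CTO
    [+1.0, +0.5],   # evaluator 1
    [-2.0, +2.0],   # evaluator 2, e.g. a solo hobbyist
]
n = len(J)

# Uniform: nabla_j = (1/n) sum_i J_ij  (what pleases everyone)
uniform = [sum(row[j] for row in J) / n for j in range(2)]

# VJP: nabla_j = sum_i v_i * J_ij  (hypothetical LLM-assigned relevance weights)
v = [1.0, 0.8, 0.1]
vjp = [sum(v[i] * J[i][j] for i in range(n)) for j in range(2)]

print("uniform:", uniform)  # uniform weighting ranks change 1 first
print("vjp:", vjp)          # goal weighting flips the ranking to change 0
```

Down-weighting evaluators who are irrelevant to the goal is what flips the ranking: the change the target audience likes wins even though it averages worse across the whole panel.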

Files changed (4):

1. README.md (+15 −7)
2. scripts/counterfactual.py (+115 −16)
3. web/app.py (+26 −5)
4. web/static/index.html (+14 −1)
README.md CHANGED

```diff
@@ -261,17 +261,23 @@ The gap between SGO and real expert panels has three components:
 
 ## The Semantic Gradient
 
-For evaluators in the "movable middle" (scores 4–7), SGO asks: *"if this changed, what's your new score?"*
-
-This produces a Jacobian matrix where each cell is a score delta:
+SGO computes a Jacobian matrix of score deltas, showing how each evaluator's score would shift for each hypothetical change:
 
 $$J_{ij} = f(\theta + \Delta_j, \; x_i) - f(\theta, \; x_i)$$
 
-The semantic gradient is the column mean — the average impact of each change across the panel:
-
-$$\nabla_j = \frac{1}{n}\sum_{i} J_{ij}$$
-
-Rank by this value descending: that's your priority list.
+### Goal-weighted gradient (VJP)
+
+The key insight: not all evaluators matter equally. A luxury brand shouldn't optimize for budget shoppers. A dating profile shouldn't optimize for incompatible matches.
+
+SGO uses a **goal vector** `v` that weights each evaluator by their relevance to your objective. The gradient is a vector-Jacobian product:
+
+$$\nabla_j = \sum_{i} v_i \cdot J_{ij}$$
+
+Where `v_i` is the goal-relevance weight for evaluator `i` (0 = irrelevant, 1 = ideal target).
+
+Without a goal, `v = [1/n, ...]` — uniform weights, optimizing for universal appeal. With a goal like *"close enterprise deals"*, enterprise CTOs get `v ≈ 1` and solo hobbyists get `v ≈ 0`.
+
+The LLM assigns goal-relevance weights automatically by evaluating each persona against your stated objective. This means the gradient tells you *"what changes move you toward your goal"*, not *"what changes make everyone like you more"*.
 
 ### What to probe
 
@@ -290,10 +296,12 @@ Only probe changes you'd actually make:
 |--------|---------|
 | θ | Entity you control |
 | x | Evaluator persona |
+| g | Goal — what you're optimizing for |
 | f(θ, x) | LLM evaluation → score + reasoning |
+| v_i | Goal-relevance weight for evaluator *i* |
 | Δⱼ | Hypothetical change |
 | Jᵢⱼ | Score delta: evaluator *i*, change *j* |
-| ∇ⱼ | Semantic gradient: mean impact of change *j* |
+| ∇ⱼ | Goal-weighted gradient (VJP): impact of change *j* toward goal *g* |
 
 ## Project Structure
 
```
scripts/counterfactual.py CHANGED

```diff
@@ -128,29 +128,94 @@ def probe_one(client, model, eval_result, cohort_map, all_changes):
         return {"error": str(e), "_evaluator": ev}
 
 
-def analyze_gradient(results, all_changes):
+GOAL_RELEVANCE_PROMPT = """You are scoring how relevant an evaluator is to a specific goal.
+
+## Goal
+{goal}
+
+## Evaluator
+Name: {name}, Age: {age}, Occupation: {occupation}
+Their evaluation: {score}/10 — "{summary}"
+
+## Task
+On a scale of 0.0 to 1.0, how relevant is this evaluator's opinion to the stated goal?
+- 1.0 = this is exactly the kind of person whose opinion matters for this goal
+- 0.5 = somewhat relevant
+- 0.0 = completely irrelevant to this goal
+
+Return JSON only: {{"relevance": <0.0-1.0>, "reasoning": "<1 sentence>"}}"""
+
+
+def compute_goal_weights(client, model, eval_results, cohort_map, goal, parallel=5):
+    """Score each evaluator's relevance to the goal. Returns {name: weight}."""
+    weights = {}
+
+    def score_one(r):
+        ev = r.get("_evaluator", {})
+        name = ev.get("name", "")
+        persona = cohort_map.get(name, {})
+        prompt = GOAL_RELEVANCE_PROMPT.format(
+            goal=goal, name=name, age=ev.get("age", ""),
+            occupation=ev.get("occupation", ""),
+            score=r.get("score", "?"),
+            summary=r.get("summary", r.get("reasoning", "")),
+        )
+        try:
+            resp = client.chat.completions.create(
+                model=model,
+                messages=[{"role": "user", "content": prompt}],
+                response_format={"type": "json_object"},
+                max_tokens=256, temperature=0.3,
+            )
+            content = resp.choices[0].message.content
+            content = re.sub(r'<think>[\s\S]*?</think>', '', content).strip()
+            data = json.loads(content)
+            return name, float(data.get("relevance", 0.5)), data.get("reasoning", "")
+        except Exception:
+            return name, 0.5, "default"
+
+    with concurrent.futures.ThreadPoolExecutor(max_workers=parallel) as pool:
+        futs = [pool.submit(score_one, r) for r in eval_results]
+        for fut in concurrent.futures.as_completed(futs):
+            name, weight, reasoning = fut.result()
+            weights[name] = {"weight": weight, "reasoning": reasoning}
+
+    return weights
+
+
+def analyze_gradient(results, all_changes, goal_weights=None):
     valid = [r for r in results if "counterfactuals" in r]
     if not valid:
         return "No valid results."
 
+    has_goal = goal_weights is not None
     labels = {c["id"]: c["label"] for c in all_changes}
     jacobian = defaultdict(list)
 
     for r in valid:
+        name = r["_evaluator"].get("name", "")
+        w = goal_weights.get(name, {}).get("weight", 1.0) if has_goal else 1.0
         for cf in r.get("counterfactuals", []):
             jacobian[cf.get("change_id", "")].append({
                 "delta": cf.get("delta", 0),
-                "name": r["_evaluator"].get("name", ""),
+                "weighted_delta": cf.get("delta", 0) * w,
+                "weight": w,
+                "name": name,
                 "age": r["_evaluator"].get("age", ""),
                 "reasoning": cf.get("reasoning", ""),
             })
 
     ranked = []
     for cid, deltas in jacobian.items():
-        avg = sum(d["delta"] for d in deltas) / len(deltas)
+        total_weight = sum(d["weight"] for d in deltas)
+        if total_weight == 0:
+            total_weight = 1
+        weighted_avg = sum(d["weighted_delta"] for d in deltas) / total_weight
+        raw_avg = sum(d["delta"] for d in deltas) / len(deltas)
         ranked.append({
             "id": cid, "label": labels.get(cid, cid),
-            "avg_delta": avg,
+            "avg_delta": weighted_avg,
+            "raw_avg_delta": raw_avg,
             "max_delta": max(d["delta"] for d in deltas),
             "min_delta": min(d["delta"] for d in deltas),
             "positive": sum(1 for d in deltas if d["delta"] > 0),
@@ -159,29 +224,46 @@ def analyze_gradient(results, all_changes):
         })
     ranked.sort(key=lambda x: x["avg_delta"], reverse=True)
 
-    lines = [f"# Semantic Gradient\n\nProbed {len(valid)} evaluators across {len(all_changes)} changes.\n"]
-    lines.append(f"{'Rank':<5} {'Avg Δ':>6} {'Max':>5} {'Min':>5} {'👍':>4} {'👎':>4} Change")
+    mode = "Goal-Weighted (VJP)" if has_goal else "Uniform"
+    lines = [f"# Semantic Gradient ({mode})\n\nProbed {len(valid)} evaluators across {len(all_changes)} changes.\n"]
+    if has_goal:
+        header = f"{'Rank':<5} {'VJP Δ':>6} {'Raw Δ':>6} {'Max':>5} {'Min':>5} Change"
+    else:
+        header = f"{'Rank':<5} {'Avg Δ':>6} {'Max':>5} {'Min':>5} {'👍':>4} {'👎':>4} Change"
+    lines.append(header)
     lines.append("-" * 75)
     for i, r in enumerate(ranked, 1):
-        lines.append(
-            f"{i:<5} {r['avg_delta']:>+5.1f} {r['max_delta']:>+4} {r['min_delta']:>+4} "
-            f"{r['positive']:>3} {r['negative']:>3} {r['label']}"
-        )
+        if has_goal:
+            lines.append(
+                f"{i:<5} {r['avg_delta']:>+5.1f} {r['raw_avg_delta']:>+5.1f} "
+                f"{r['max_delta']:>+4} {r['min_delta']:>+4} {r['label']}"
+            )
+        else:
+            lines.append(
+                f"{i:<5} {r['avg_delta']:>+5.1f} {r['max_delta']:>+4} {r['min_delta']:>+4} "
+                f"{r['positive']:>3} {r['negative']:>3} {r['label']}"
+            )
 
     lines.append(f"\n## Top 3 — Detail\n")
     for r in ranked[:3]:
-        lines.append(f"### {r['label']} (avg Δ {r['avg_delta']:+.1f})\n")
+        label = f"### {r['label']} (Δ {r['avg_delta']:+.1f})"
+        if has_goal and abs(r['avg_delta'] - r['raw_avg_delta']) > 0.2:
+            label += f" ← was {r['raw_avg_delta']:+.1f} without goal weighting"
+        lines.append(label + "\n")
         positive = sorted([d for d in r["details"] if d["delta"] > 0],
-                          key=lambda x: x["delta"], reverse=True)
+                          key=lambda x: x["weighted_delta"] if has_goal else x["delta"],
+                          reverse=True)
         if positive:
             lines.append("**Helps:**")
             for d in positive[:5]:
-                lines.append(f"  +{d['delta']} {d['name']} ({d['age']}): {d['reasoning']}")
+                w_label = f" [w={d['weight']:.1f}]" if has_goal else ""
+                lines.append(f"  +{d['delta']} {d['name']} ({d['age']}){w_label}: {d['reasoning']}")
         negative = [d for d in r["details"] if d["delta"] < 0]
         if negative:
             lines.append("**Hurts:**")
             for d in sorted(negative, key=lambda x: x["delta"])[:3]:
-                lines.append(f"  {d['delta']} {d['name']} ({d['age']}): {d['reasoning']}")
+                w_label = f" [w={d['weight']:.1f}]" if has_goal else ""
+                lines.append(f"  {d['delta']} {d['name']} ({d['age']}){w_label}: {d['reasoning']}")
         lines.append("")
 
     return "\n".join(lines)
@@ -191,6 +273,8 @@ def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--tag", required=True)
    parser.add_argument("--changes", required=True, help="JSON file with changes to probe")
+    parser.add_argument("--goal", default=None,
+                        help="Goal to optimize toward (enables VJP weighting)")
    parser.add_argument("--min-score", type=int, default=4)
    parser.add_argument("--max-score", type=int, default=7)
    parser.add_argument("--parallel", type=int, default=5)
@@ -223,7 +307,11 @@ def main():
     model = os.getenv("LLM_MODEL_NAME")
 
     print(f"Movable middle (score {args.min_score}-{args.max_score}): {len(movable)}")
-    print(f"Changes: {len(all_changes)} | Model: {model}\n")
+    print(f"Changes: {len(all_changes)} | Model: {model}")
+    if args.goal:
+        print(f"Goal: {args.goal} (VJP mode)\n")
+    else:
+        print("No goal — uniform weighting\n")
 
     results = [None] * len(movable)
     done = [0]
@@ -255,7 +343,18 @@ def main():
     with open(out_dir / "raw_probes.json", "w") as f:
         json.dump(results, f, ensure_ascii=False, indent=2)
 
-    gradient = analyze_gradient(results, all_changes)
+    # Compute goal weights if goal is specified (VJP)
+    goal_weights = None
+    if args.goal:
+        print("Computing goal-relevance weights...")
+        goal_weights = compute_goal_weights(
+            client, model, eval_results, cohort_map, args.goal,
+            parallel=args.parallel,
+        )
+        relevant = sum(1 for v in goal_weights.values() if v["weight"] >= 0.5)
+        print(f"  {relevant}/{len(goal_weights)} evaluators relevant to goal\n")
+
+    gradient = analyze_gradient(results, all_changes, goal_weights=goal_weights)
     with open(out_dir / "gradient.md", "w") as f:
         f.write(gradient)
 
```
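A detail worth noting from the counterfactual.py hunk above: `analyze_gradient` divides the weighted sum by the total weight, so the reported ∇ⱼ is in practice a weighted mean (with a guard against an all-zero weight vector) rather than the raw sum in the README formula. A standalone sketch of just that computation, with a hypothetical helper name and made-up numbers:

```python
def weighted_avg_delta(deltas):
    """deltas: list of {"delta": float, "weight": float} for one change.

    Mirrors the normalization in analyze_gradient: divide the
    vector-Jacobian product by the total weight, falling back to 1
    when every weight is zero (same guard, written with `or`).
    """
    total_weight = sum(d["weight"] for d in deltas) or 1
    return sum(d["delta"] * d["weight"] for d in deltas) / total_weight

# A high-relevance +2 and a low-relevance -1:
deltas = [
    {"delta": 2.0, "weight": 1.0},
    {"delta": -1.0, "weight": 0.1},
]
print(weighted_avg_delta(deltas))  # (2.0 - 0.1) / 1.1 ≈ 1.727
```

The normalization keeps goal-weighted and uniform runs on the same scale, which is what makes the "VJP Δ" and "Raw Δ" columns in the gradient report directly comparable.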
web/app.py CHANGED

```diff
@@ -419,10 +419,10 @@ async def evaluate_stream(sid: str, parallel: int = 5, bias_calibration: bool =
 
 @app.get("/api/counterfactual/stream/{sid}")
 async def counterfactual_stream(
-    sid: str, changes_json: str, min_score: int = 4,
-    max_score: int = 7, parallel: int = 5
+    sid: str, changes_json: str, goal: str = "",
+    min_score: int = 4, max_score: int = 7, parallel: int = 5
 ):
-    """Run counterfactual probes with SSE progress."""
+    """Run counterfactual probes with SSE progress. Goal enables VJP weighting."""
     if sid not in sessions:
         raise HTTPException(404, "Session not found")
     session = sessions[sid]
@@ -442,8 +442,10 @@ async def counterfactual_stream(
                    if "score" in r and min_score <= r["score"] <= max_score]
 
         total = len(movable)
+        has_goal = bool(goal.strip())
         yield {"event": "start", "data": json.dumps({
-            "total": total, "changes": len(all_changes), "model": model
+            "total": total, "changes": len(all_changes), "model": model,
+            "goal": goal if has_goal else None,
         })}
 
         if total == 0:
@@ -454,6 +456,23 @@ async def counterfactual_stream(
             })}
             return
 
+        # Compute goal-relevance weights (VJP) if goal is set
+        goal_weights = None
+        if has_goal:
+            yield {"event": "goal_weights", "data": json.dumps({
+                "status": "computing", "message": "Scoring evaluator relevance to goal..."
+            })}
+            goal_weights = compute_goal_weights(
+                client, model, eval_results, cohort_map, goal, parallel=parallel,
+            )
+            relevant = sum(1 for v in goal_weights.values() if v["weight"] >= 0.5)
+            yield {"event": "goal_weights", "data": json.dumps({
+                "status": "done",
+                "relevant": relevant,
+                "total": len(goal_weights),
+                "message": f"{relevant}/{len(goal_weights)} evaluators relevant to goal",
+            })}
+
         results = [None] * total
         done = 0
         t0 = time.time()
@@ -484,13 +503,15 @@ async def counterfactual_stream(
             yield {"event": "progress", "data": json.dumps(progress)}
 
         elapsed = time.time() - t0
-        gradient_text = analyze_gradient(results, all_changes)
+        gradient_text = analyze_gradient(results, all_changes,
+                                         goal_weights=goal_weights)
         session["gradient"] = gradient_text
 
         yield {"event": "complete", "data": json.dumps({
             "elapsed": round(elapsed, 1),
             "gradient": gradient_text,
             "results": results,
+            "goal": goal if has_goal else None,
         })}
 
     return EventSourceResponse(event_generator())
```
web/static/index.html CHANGED

```diff
@@ -348,6 +348,11 @@
       <textarea id="entityText" placeholder="Paste your entity here..."></textarea>
     </div>
 
+    <div class="field">
+      <label>What's your goal?</label>
+      <input type="text" id="goalText" placeholder="e.g. 'Get hired at a Series B startup' or 'Close enterprise deals'">
+    </div>
+
     <details class="mb-8">
       <summary style="cursor:pointer;color:var(--text2);font-size:0.85rem">Advanced options</summary>
       <div style="padding:12px 0">
@@ -861,8 +866,10 @@ function runCounterfactual() {
   document.getElementById('cfResults').classList.add('hidden');
   document.getElementById('cfLog').innerHTML = '';
 
+  const goal = document.getElementById('goalText').value.trim();
   const params = new URLSearchParams({
     changes_json: JSON.stringify(changes),
+    goal: goal,
     min_score: minScore,
     max_score: maxScore,
     parallel: 5,
@@ -872,8 +879,14 @@ function runCounterfactual() {
 
   es.addEventListener('start', (e) => {
     const d = JSON.parse(e.data);
+    const goalLabel = d.goal ? ` toward "${d.goal}"` : '';
     document.getElementById('cfProgressText').textContent =
-      `Probing ${d.total} evaluators across ${d.changes} changes...`;
+      `Probing ${d.total} evaluators across ${d.changes} changes${goalLabel}...`;
+  });
+
+  es.addEventListener('goal_weights', (e) => {
+    const d = JSON.parse(e.data);
+    document.getElementById('cfProgressText').textContent = d.message;
   });
 
   es.addEventListener('progress', (e) => {
```