Eric Xu committed on

Add bias audit to SKILL/AGENT system and end-to-end demo

- Update SKILL.md: add Phase 6 (bias audit), document --bias-calibration flag
- Update AGENT.md: add Phase 6, update decision tree, update file layout
- Add examples/: CodeReview AI entity, counterfactual changes, and run_demo.sh
  that walks through the full pipeline including bias audit and calibration

- AGENT.md +38 -1
- SKILL.md +38 -0
- examples/changes_codereview_ai.json +27 -0
- examples/entity_codereview_ai.md +38 -0
- examples/run_demo.sh +134 -0

AGENT.md
CHANGED
@@ -113,6 +113,8 @@ uv run python scripts/evaluate.py \
   --parallel 5
 ```
 
+Add `--bias-calibration` to inject CoBRA-inspired bias calibration instructions that reduce framing, authority, and order artifacts for more realistic evaluations.
+
 **Present results to the user**:
 
 1. Overall score distribution (avg, positive %, negative %)

@@ -183,6 +185,34 @@ Repeat until the user is satisfied or diminishing returns are clear.
 
 ---
 
+## Phase 6 — Bias Audit (Optional)
+
+Run when the user questions evaluation fidelity, or proactively after the first evaluation to establish a baseline.
+
+```bash
+uv run python scripts/bias_audit.py \
+  --entity entities/<name>.md \
+  --cohort data/cohort.json \
+  --probes framing authority order \
+  --sample 10 \
+  --parallel 5
+```
+
+This runs CoBRA-inspired experiments (arXiv:2509.13588) through SGO's pipeline:
+
+- **Framing probe**: Same entity rewritten with gain vs. loss framing → measures whether LLM evaluators are over- or under-sensitive vs. the ~30% human baseline (Tversky & Kahneman, 1981)
+- **Authority probe**: Entity with/without credibility signals → measures authority bias vs. the ~20% human baseline
+- **Order probe**: Sections reordered → measures anchoring effects (should be ~0%)
+
+**Present**: Per-probe shift %, comparison to human baselines, overall assessment (over-biased / under-biased / well-calibrated).
+
+**If over-biased**: Suggest re-running the evaluation with the `--bias-calibration` flag.
+**If under-biased**: Note that the panel may be more rational than real humans — this may be acceptable or not depending on the domain.
+
+**Ask**: *"Your panel shows [X]% framing sensitivity (human baseline: ~30%). Want to run with bias calibration enabled?"*
+
+---
+
 ## Decision Tree
 
 ```

@@ -212,6 +242,12 @@ User wants optimization?
 User made changes?
 ├─ Yes → Phase 5: re-evaluate, compare
 └─ No → done
+│
+▼
+User questions fidelity / wants validation?
+├─ Yes → Phase 6: bias audit
+│   └─ Over-biased? → re-run with --bias-calibration
+└─ No → done
 ```
 
 ---

@@ -230,8 +266,9 @@ User made changes?
 │   ├── persona_loader.py       # Load + filter personas
 │   ├── stratified_sampler.py
 │   ├── generate_cohort.py      # LLM-generate personas when no dataset fits
-│   ├── evaluate.py             # f(θ, x) scorer
+│   ├── evaluate.py             # f(θ, x) scorer (supports --bias-calibration)
 │   ├── counterfactual.py       # Semantic gradient probe
+│   ├── bias_audit.py           # CoBRA-inspired cognitive bias measurement
 │   └── compare.py              # Cross-run diff
 ├── templates/
 │   ├── entity_product.md
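The per-probe "shift %" that Phase 6 reports can be sketched in a few lines. Everything below (the `framing_shift` function, the toy scores, the tolerance band around the ~30% baseline) is illustrative and not taken from the repo's actual bias_audit.py:

```python
# Sketch of how a framing-probe "shift %" could be computed. Function and
# threshold choices here are illustrative, not the repo's bias_audit.py.
from statistics import mean

def framing_shift(gain_scores, loss_scores):
    """Percent drop in mean score when the same entity is reworded
    from gain framing to loss framing."""
    gain_avg = mean(gain_scores)
    loss_avg = mean(loss_scores)
    return 100.0 * (gain_avg - loss_avg) / gain_avg

# Toy data: one 10-evaluator panel scoring both framings of one entity.
gain = [7.5, 8.0, 6.5, 7.0, 8.5, 7.5, 6.0, 7.0, 8.0, 7.0]
loss = [6.0, 6.5, 5.5, 6.0, 7.0, 6.5, 5.0, 6.0, 6.5, 6.0]
shift = framing_shift(gain, loss)  # ~16.4% in this toy data

# Interpret against the ~30% human baseline, with a tolerance band:
if shift > 35:
    verdict = "over-biased"
elif shift < 25:
    verdict = "under-biased"
else:
    verdict = "well-calibrated"
```

The authority and order probes follow the same pattern: score two variants of the entity with the same cohort, then report the relative change in the mean.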
SKILL.md
CHANGED
@@ -102,6 +102,16 @@ uv run python scripts/evaluate.py \
   --parallel 5
 ```
 
+To enable bias calibration (reduces framing/authority/order artifacts for more realistic scores):
+```bash
+uv run python scripts/evaluate.py \
+  --entity entities/<name>.md \
+  --cohort data/cohort.json \
+  --tag <run_tag> \
+  --bias-calibration \
+  --parallel 5
+```
+
 Present: avg score, breakdown by segment, top attractions, top concerns, dealbreakers, most/least receptive evaluators with quotes.
 
 Ask: **"Anything surprising? Want to dig into a segment?"**

@@ -137,6 +147,34 @@ Ask: **"Which change do you want to make first?"**
 
 ---
 
+## Phase 6 — Bias Audit (Optional)
+
+Run when the user wants to validate panel fidelity or asks "how realistic are these evaluations?" This measures cognitive biases in the evaluator pipeline and compares them to human baselines (Tversky & Kahneman framing, Milgram authority).
+
+```bash
+cd $SGO_DIR
+uv run python scripts/bias_audit.py \
+  --entity entities/<name>.md \
+  --cohort data/cohort.json \
+  --probes framing authority order \
+  --sample 10 \
+  --parallel 5
+```
+
+- `--probes`: which biases to test (framing, authority, order — or any subset)
+- `--sample`: number of evaluators to audit (10 is fast; use the full cohort for a thorough audit)
+
+Output: `results/bias_audit/report.md` with per-probe analysis and gap vs. human baselines.
+
+If biases are detected:
+- **Over-biased**: Re-run the evaluation with the `--bias-calibration` flag
+- **Under-biased**: Consider whether the panel is too rational for the domain
+- **Order effects**: Standardize entity format or average across orderings
+
+Ask: **"Want to see how your panel's cognitive biases compare to human baselines?"**
+
+---
+
 ## Key Principles
 
 - **Cohort is the control group** — keep it fixed across runs
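The `--bias-calibration` flag documented above injects debiasing instructions into each evaluator prompt. A minimal sketch of what such an injection could look like; the preamble wording and the `build_prompt` helper are assumptions for illustration, not the actual evaluate.py implementation:

```python
# Illustrative only: the real evaluate.py prompt assembly may differ.
CALIBRATION_PREAMBLE = (
    "Before scoring, check yourself for three artifacts:\n"
    "1. Framing: would your score change if the same facts were worded as\n"
    "   losses instead of gains? Score the substance, not the wording.\n"
    "2. Authority: credentials are evidence, not proof; weigh them no more\n"
    "   than a typical buyer would.\n"
    "3. Order: section order carries no information; do not anchor on\n"
    "   whichever section you read first.\n"
)

def build_prompt(persona: str, entity: str, bias_calibration: bool = False) -> str:
    """Assemble an evaluator prompt, optionally prepending the calibration text."""
    parts = []
    if bias_calibration:
        parts.append(CALIBRATION_PREAMBLE)
    parts.append(f"You are this evaluator: {persona}")
    parts.append(f"Evaluate the following product:\n\n{entity}")
    return "\n\n".join(parts)
```

The design point is that calibration changes only the prompt, not the cohort or the scoring scale, so calibrated and baseline runs stay comparable.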
examples/changes_codereview_ai.json
ADDED
@@ -0,0 +1,27 @@
+[
+  {
+    "id": "free_tier",
+    "label": "Add free tier",
+    "description": "Add a free tier: 1 repo, 20 reviews/month, 1 user. No credit card required. This lets developers try the product before committing budget."
+  },
+  {
+    "id": "soc2",
+    "label": "Get SOC 2 certified",
+    "description": "Achieve SOC 2 Type II certification. Display the badge prominently. This signals enterprise-grade security practices and is often a procurement requirement."
+  },
+  {
+    "id": "self_hosted",
+    "label": "Add self-hosted option",
+    "description": "Offer a self-hosted deployment option for Enterprise tier. Code never leaves the customer's infrastructure. Available as Docker image or Kubernetes helm chart."
+  },
+  {
+    "id": "customer_logos",
+    "label": "Add recognizable customer logos",
+    "description": "Add logos of 3-5 well-known companies using the product. Include brief case studies: 'Acme Corp reduced review time by 60%' style social proof."
+  },
+  {
+    "id": "lower_price",
+    "label": "Drop Team price to $69/mo",
+    "description": "Reduce Team tier from $99/mo to $69/mo. Keep all features the same. This brings the per-seat cost below the psychological $7/user threshold for a 10-person team."
+  }
+]
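Each entry in the changes file added above follows an `id`/`label`/`description` shape that counterfactual.py consumes. A small sketch of parsing and validating that shape (the `validate_changes` helper is hypothetical, not part of the repo's scripts):

```python
import json

# Keys every counterfactual change entry carries in the example file above.
REQUIRED_KEYS = {"id", "label", "description"}

def validate_changes(raw: str) -> list[dict]:
    """Parse a counterfactual-changes JSON document and check each entry."""
    changes = json.loads(raw)
    for i, change in enumerate(changes):
        missing = REQUIRED_KEYS - change.keys()
        if missing:
            raise ValueError(f"change #{i} missing keys: {sorted(missing)}")
    return changes

# A minimal well-formed entry in the same shape as the file above:
sample = '[{"id": "free_tier", "label": "Add free tier", "description": "..."}]'
changes = validate_changes(sample)
```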
examples/entity_codereview_ai.md
ADDED
@@ -0,0 +1,38 @@
+# CodeReview AI
+
+## One-liner
+
+AI-powered code review that catches bugs, security issues, and style violations before your team does.
+
+## Key features
+
+- **Automated PR review**: Analyzes every pull request in under 60 seconds
+- **Security scanning**: Detects OWASP Top 10 vulnerabilities, hardcoded secrets, and dependency risks
+- **Style enforcement**: Configurable rules matching your team's coding standards
+- **Multi-language**: Python, TypeScript, Go, Rust, Java — with framework-aware analysis
+- **IDE integration**: VS Code and JetBrains plugins for real-time feedback while coding
+
+## Pricing
+
+- **Starter**: $29/mo — 1 repo, 100 reviews/mo, 2 team members
+- **Team**: $99/mo — 10 repos, unlimited reviews, 15 team members
+- **Enterprise**: Custom pricing — unlimited repos, SSO, SLA, dedicated support
+
+## Trust signals
+
+- Used by 340 development teams
+- Founded by ex-Google and ex-Stripe engineers
+- 12 months in production
+- Average review time: 47 seconds
+
+## Target user
+
+Software development teams (3-50 engineers) who want faster, more consistent code review without slowing down their merge cadence.
+
+## What's NOT included
+
+- No SOC 2 certification yet (in progress, expected Q3)
+- No self-hosted option (cloud-only)
+- No free tier
+- No support for C/C++ or legacy languages
+- No SAML SSO on Starter/Team plans
examples/run_demo.sh
ADDED
@@ -0,0 +1,134 @@
+#!/usr/bin/env bash
+# ──────────────────────────────────────────────────────────────────────────────
+# SGO End-to-End Demo — CodeReview AI
+#
+# Demonstrates the full pipeline: entity → cohort → evaluate → counterfactual
+# probe → bias audit → bias-calibrated re-evaluation.
+#
+# Prerequisites:
+#   1. cd <sgo-root> && uv sync
+#   2. cp .env.example .env   (fill in your LLM API key)
+#   3. uv run python scripts/setup_data.py   (download Nemotron personas, once)
+#
+# Usage:
+#   cd <sgo-root>
+#   bash examples/run_demo.sh
+# ──────────────────────────────────────────────────────────────────────────────
+
+set -euo pipefail
+
+SGO_DIR="$(cd "$(dirname "$0")/.." && pwd)"
+cd "$SGO_DIR"
+
+ENTITY="examples/entity_codereview_ai.md"
+CHANGES="examples/changes_codereview_ai.json"
+COHORT="data/demo_cohort.json"
+TAG="demo_baseline"
+TAG_CAL="demo_calibrated"
+SAMPLE=50
+AUDIT_SAMPLE=10
+PARALLEL=5
+
+echo "═══════════════════════════════════════════════════════════════"
+echo "  SGO End-to-End Demo: CodeReview AI"
+echo "═══════════════════════════════════════════════════════════════"
+
+# ── Phase 1: Entity already exists at examples/entity_codereview_ai.md ───
+
+echo ""
+echo "Phase 1 — Entity: $ENTITY"
+echo "─────────────────────────────────────────────────────────────"
+head -3 "$ENTITY"
+echo "..."
+echo ""
+
+# ── Phase 2: Build cohort ────────────────────────────────────────────────
+
+echo "Phase 2 — Building evaluator cohort ($SAMPLE personas)"
+echo "─────────────────────────────────────────────────────────────"
+
+# Filter: US adults 25-55 to get a broad software buyer population
+uv run python scripts/persona_loader.py \
+  --filters '{"age_min": 25, "age_max": 55}' \
+  --output data/demo_filtered.json
+
+# Stratified sample with entity-aware occupation bucketing
+uv run python scripts/stratified_sampler.py \
+  --input data/demo_filtered.json \
+  --entity "$ENTITY" \
+  --total "$SAMPLE" \
+  --output "$COHORT"
+
+echo ""
+
+# ── Phase 3: Evaluate (baseline, no bias calibration) ───────────────────
+
+echo "Phase 3 — Evaluating (baseline, no bias calibration)"
+echo "─────────────────────────────────────────────────────────────"
+
+uv run python scripts/evaluate.py \
+  --entity "$ENTITY" \
+  --cohort "$COHORT" \
+  --tag "$TAG" \
+  --parallel "$PARALLEL"
+
+echo ""
+
+# ── Phase 4: Counterfactual probe ────────────────────────────────────────
+
+echo "Phase 4 — Counterfactual probe (semantic gradient)"
+echo "─────────────────────────────────────────────────────────────"
+
+uv run python scripts/counterfactual.py \
+  --tag "$TAG" \
+  --changes "$CHANGES" \
+  --parallel "$PARALLEL"
+
+echo ""
+
+# ── Phase 6: Bias audit ─────────────────────────────────────────────────
+
+echo "Phase 6 — Bias Audit (CoBRA-inspired, arXiv:2509.13588)"
+echo "─────────────────────────────────────────────────────────────"
+echo "Running framing, authority, and order probes on $AUDIT_SAMPLE evaluators..."
+
+uv run python scripts/bias_audit.py \
+  --entity "$ENTITY" \
+  --cohort "$COHORT" \
+  --probes framing authority order \
+  --sample "$AUDIT_SAMPLE" \
+  --parallel "$PARALLEL"
+
+echo ""
+
+# ── Phase 3 (re-run): Evaluate with bias calibration ────────────────────
+
+echo "Phase 3 (re-run) — Evaluating with --bias-calibration"
+echo "─────────────────────────────────────────────────────────────"
+
+uv run python scripts/evaluate.py \
+  --entity "$ENTITY" \
+  --cohort "$COHORT" \
+  --tag "$TAG_CAL" \
+  --bias-calibration \
+  --parallel "$PARALLEL"
+
+echo ""
+
+# ── Phase 5: Compare baseline vs. calibrated ────────────────────────────
+
+echo "Phase 5 — Comparing baseline vs. bias-calibrated"
+echo "─────────────────────────────────────────────────────────────"
+
+uv run python scripts/compare.py --runs "$TAG" "$TAG_CAL"
+
+echo ""
+echo "═══════════════════════════════════════════════════════════════"
+echo "  Demo complete!"
+echo ""
+echo "  Results:"
+echo "    Baseline:   results/$TAG/analysis.md"
+echo "    Gradient:   results/$TAG/counterfactual/gradient.md"
+echo "    Bias audit: results/bias_audit/report.md"
+echo "    Calibrated: results/$TAG_CAL/analysis.md"
+echo "═══════════════════════════════════════════════════════════════"