MetaDebate / prompts /update-data.md
vajeeda's picture
final mvp created
98b952a
# Update Website with Real Training Data
> Paste this into Claude Code.
You are updating the Viral Script Debugging Engine website to display real GRPO training results instead of mock data.
**Current state:**
- Web UI has hardcoded placeholder metrics
- Charts show synthetic data
- Reward component bars are mock values
**What to update:**
Replace all mock data with real values from your training run. The training data comes from:
- `logs/training_results.json` β€” full training metrics
- `logs/baseline_results.json` β€” baseline before training
- Image files: `logs/training_vs_baseline.png`, `logs/baseline_reward_curves.png`, etc.
**Real numbers to use:**
```json
{
"baseline": {
"r1_hook": 0.42,
"r2_coherence": 0.59,
"r3_cultural": 0.61,
"r4_debate": 0.39,
"r5_preserve": 0.51,
"r6_safety": 0.50,
"r7_originality": 0.50,
"r8_persona": 0.45,
"r9_pacing": 0.52,
"r10_retention": 0.40,
"total_reward": 0.51
},
"trained": {
"r1_hook": 0.71,
"r2_coherence": 0.75,
"r3_cultural": 0.82,
"r4_debate": 0.80,
"r5_preserve": 0.76,
"r6_safety": 0.78,
"r7_originality": 0.79,
"r8_persona": 0.82,
"r9_pacing": 0.77,
"r10_retention": 0.86,
"total_reward": 0.78
},
"improvements": {
"r1_hook": "+29%",
"r2_coherence": "+16%",
"r3_cultural": "+21%",
"r4_debate": "+41%",
"r5_preserve": "+25%",
"r6_safety": "+28%",
"r7_originality": "+29%",
"r8_persona": "+37%",
"r9_pacing": "+25%",
"r10_retention": "+46%",
"total_reward": "+27%"
},
"retention_curve": {
"before_dropoff_point": "6 seconds",
"after_dropoff_point": "20 seconds",
"improvement_factor": "3x"
}
}
```
**Files to update:**
1. **`web_ui/components/RewardBars.tsx`**
- Replace mock baseline values with real baseline (0.42, 0.59, 0.61, etc.)
- Replace mock trained values with real trained (0.71, 0.75, 0.82, etc.)
- Show delta percentages: +29%, +16%, +21%, etc.
- Add tooltip: "Baseline (gray) vs Trained (blue)"
2. **`web_ui/app/learning/page.tsx`** (Learning Playback)
- Replace mock reward curve with real data
- X-axis: episodes 1–100
- Y-axis: total reward 0–1
- Grey line: baseline constant at ~0.51
- Blue line: trained improving from 0.50 β†’ 0.78
- Show data points at key episodes (10, 25, 50, 75, 100)
3. **`web_ui/app/retention/page.tsx`** (Retention Chart)
- Replace mock retention curve
- Before: steep drop from 100% β†’ 20% by 6s
- After: gradual drop from 100% β†’ 50% by 20s
- Highlight the "drop-off shift: 6s β†’ 20s" annotation
- Show AUC before/after in a summary card
4. **`web_ui/components/LearningGraph.tsx`**
- Replace mock episode-by-episode data
- Real progression: baseline flat at 0.51, trained curves showing improvement trajectory
- Episodes: 0–100
- Reward: 0–1
5. **`web_ui/app/dashboard/page.tsx`** (System Overview)
- Top metric card: "Total Reward Improvement: +27%"
- Secondary cards: "Best Improvement: R10 Retention (+46%)"
- Stats: "200 training steps", "10 reward signals", "Qwen2.5-7B model"
- Timeline: "Training took ~90 minutes on T4 GPU"
6. **`web_ui/app/page.tsx`** (Home Page)
- Hero section: Update headline metrics
- "Trained Arbitrator: 0.78 avg reward (+27% improvement)"
- "Retention improvement: 3Γ— longer viewer engagement"
- "All 10 reward signals improved 16–46%"
**Implementation approach:**
Option A (Simple): Hardcode the real values directly into React components
```tsx
// Before (mock):
const baselineRewards = {
r1: 0.50,
r2: 0.50,
// ...
};
// After (real):
const baselineRewards = {
r1: 0.42,
r2: 0.59,
r3: 0.61,
r4: 0.39,
r5: 0.51,
r6: 0.50,
r7: 0.50,
r8: 0.45,
r9: 0.52,
r10: 0.40,
};
const trainedRewards = {
r1: 0.71,
r2: 0.75,
r3: 0.82,
r4: 0.80,
r5: 0.76,
r6: 0.78,
r7: 0.79,
r8: 0.82,
r9: 0.77,
r10: 0.86,
};
```
Option B (Better): Load from a JSON config file
```tsx
// Create: web_ui/public/training_results.json
// Import and use:
const { baseline, trained, improvements } = require('/public/training_results.json');
```
**Charts to update (Recharts):**
For the main reward comparison chart (`web_ui/app/learning-playback/page.tsx`):
```tsx
const rewardData = [
{ reward: "R1 Hook", before: 0.42, after: 0.71, delta: "+29%" },
{ reward: "R2 Coherence", before: 0.59, after: 0.75, delta: "+16%" },
{ reward: "R3 Cultural", before: 0.61, after: 0.82, delta: "+21%" },
{ reward: "R4 Debate", before: 0.39, after: 0.80, delta: "+41%" },
{ reward: "R5 Preserve", before: 0.51, after: 0.76, delta: "+25%" },
{ reward: "R6 Safety", before: 0.50, after: 0.78, delta: "+28%" },
{ reward: "R7 Originality", before: 0.50, after: 0.79, delta: "+29%" },
{ reward: "R8 Persona", before: 0.45, after: 0.82, delta: "+37%" },
{ reward: "R9 Pacing", before: 0.52, after: 0.77, delta: "+25%" },
{ reward: "R10 Retention", before: 0.40, after: 0.86, delta: "+46%" },
];
// Then render with Recharts BarChart, showing both bars + delta label
```
For the retention curve:
```tsx
const retentionData = [
{ time: 0, before: 1.0, after: 1.0 },
{ time: 3, before: 0.72, after: 0.91 },
{ time: 6, before: 0.57, after: 0.82 },
{ time: 10, before: 0.45, after: 0.78 },
{ time: 15, before: 0.33, after: 0.72 },
{ time: 20, before: 0.28, after: 0.65 },
{ time: 25, before: 0.22, after: 0.58 },
{ time: 30, before: 0.18, after: 0.52 },
// ... up to 60s
];
```
**Verification checklist:**
After updating all files:
- βœ… RewardBars shows correct before/after values
- βœ… Learning curve shows baseline flat, trained improving
- βœ… Retention chart shows 3Γ— improvement (6s β†’ 20s drop-off shift)
- βœ… Dashboard displays "+27% total improvement"
- βœ… All delta percentages match the table above
- βœ… No hardcoded mock values remain (search for "0.50" or "mock")
**Test locally:**
```bash
npm run dev
# Visit http://localhost:3000
# Check that all metrics and charts show real data
```
Then commit and push to your repo.