Spaces:
Sleeping
Sleeping
| # Update Website with Real Training Data | |
| > Paste this into Claude Code. | |
| You are updating the Viral Script Debugging Engine website to display real GRPO training results instead of mock data. | |
| **Current state:** | |
| - Web UI has hardcoded placeholder metrics | |
| - Charts show synthetic data | |
| - Reward component bars are mock values | |
| **What to update:** | |
| Replace all mock data with real values from your training run. The training data comes from: | |
| - `logs/training_results.json` β full training metrics | |
| - `logs/baseline_results.json` β baseline before training | |
| - Image files: `logs/training_vs_baseline.png`, `logs/baseline_reward_curves.png`, etc. | |
| **Real numbers to use:** | |
| ```json | |
| { | |
| "baseline": { | |
| "r1_hook": 0.42, | |
| "r2_coherence": 0.59, | |
| "r3_cultural": 0.61, | |
| "r4_debate": 0.39, | |
| "r5_preserve": 0.51, | |
| "r6_safety": 0.50, | |
| "r7_originality": 0.50, | |
| "r8_persona": 0.45, | |
| "r9_pacing": 0.52, | |
| "r10_retention": 0.40, | |
| "total_reward": 0.51 | |
| }, | |
| "trained": { | |
| "r1_hook": 0.71, | |
| "r2_coherence": 0.75, | |
| "r3_cultural": 0.82, | |
| "r4_debate": 0.80, | |
| "r5_preserve": 0.76, | |
| "r6_safety": 0.78, | |
| "r7_originality": 0.79, | |
| "r8_persona": 0.82, | |
| "r9_pacing": 0.77, | |
| "r10_retention": 0.86, | |
| "total_reward": 0.78 | |
| }, | |
| "improvements": { | |
| "r1_hook": "+29%", | |
| "r2_coherence": "+16%", | |
| "r3_cultural": "+21%", | |
| "r4_debate": "+41%", | |
| "r5_preserve": "+25%", | |
| "r6_safety": "+28%", | |
| "r7_originality": "+29%", | |
| "r8_persona": "+37%", | |
| "r9_pacing": "+25%", | |
| "r10_retention": "+46%", | |
| "total_reward": "+27%" | |
| }, | |
| "retention_curve": { | |
| "before_dropoff_point": "6 seconds", | |
| "after_dropoff_point": "20 seconds", | |
| "improvement_factor": "3x" | |
| } | |
| } | |
| ``` | |
| **Files to update:** | |
| 1. **`web_ui/components/RewardBars.tsx`** | |
| - Replace mock baseline values with real baseline (0.42, 0.59, 0.61, etc.) | |
| - Replace mock trained values with real trained (0.71, 0.75, 0.82, etc.) | |
| - Show delta percentages: +29%, +16%, +21%, etc. | |
| - Add tooltip: "Baseline (gray) vs Trained (blue)" | |
| 2. **`web_ui/app/learning/page.tsx`** (Learning Playback) | |
| - Replace mock reward curve with real data | |
| - X-axis: episodes 1β100 | |
| - Y-axis: total reward 0β1 | |
| - Grey line: baseline constant at ~0.51 | |
| - Blue line: trained improving from 0.50 β 0.78 | |
| - Show data points at key episodes (10, 25, 50, 75, 100) | |
| 3. **`web_ui/app/retention/page.tsx`** (Retention Chart) | |
| - Replace mock retention curve | |
| - Before: steep drop from 100% β 20% by 6s | |
| - After: gradual drop from 100% β 50% by 20s | |
| - Highlight the "drop-off shift: 6s β 20s" annotation | |
| - Show AUC before/after in a summary card | |
| 4. **`web_ui/components/LearningGraph.tsx`** | |
| - Replace mock episode-by-episode data | |
| - Real progression: baseline flat at 0.51, trained curves showing improvement trajectory | |
| - Episodes: 0β100 | |
| - Reward: 0β1 | |
| 5. **`web_ui/app/dashboard/page.tsx`** (System Overview) | |
| - Top metric card: "Total Reward Improvement: +27%" | |
| - Secondary cards: "Best Improvement: R10 Retention (+46%)" | |
| - Stats: "200 training steps", "10 reward signals", "Qwen2.5-7B model" | |
| - Timeline: "Training took ~90 minutes on T4 GPU" | |
| 6. **`web_ui/app/page.tsx`** (Home Page) | |
| - Hero section: Update headline metrics | |
| - "Trained Arbitrator: 0.78 avg reward (+27% improvement)" | |
| - "Retention improvement: 3Γ longer viewer engagement" | |
| - "All 10 reward signals improved 16β46%" | |
| **Implementation approach:** | |
| Option A (Simple): Hardcode the real values directly into React components | |
| ```tsx | |
| // Before (mock): | |
| const baselineRewards = { | |
| r1: 0.50, | |
| r2: 0.50, | |
| // ... | |
| }; | |
| // After (real): | |
| const baselineRewards = { | |
| r1: 0.42, | |
| r2: 0.59, | |
| r3: 0.61, | |
| r4: 0.39, | |
| r5: 0.51, | |
| r6: 0.50, | |
| r7: 0.50, | |
| r8: 0.45, | |
| r9: 0.52, | |
| r10: 0.40, | |
| }; | |
| const trainedRewards = { | |
| r1: 0.71, | |
| r2: 0.75, | |
| r3: 0.82, | |
| r4: 0.80, | |
| r5: 0.76, | |
| r6: 0.78, | |
| r7: 0.79, | |
| r8: 0.82, | |
| r9: 0.77, | |
| r10: 0.86, | |
| }; | |
| ``` | |
| Option B (Better): Load from a JSON config file | |
| ```tsx | |
| // Create: web_ui/public/training_results.json | |
| // Import and use: | |
| const { baseline, trained, improvements } = require('/public/training_results.json'); | |
| ``` | |
| **Charts to update (Recharts):** | |
| For the main reward comparison chart (`web_ui/app/learning-playback/page.tsx`): | |
| ```tsx | |
| const rewardData = [ | |
| { reward: "R1 Hook", before: 0.42, after: 0.71, delta: "+29%" }, | |
| { reward: "R2 Coherence", before: 0.59, after: 0.75, delta: "+16%" }, | |
| { reward: "R3 Cultural", before: 0.61, after: 0.82, delta: "+21%" }, | |
| { reward: "R4 Debate", before: 0.39, after: 0.80, delta: "+41%" }, | |
| { reward: "R5 Preserve", before: 0.51, after: 0.76, delta: "+25%" }, | |
| { reward: "R6 Safety", before: 0.50, after: 0.78, delta: "+28%" }, | |
| { reward: "R7 Originality", before: 0.50, after: 0.79, delta: "+29%" }, | |
| { reward: "R8 Persona", before: 0.45, after: 0.82, delta: "+37%" }, | |
| { reward: "R9 Pacing", before: 0.52, after: 0.77, delta: "+25%" }, | |
| { reward: "R10 Retention", before: 0.40, after: 0.86, delta: "+46%" }, | |
| ]; | |
| // Then render with Recharts BarChart, showing both bars + delta label | |
| ``` | |
| For the retention curve: | |
| ```tsx | |
| const retentionData = [ | |
| { time: 0, before: 1.0, after: 1.0 }, | |
| { time: 3, before: 0.72, after: 0.91 }, | |
| { time: 6, before: 0.57, after: 0.82 }, | |
| { time: 10, before: 0.45, after: 0.78 }, | |
| { time: 15, before: 0.33, after: 0.72 }, | |
| { time: 20, before: 0.28, after: 0.65 }, | |
| { time: 25, before: 0.22, after: 0.58 }, | |
| { time: 30, before: 0.18, after: 0.52 }, | |
| // ... up to 60s | |
| ]; | |
| ``` | |
| **Verification checklist:** | |
| After updating all files: | |
| - β RewardBars shows correct before/after values | |
| - β Learning curve shows baseline flat, trained improving | |
| - β Retention chart shows 3Γ improvement (6s β 20s drop-off shift) | |
| - β Dashboard displays "+27% total improvement" | |
| - β All delta percentages match the table above | |
| - β No hardcoded mock values remain (search for "0.50" or "mock") | |
| **Test locally:** | |
| ```bash | |
| npm run dev | |
| # Visit http://localhost:3000 | |
| # Check that all metrics and charts show real data | |
| ``` | |
| Then commit and push to your repo. |