Add blog post, update README with Colab link and training results e9a2950 Astro-Dude committed on Apr 26
docs: update plan.md with training results, inference benchmarks, new files 033c53a Astro-Dude committed on Apr 26
demo: exercise citation pathway + deterministic artifact + CI + walkthrough b9182e8 Astro-Dude committed on Apr 26
Fix inference metadata field, add benchmark results, update README file layout links 614c4d0 Astro-Dude committed on Apr 26
Add training results: reward curves, results doc, updated dashboard bf77795 Astro-Dude Shaurya Verma committed on Apr 26
Fix oversight reward: expose oversight_reward on IncidentObservation 113917d Astro-Dude Shaurya Verma committed on Apr 26
Save full training logs to JSON + fix oversight obs.metadata crash 1e29b6e Astro-Dude Shaurya Verma committed on Apr 25
Reduce to 4 gens/200 tokens for ~60s/step on T4 (120 steps ≈ 2h) 97ce225 Astro-Dude committed on Apr 25
Add reward and done fields to IncidentObservation 72c691a Astro-Dude Shaurya Verma committed on Apr 25
Fix env crash: remove obs.metadata assignment missing from IncidentObservation 3f061f7 Astro-Dude Shaurya Verma committed on Apr 25
Add reward diagnostic logging to identify why rewards are flat 48dd389 Astro-Dude Shaurya Verma committed on Apr 25
Tune GRPO for T4 speed: 4 gens, 256 tokens, temp 1.2 af9d280 Astro-Dude Shaurya Verma committed on Apr 25
Add training dashboard with per-model reward curves and metrics d784bbf Astro-Dude Shaurya Verma committed on Apr 25
Fix GRPO rollout collapse: temperature 0.7→1.0, generations 4→8, completion length 300→512 2e1ab85 Astro-Dude committed on Apr 25