Spaces: Astro-Dude/citadel

Commit History

Update oversight reward curve PNG

f74b27a
verified

Astro-Dude committed on Apr 26

Update commander reward curve PNG

ed937c6
verified

Astro-Dude committed on Apr 26

Use relative paths for reward curve images

1aa5832

Astro-Dude committed on Apr 26

Sync gitignore and latest README/BLOG from main

114158c

Astro-Dude committed on Apr 26

Use GitHub raw URLs for reward curve images

166669c

Astro-Dude committed on Apr 26

Add blog post, update README with Colab link and training results

e9a2950

Astro-Dude committed on Apr 26

docs: update plan.md with training results, inference benchmarks, new files

033c53a

Astro-Dude committed on Apr 26

demo: exercise citation pathway + deterministic artifact + CI + walkthrough

b9182e8

Astro-Dude committed on Apr 26

Fix inference metadata field, add benchmark results, update README file layout links

614c4d0

Astro-Dude committed on Apr 26

Add HF Space link, results table, and reward curve to README

6c0424f

Astro-Dude committed on Apr 26

Add training results: reward curves, results doc, updated dashboard

bf77795

Astro-Dude Shaurya Verma committed on Apr 26

Fix oversight reward: expose oversight_reward on IncidentObservation

113917d

Astro-Dude Shaurya Verma committed on Apr 26

Save full training logs to JSON + fix oversight obs.metadata crash

1e29b6e

Astro-Dude Shaurya Verma committed on Apr 25

Reduce to 4 gens/200 tokens for ~60s/step on T4 (120 steps ≈ 2h)

97ce225

Astro-Dude committed on Apr 25

Add reward and done fields to IncidentObservation

72c691a

Astro-Dude Shaurya Verma committed on Apr 25

Fix env crash: remove obs.metadata assignment missing from IncidentObservation

3f061f7

Astro-Dude Shaurya Verma committed on Apr 25

Print env crash traceback in reward fn to diagnose -0.3 path

41c5282

Astro-Dude committed on Apr 25

Add reward diagnostic logging to identify why rewards are flat

48dd389

Astro-Dude Shaurya Verma committed on Apr 25

Tune GRPO for T4 speed: 4 gens, 256 tokens, temp 1.2

af9d280

Astro-Dude Shaurya Verma committed on Apr 25

Add training dashboard with per-model reward curves and metrics

d784bbf

Astro-Dude Shaurya Verma committed on Apr 25

Fix GRPO rollout collapse: temperature 0.7→1.0, generations 4→8, completion length 300→512

2e1ab85

Astro-Dude committed on Apr 25

Add dashboard.html and fix gitignore for runs/

098567b

Astro-Dude committed on Apr 25

Citadel v2.0.0 — Multi-Agent AI Defense Council

f0609d2

Astro-Dude committed on Apr 25