Commit History

ui: chatbot dark-neon theme - readable bubbles on Watch Agent Play
d90960c

Anurag Agarwal Cursor committed on

fix(ui): drop unsupported type= kwarg; dict format works by default
a435a88

Anurag Agarwal Cursor committed on

fix(ui): Gradio chatbot format error on Watch Agent Play
d63a291

Anurag Agarwal Cursor committed on

UI: fix '0 runs' chip + instant tab-card flip via js= callback
e0be40d

Anurag Agarwal committed on

UI: replace tab strip with big-card navigation
3782303

Anurag Agarwal committed on

Semantic run names: Probe/Drift/Anchor/Restrain/Champion + regen all plots
84fbeda

Anurag Agarwal committed on

Reframe: drop Run 7 from hero, keep only where appropriate
023e210

Anurag Agarwal committed on

Polish Blog.md: fix broken images + visual hierarchy
fe94ae7

Anurag Agarwal committed on

Stronger opening: meeting scheduling with 3 fabricated fields
5df7029

Anurag Agarwal committed on

Hero Zone redesign + fix Run 7 missing from plot 08
535b7f9

Anurag Agarwal committed on

Expand Blog.md to comprehensive deep-dive (5.9k words, all evidence)
7fbfbc0

Anurag Agarwal committed on

Reframe: environment is the contribution, training is validation
ea8263a

Anurag Agarwal committed on

Add full plot deck to Blog and README
f436dde

Anurag Agarwal committed on

Submission polish: Run 7 headline across all surfaces
8cf6fc1

Anurag Agarwal committed on

Run 7 BEATS BASE (+19%) + UI fixes
50d2fcb

Anurag Agarwal committed on

Blog: human hook opening, simple language
ca10a3a

Anurag Agarwal committed on

Judge-ready polish: diagram, before/after, curated replays
af3c208

Anurag Agarwal committed on

Fix neon CSS injection for Gradio 6.x + Blog.md at root
8fb3486

Anurag Agarwal committed on

Update blog + slides with Run 5-7 findings
3a0836e

Anurag Agarwal committed on

Port 7860 + neon theme
657f0fa

Anurag Agarwal committed on

Neon cyberpunk theme - dark bg, glowing accents, stat cards
b4f213a

Anurag Agarwal committed on

Fix Gradio compatibility for openenv's bundled version
712275f

Anurag Agarwal committed on

Add rich 4-tab Gradio UI dashboard
e65564f

Anurag Agarwal committed on

Run 6 results + training fixes + all plots regenerated
aae07d0

Anurag Agarwal committed on

Align eval prompts with training: add required_keys to initial context
5c18f41

Anurag Agarwal committed on

chore: track remaining PNGs via LFS/xet
7cd0fee

Anurag Agarwal committed on

plots: add training progression + diagnostics, drop W&B links
099bec8
verified

agarwalanu3103 committed on

docs: README aligned with hackathon judging criteria (Judges 60s tour + storytelling arc + plot captions)
310de9a
verified

agarwalanu3103 committed on

docs: sync README.md (slide deck + auto-validator gate update)
b6de3a6
verified

agarwalanu3103 committed on

docs: sync SUBMISSION_CHECKLIST.md (slide deck + auto-validator gate update)
753d688
verified

agarwalanu3103 committed on

docs: sync docs/slides.md (slide deck + auto-validator gate update)
5e0e1b0
verified

agarwalanu3103 committed on

docs: add detailed model cards for Run 1 / Run 2 / Run 4
f1678ab
verified

agarwalanu3103 committed on

docs: sync docs/trace_demo_run1.md
406b310
verified

agarwalanu3103 committed on

docs: sync docs/trace_demo.md
6dbd9a2
verified

agarwalanu3103 committed on

docs: sync docs/STATUS.md
f35202f
verified

agarwalanu3103 committed on

docs: sync docs/blog.md
82693a1
verified

agarwalanu3103 committed on

Add plots/ for inline embedding in env Space README
ac86191
verified

agarwalanu3103 committed on

Sync canonical README (KL-anchor narrative, embedded plots, deliverable links)
7fe6783
verified

agarwalanu3103 committed on

env: bump max_concurrent_envs from 8 to 64
8a9cc57
verified

agarwalanu3103 committed on

eval: enforce one-tool-call response format on every turn
a22fcfd
verified

agarwalanu3103 committed on

Fix parser: handle quoted commas, balanced parens, ASK:/PROPOSE: prefixes
7c0cc92
verified

agarwalanu3103 committed on

Eval system prompt: align character-for-character with training PROMPT - ensures trained model has zero distribution shift between train and eval
d9beb62
verified

agarwalanu3103 committed on

Eval system prompt: drop misleading software-stack example, align with training PROMPT (forces model to use task-family fields, not copy the example verbatim)
ef5498c
verified

agarwalanu3103 committed on

Parser: support ASK:/PROPOSE:/Q:/PLAN: prefix forms produced by Qwen3 GRPO
b8a5922
verified

agarwalanu3103 committed on

inference: parser fix - handle key=value in func calls + balanced parens
f251890
verified

agarwalanu3103 committed on

fix(eval): pass enable_thinking=False to disable Qwen3 thinking + bump MAX_TOKENS to 800
e4d1233
verified

agarwalanu3103 committed on

feat: add run_eval.py to Space (needed by eval_with_vllm.py for trained-model evals)
6473a24
verified

agarwalanu3103 committed on

env: enable concurrent rollout sessions (max_concurrent_envs=8)
895f00d

Anurag Agarwal committed on

rewrite training notebook with cleaner cell-by-cell structure
a45e7e7

Anurag Agarwal committed on

Add training/train_grpo.ipynb - GRPO training notebook (TRL + vLLM + ClarifyEnv)
5e8f794

Anurag Agarwal committed on