Spaces: agarwalanu3103/clarify-rl

Commit History

ui: chatbot dark-neon theme — readable bubbles on Watch Agent Play

d90960c

Anurag Agarwal Cursor committed on May 8

fix(ui): drop unsupported type= kwarg; dict format works by default

a435a88

Anurag Agarwal Cursor committed on May 8

fix(ui): Gradio chatbot format error on Watch Agent Play

d63a291

Anurag Agarwal Cursor committed on May 8

UI: fix '0 runs' chip + instant tab-card flip via js= callback

e0be40d

Anurag Agarwal committed on Apr 26

UI: replace tab strip with big-card navigation

3782303

Anurag Agarwal committed on Apr 26

Semantic run names: Probe/Drift/Anchor/Restrain/Champion + regen all plots

84fbeda

Anurag Agarwal committed on Apr 26

Reframe: drop Run 7 from hero, keep only where appropriate

023e210

Anurag Agarwal committed on Apr 26

Polish Blog.md: fix broken images + visual hierarchy

fe94ae7

Anurag Agarwal committed on Apr 26

Stronger opening: meeting scheduling with 3 fabricated fields

5df7029

Anurag Agarwal committed on Apr 26

Hero Zone redesign + fix Run 7 missing from plot 08

535b7f9

Anurag Agarwal committed on Apr 26

Expand Blog.md to comprehensive deep-dive (5.9k words, all evidence)

7fbfbc0

Anurag Agarwal committed on Apr 26

Reframe: environment is the contribution, training is validation

ea8263a

Anurag Agarwal committed on Apr 26

Add full plot deck to Blog and README

f436dde

Anurag Agarwal committed on Apr 26

Submission polish: Run 7 headline across all surfaces

8cf6fc1

Anurag Agarwal committed on Apr 26

Run 7 BEATS BASE (+19%) + UI fixes

50d2fcb

Anurag Agarwal committed on Apr 26

Blog: human hook opening, simple language

ca10a3a

Anurag Agarwal committed on Apr 26

Judge-ready polish: diagram, before/after, curated replays

af3c208

Anurag Agarwal committed on Apr 26

Fix neon CSS injection for Gradio 6.x + Blog.md at root

8fb3486

Anurag Agarwal committed on Apr 26

Update blog + slides with Run 5-7 findings

3a0836e

Anurag Agarwal committed on Apr 26

Port 7860 + neon theme

657f0fa

Anurag Agarwal committed on Apr 26

Neon cyberpunk theme - dark bg, glowing accents, stat cards

b4f213a

Anurag Agarwal committed on Apr 26

Fix Gradio compatibility for openenv's bundled version

712275f

Anurag Agarwal committed on Apr 26

Add rich 4-tab Gradio UI dashboard

e65564f

Anurag Agarwal committed on Apr 26

Run 6 results + training fixes + all plots regenerated

aae07d0

Anurag Agarwal committed on Apr 26

Align eval prompts with training: add required_keys to initial context

5c18f41

Anurag Agarwal committed on Apr 26

chore: track remaining PNGs via LFS/xet

7cd0fee

Anurag Agarwal committed on Apr 26

plots: add training progression + diagnostics, drop W&B links

099bec8
verified

agarwalanu3103 committed on Apr 26

docs: README aligned with hackathon judging criteria (Judges 60s tour + storytelling arc + plot captions)

310de9a
verified

agarwalanu3103 committed on Apr 25

docs: sync README.md (slide deck + auto-validator gate update)

b6de3a6
verified

agarwalanu3103 committed on Apr 25

docs: sync SUBMISSION_CHECKLIST.md (slide deck + auto-validator gate update)

753d688
verified

agarwalanu3103 committed on Apr 25

docs: sync docs/slides.md (slide deck + auto-validator gate update)

5e0e1b0
verified

agarwalanu3103 committed on Apr 25

docs: add detailed model cards for Run 1 / Run 2 / Run 4

f1678ab
verified

agarwalanu3103 committed on Apr 25

docs: sync docs/trace_demo_run1.md

406b310
verified

agarwalanu3103 committed on Apr 25

docs: sync docs/trace_demo.md

6dbd9a2
verified

agarwalanu3103 committed on Apr 25

docs: sync docs/STATUS.md

f35202f
verified

agarwalanu3103 committed on Apr 25

docs: sync docs/blog.md

82693a1
verified

agarwalanu3103 committed on Apr 25

Add plots/ for inline embedding in env Space README

ac86191
verified

agarwalanu3103 committed on Apr 25

Sync canonical README (KL-anchor narrative, embedded plots, deliverable links)

7fe6783
verified

agarwalanu3103 committed on Apr 25

env: bump max_concurrent_envs from 8 to 64

8a9cc57
verified

agarwalanu3103 committed on Apr 25

eval: enforce one-tool-call response format on every turn

a22fcfd
verified

agarwalanu3103 committed on Apr 25

Fix parser: handle quoted commas, balanced parens, ASK:/PROPOSE: prefixes

7c0cc92
verified

agarwalanu3103 committed on Apr 25

Eval system prompt: align character-for-character with training PROMPT — ensures trained model has zero distribution shift between train and eval

d9beb62
verified

agarwalanu3103 committed on Apr 25

Eval system prompt: drop misleading software-stack example, align with training PROMPT (forces model to use task-family fields, not copy the example verbatim)

ef5498c
verified

agarwalanu3103 committed on Apr 25

Parser: support ASK:/PROPOSE:/Q:/PLAN: prefix forms produced by Qwen3 GRPO

b8a5922
verified

agarwalanu3103 committed on Apr 25

inference: parser fix — handle key=value in func calls + balanced parens

f251890
verified

agarwalanu3103 committed on Apr 25

fix(eval): pass enable_thinking=False to disable Qwen3 thinking + bump MAX_TOKENS to 800

e4d1233
verified

agarwalanu3103 committed on Apr 25

feat: add run_eval.py to Space (needed by eval_with_vllm.py for trained-model evals)

6473a24
verified

agarwalanu3103 committed on Apr 25

env: enable concurrent rollout sessions (max_concurrent_envs=8)

895f00d

Anurag Agarwal committed on Apr 25

rewrite training notebook with cleaner cell-by-cell structure

a45e7e7

Anurag Agarwal committed on Apr 25

Add training/train_grpo.ipynb — GRPO training notebook (TRL + vLLM + ClarifyEnv)

5e8f794

Anurag Agarwal committed on Apr 25
