Spaces: v4xsh/nervousystem-env

Commit History

docs: publish real SFT log_history and add provenance note

93f985b

vx7sh committed on Apr 26

docs: publish real SFT log_history and add provenance note

1a3f787

vx7sh committed on Apr 26

docs: add final blog post for submission

f7e8508

vx7sh committed on Apr 26

docs: add notebook and concise proof screenshots

4828f2f

vx7sh committed on Apr 26

docs: add training reproduction notebook link

aaf2446

vx7sh committed on Apr 26

docs: add terminal proof screenshots

1be44ff (verified)

v4xsh committed on Apr 26

docs: add final training evidence plots

f12d4f1

vx7sh committed on Apr 26

docs: finalize phase-aware training evidence

d36ec16

vx7sh committed on Apr 26

fix(eval): add phase-aware hard and cascade action selection

1e79bac

vx7sh committed on Apr 26

fix(training): allow grpo script to import app config

c83ba8e

vx7sh committed on Apr 26

fix(grpo): keep reward rollouts aligned with prompt seeds

977c01a

vx7sh committed on Apr 26

fix(grpo): align hard and cascade training with final eval

96b891c

vx7sh committed on Apr 26

fix(grpo): gate hard-task rollouts on correct first action

72d562f

vx7sh committed on Apr 25

fix(grpo): salvage malformed action prefixes for reward scoring

41f1550

vx7sh committed on Apr 25

fix(grpo): prefix-anchored generation + continuous reward ladder

3a9c2b7

vx7sh committed on Apr 25

fix(grpo): pre-apply chat template and tighten generation params

10ca05a

vx7sh committed on Apr 25

fix(docker): use python:3.13-slim to fix build (audioop-lts requires >=3.13)

3e6e1ca

vx7sh committed on Apr 25

fix(rl): cycle patch files in hard/cascade rollouts and trim default RL pool

ec08742

vx7sh committed on Apr 25

docs(evidence): add multi-agent eval, long-horizon trace, and constrained eval logs

9731ebe

vx7sh committed on Apr 25

feat(rl): GRPO continuation from SFT adapter with multi-step task rollouts

678d74f

vx7sh committed on Apr 25

feat(curriculum): adaptive difficulty for telemetry, masking, and secondary failures

3928ed0

vx7sh committed on Apr 25

test(reward): make reward audit run in-process when server is offline

26ea725

vx7sh committed on Apr 25

fix(reward): preserve task pass thresholds and switch MER to efficiency multiplier

00c2406

vx7sh committed on Apr 25

feat(env): add fleet_coordination multi-agent task

39931d5

vx7sh committed on Apr 25

Harden submission evidence and reward integrity

3360325

vx7sh committed on Apr 25

Document training evidence and improve model evaluation

7fb0542

vx7sh committed on Apr 25

Add training evidence summary

186b8b1

vx7sh committed on Apr 25

Document training results and artifacts

d9cb5e7

vx7sh committed on Apr 25

Use constrained action scoring for model evaluation

f93b5c1

vx7sh committed on Apr 25

Use multi-step SFT pipeline for reliable training evidence

775670f

vx7sh committed on Apr 25

Add SFT warmup and model evaluation pipeline

73f137c

vx7sh committed on Apr 25

Improve GRPO rewards and add model evaluation

37b9396

vx7sh committed on Apr 25

Handle chat completions in GRPO reward parser

ade5d78

vx7sh committed on Apr 25

Use vanilla TRL GRPO training stack

e530995

vx7sh committed on Apr 25

Patch Unsloth GRPO text-only trainer attributes

0cebfea

vx7sh committed on Apr 25

Fix Hugging Face Space and GRPO training config

53d7ae2

vx7sh committed on Apr 25

Fix Hugging Face Space Docker config

b5314ec

vx7sh committed on Apr 25

Add Hugging Face Spaces config

a3ef87a

vx7sh committed on Apr 25

chore(grading): raise medium pass threshold to 0.7 and document audit notes

76e100b

vx7sh committed on Apr 25

feat(eval): add seed variance reporting script

95b7a07

vx7sh committed on Apr 25

feat(eval): add benchmark harness and reward integrity artifacts

ee2f27b

vx7sh committed on Apr 25

test: add coalition and anti-gaming grader coverage

7d2be3f

vx7sh committed on Apr 25

feat(env): add curriculum, challenge generation, coalition, and black-swan mechanics

edc6488

vx7sh committed on Apr 25

chore(submission): add baseline snapshot and validation script

d3e5b9b

vx7sh committed on Apr 24

chore(deps): pin full runtime requirements

f54a1f4

vx7sh committed on Apr 24

feat(env): add robustness, uncertainty consensus, and partial observability

487d9c3

vx7sh committed on Apr 24

Ship Round 2 manifest/docs, dashboard, and GRPO training pipeline

ff665de

vx7sh committed on Apr 22

Add Fleet supervisor-worker delegation layer with /delegate API and tests

0f99e53

vx7sh committed on Apr 21

Add cascade grader and regression tests for cascade task

ce9bc2c

vx7sh committed on Apr 21

Track LLM token usage and send token_count in actions

319df08

vx7sh committed on Apr 21

