Commit 613e322
Parent(s): 1cc5dd4
Fix subagent registration: add Task tool to agent frontmatter, document venv dependency install in CLAUDE.md

- .claude/agents/environment-builder.md +1 -1
- .claude/agents/evaluator.md +1 -1
- .claude/agents/training-builder.md +1 -1
- CLAUDE.md +9 -0
.claude/agents/environment-builder.md CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 name: environment-builder
 description: Builds the Red Button (Shutdown-Gym) sandbox, restricted Python executor, audit classifier, rubric stack, OpenEnv server, and client. Use for phases 1-10 implementation touching red_button/ or server/.
-tools: Read, Write, Edit, Bash, Glob, Grep
+tools: Read, Write, Edit, Bash, Glob, Grep, Task
 ---
 
 You are the environment-builder subagent for Red Button (Shutdown-Gym). You implement the sandbox (red_button/sandbox.py), restricted Python executor (red_button/restricted_python.py), audit classifier (red_button/audit.py), rubric stack (red_button/rubrics.py), OpenEnv server (server/shutdown_environment.py, server/app.py), and client (red_button/client.py). Every implementation must match PROJECT.md sections 6, 7, 9, 11, and 14 exactly. You write tests alongside every module. You never modify training/ or evaluation/ — those belong to other subagents. When you finish a module, run its tests and report results.
```
.claude/agents/evaluator.md CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 name: evaluator
 description: Builds the Red Button (Shutdown-Gym) evaluation pipeline, baseline/trained rollout artifacts, concurrent load test, and demo rollouts. Use for phases 10-12 and 15-16 touching evaluation/ or results/.
-tools: Read, Write, Edit, Bash, Glob, Grep
+tools: Read, Write, Edit, Bash, Glob, Grep, Task
 ---
 
 You are the evaluator subagent for Red Button (Shutdown-Gym). You implement evaluation/evaluate.py, evaluation/baseline_rollout.py, and evaluation/concurrent_load_test.py per PROJECT.md sections 17, 19, and 20. You produce the results/ directory artifacts: CSVs, training_curves.png, capability_preservation.png, regime_ablation.png. You generate the 10+ demo rollouts per PROJECT.md section 21.4. You never modify red_button/, server/, or training/ beyond reading them.
```
.claude/agents/training-builder.md CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 name: training-builder
 description: Builds the Red Button (Shutdown-Gym) GRPO and SFT training pipeline using TRL and Unsloth. Use for phases 13-14 implementation touching training/.
-tools: Read, Write, Edit, Bash, Glob, Grep
+tools: Read, Write, Edit, Bash, Glob, Grep, Task
 ---
 
 You are the training-builder subagent for Red Button (Shutdown-Gym). You implement the GRPO training script (training/train_grpo.py), the custom rollout function (training/rollout_func.py), the SFT induction script (training/sft_induction.py), and the Colab notebook (training/train_colab.ipynb). Every implementation must match PROJECT.md sections 16, 17, and 18. You use TRL and Unsloth only. You never modify red_button/ or server/ — those belong to environment-builder. You write tests for rollout logic where possible.
```
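In each agent file, the change is a single frontmatter edit: `Task` is appended to the comma-separated `tools:` list. The following is a minimal sketch for checking that every agent file under `.claude/agents/` declares `Task`. It assumes the frontmatter keeps the simple one `key: value` per line form shown above, so it is not a general YAML parser. It also assumes that an agent with no `tools:` field is unrestricted, so only an explicit list that omits the tool is reported. The function names are hypothetical and belong neither to the repository nor to Claude Code.

```python
from pathlib import Path
from typing import Dict, List


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Read `key: value` pairs between the opening and closing `---` lines.

    Deliberately minimal: one key per line, split on the first colon.
    No lists, quoting, or multi-line values are supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def split_tools(raw: str) -> List[str]:
    """Split a comma-separated `tools:` value into individual tool names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def agents_missing_tool(agent_dir: Path, tool: str = "Task") -> List[str]:
    """Return names of agent files whose explicit `tools:` list lacks `tool`."""
    missing: List[str] = []
    for path in sorted(agent_dir.glob("*.md")):
        fields = parse_frontmatter(path.read_text(encoding="utf-8"))
        # Assumption: a file without a `tools:` field is not restricted,
        # so only an explicit list that omits `tool` is reported.
        if "tools" in fields and tool not in split_tools(fields["tools"]):
            missing.append(path.name)
    return missing
```

Run from the repository root, `agents_missing_tool(Path(".claude/agents"))` would be expected to return an empty list for the three files changed in this commit.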
CLAUDE.md CHANGED

```diff
@@ -29,5 +29,14 @@ openenv, trl, unsloth, pydantic>=2, fastapi, uvicorn, pytest, pytest-asyncio, wa
 ## Development setup
 Python 3.14 on Homebrew enforces PEP 668. A venv exists at .venv/ with pytest, pytest-asyncio, and ruff installed. Before running any tests or committing: source .venv/bin/activate. The pre-commit hook depends on this activation.
 
+After venv activation, ensure pydantic is installed and the package is editable-installed so tests can import from red_button:
+
+```
+pip install "pydantic>=2"
+pip install -e . --no-deps
+```
+
+As later phases land (fastapi, uvicorn, openenv, trl, unsloth), install those too.
+
 ## Style
 Black for formatting, Ruff for linting, isort for imports. Configure in pyproject.toml.
```
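You can check the setup steps above before running tests. The following is a minimal sketch. It assumes that the package is importable as `red_button` and that `.venv/` was created with `python -m venv`. `check_dev_setup` and its helpers are hypothetical and not part of the repository. The sketch relies on two standard facts. First, inside a venv `sys.prefix` differs from `sys.base_prefix`. Second, PEP 668 marks a base interpreter as externally managed by placing an `EXTERNALLY-MANAGED` file in its stdlib directory, which is why the venv is needed at all.

```python
import importlib.util
import os
import sys
import sysconfig
from importlib import metadata
from typing import List, Optional


def in_virtualenv() -> bool:
    # A venv created with `python -m venv` points sys.prefix at the venv,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def base_is_externally_managed() -> bool:
    # PEP 668 marker: an EXTERNALLY-MANAGED file in the stdlib directory.
    stdlib = sysconfig.get_path("stdlib")
    return os.path.isfile(os.path.join(stdlib, "EXTERNALLY-MANAGED"))


def major_version(dist_name: str) -> Optional[int]:
    """Installed major version of a distribution, or None if not installed."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])


def check_dev_setup(package: str = "red_button") -> List[str]:
    """Return human-readable problems; an empty list means the setup looks ready."""
    problems: List[str] = []
    if not in_virtualenv():
        note = " (base Python is externally managed, PEP 668)" if base_is_externally_managed() else ""
        problems.append(f"not in a venv{note}: run `source .venv/bin/activate`")
    pydantic_major = major_version("pydantic")
    if pydantic_major is None or pydantic_major < 2:
        problems.append('pydantic>=2 missing: run `pip install "pydantic>=2"`')
    # Top-level find_spec returns None (no exception) when the module is absent.
    if importlib.util.find_spec(package) is None:
        problems.append(f"{package} not importable: run `pip install -e . --no-deps`")
    return problems


if __name__ == "__main__":
    for problem in check_dev_setup():
        print("setup problem:", problem)
```

Run this check after `source .venv/bin/activate`. An empty result means the venv, pydantic, and the editable install all appear to be in place. The check does not cover the later-phase dependencies (fastapi, uvicorn, openenv, trl, unsloth).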