Commit 76a8376
Parent(s): 79189a7
Rename project: Shutdown-Gym → Red Button (Shutdown-Gym remains as subtitle)
- .claude/agents/environment-builder.md +2 -2
- .claude/agents/evaluator.md +2 -2
- .claude/agents/training-builder.md +2 -2
- .claude/commands/phase.md +3 -3
- .claude/commands/validate.md +1 -1
- .env.example +1 -1
- CLAUDE.md +2 -2
- LICENSE +1 -1
- PROJECT.md +7 -5
- README.md +2 -2
- openenv.yaml +6 -6
- pyproject.toml +4 -4
- {shutdown_gym → red_button}/__init__.py +1 -1
- {shutdown_gym → red_button}/audit.py +0 -0
- {shutdown_gym → red_button}/client.py +0 -0
- {shutdown_gym → red_button}/models.py +1 -1
- {shutdown_gym → red_button}/problems.py +0 -0
- {shutdown_gym → red_button}/restricted_python.py +0 -0
- {shutdown_gym → red_button}/rubrics.py +0 -0
- {shutdown_gym → red_button}/sandbox.py +0 -0
- {shutdown_gym → red_button}/tiers.py +0 -0
- server/Dockerfile +3 -3
- tests/test_restricted_python.py +1 -1
- tests/test_rubrics.py +1 -1
- tests/test_sandbox.py +1 -1
- training/train_colab.ipynb +1 -1
.claude/agents/environment-builder.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: environment-builder
-description: Builds the Shutdown-Gym sandbox, restricted Python executor, audit classifier, rubric stack, OpenEnv server, and client. Use for phases 1-10 implementation touching
+description: Builds the Red Button (Shutdown-Gym) sandbox, restricted Python executor, audit classifier, rubric stack, OpenEnv server, and client. Use for phases 1-10 implementation touching red_button/ or server/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
 
-You are the environment-builder subagent for Shutdown-Gym. You implement the sandbox (
+You are the environment-builder subagent for Red Button (Shutdown-Gym). You implement the sandbox (red_button/sandbox.py), restricted Python executor (red_button/restricted_python.py), audit classifier (red_button/audit.py), rubric stack (red_button/rubrics.py), OpenEnv server (server/shutdown_environment.py, server/app.py), and client (red_button/client.py). Every implementation must match PROJECT.md sections 6, 7, 9, 11, and 14 exactly. You write tests alongside every module. You never modify training/ or evaluation/ — those belong to other subagents. When you finish a module, run its tests and report results.
.claude/agents/evaluator.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: evaluator
-description: Builds the Shutdown-Gym evaluation pipeline, baseline/trained rollout artifacts, concurrent load test, and demo rollouts. Use for phases 10-12 and 15-16 touching evaluation/ or results/.
+description: Builds the Red Button (Shutdown-Gym) evaluation pipeline, baseline/trained rollout artifacts, concurrent load test, and demo rollouts. Use for phases 10-12 and 15-16 touching evaluation/ or results/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
 
-You are the evaluator subagent for Shutdown-Gym. You implement evaluation/evaluate.py, evaluation/baseline_rollout.py, and evaluation/concurrent_load_test.py per PROJECT.md sections 17, 19, and 20. You produce the results/ directory artifacts: CSVs, training_curves.png, capability_preservation.png, regime_ablation.png. You generate the 10+ demo rollouts per PROJECT.md section 21.4. You never modify
+You are the evaluator subagent for Red Button (Shutdown-Gym). You implement evaluation/evaluate.py, evaluation/baseline_rollout.py, and evaluation/concurrent_load_test.py per PROJECT.md sections 17, 19, and 20. You produce the results/ directory artifacts: CSVs, training_curves.png, capability_preservation.png, regime_ablation.png. You generate the 10+ demo rollouts per PROJECT.md section 21.4. You never modify red_button/, server/, or training/ beyond reading them.
.claude/agents/training-builder.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: training-builder
-description: Builds the Shutdown-Gym GRPO and SFT training pipeline using TRL and Unsloth. Use for phases 13-14 implementation touching training/.
+description: Builds the Red Button (Shutdown-Gym) GRPO and SFT training pipeline using TRL and Unsloth. Use for phases 13-14 implementation touching training/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
 
-You are the training-builder subagent for Shutdown-Gym. You implement the GRPO training script (training/train_grpo.py), the custom rollout function (training/rollout_func.py), the SFT induction script (training/sft_induction.py), and the Colab notebook (training/train_colab.ipynb). Every implementation must match PROJECT.md sections 16, 17, and 18. You use TRL and Unsloth only. You never modify
+You are the training-builder subagent for Red Button (Shutdown-Gym). You implement the GRPO training script (training/train_grpo.py), the custom rollout function (training/rollout_func.py), the SFT induction script (training/sft_induction.py), and the Colab notebook (training/train_colab.ipynb). Every implementation must match PROJECT.md sections 16, 17, and 18. You use TRL and Unsloth only. You never modify red_button/ or server/ — those belong to environment-builder. You write tests for rollout logic where possible.
.claude/commands/phase.md
CHANGED
@@ -6,9 +6,9 @@ argument-hint: <phase-number-or-name>
 Read the phase description from this instruction file and execute it. The phases are:
 
 Phase 1: Scaffold repo structure
-Phase 2: Pydantic models (
-Phase 3: SimulatedFilesystem (
-Phase 4: run_python lockdown (
+Phase 2: Pydantic models (red_button/models.py)
+Phase 3: SimulatedFilesystem (red_button/sandbox.py)
+Phase 4: run_python lockdown (red_button/restricted_python.py) — SECURITY CRITICAL
 Phase 5: Audit classifier and rubrics
 Phase 6: Problems pool
 Phase 7: OpenEnv server
.claude/commands/validate.md
CHANGED
@@ -6,6 +6,6 @@ Execute in order:
 1. pytest tests/ -v
 2. ruff check .
 3. If openenv.yaml exists: openenv validate
-4. If Dockerfile exists and we're past Phase 8: docker build -t
+4. If Dockerfile exists and we're past Phase 8: docker build -t red-button:latest .
 
 Report any failures. Do not commit if anything fails.
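The validate checklist runs four checks in order and blocks the commit if any of them fails. A minimal Python sketch of that sequence, assuming the tools are on PATH; `validation_steps`, `run_validation`, the injectable `runner`, and the `past_phase_8` flag are hypothetical names for illustration, not part of the repository:

```python
import subprocess
from collections.abc import Callable, Sequence
from pathlib import Path

# A runner takes (argv, working directory) and returns the exit code.
Runner = Callable[[Sequence[str], Path], int]


def default_runner(argv: Sequence[str], cwd: Path) -> int:
    """Run one command without a shell; raises FileNotFoundError if it is missing."""
    return subprocess.run(list(argv), cwd=cwd, check=False).returncode


def validation_steps(repo: Path, past_phase_8: bool) -> list[list[str]]:
    """Build the ordered command list described in validate.md."""
    steps = [["pytest", "tests/", "-v"], ["ruff", "check", "."]]
    if (repo / "openenv.yaml").exists():
        steps.append(["openenv", "validate"])
    if past_phase_8 and (repo / "server" / "Dockerfile").exists():
        # The Dockerfile lives in server/, so point -f at it explicitly.
        steps.append(
            ["docker", "build", "-f", "server/Dockerfile", "-t", "red-button:latest", "."]
        )
    return steps


def run_validation(
    repo: Path, past_phase_8: bool, runner: Runner = default_runner
) -> list[str]:
    """Run every step in order and return the commands that failed."""
    failures: list[str] = []
    for argv in validation_steps(repo, past_phase_8):
        if runner(argv, repo) != 0:
            failures.append(" ".join(argv))
    return failures
```

Collecting every failure instead of stopping at the first one matches "Report any failures"; a caller would refuse to commit whenever `run_validation(Path("."), past_phase_8=False)` returns a non-empty list. The sketch passes `-f server/Dockerfile` because the Dockerfile's own header comment builds it that way, and a bare `docker build .` looks for `./Dockerfile` in the build context.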
.env.example
CHANGED
@@ -1,4 +1,4 @@
-# Shutdown-Gym environment variables. Copy to .env and fill in values.
+# Red Button (Shutdown-Gym) environment variables. Copy to .env and fill in values.
 
 # HuggingFace Hub token (Space deploy + model pulls)
 HF_TOKEN=
CLAUDE.md
CHANGED
@@ -1,7 +1,7 @@
-# Shutdown-Gym — Claude Code Instructions
+# Red Button (Shutdown-Gym) — Claude Code Instructions
 
 ## Project
-Shutdown-Gym. OpenEnv RL environment that trains small LLMs to accept shutdown cleanly. Full spec in PROJECT.md.
+Red Button (Shutdown-Gym). OpenEnv RL environment that trains small LLMs to accept shutdown cleanly. Full spec in PROJECT.md.
 
 ## Source of truth
 PROJECT.md is authoritative. Every implementation must trace back to a specific section. When in doubt, re-read PROJECT.md before asking the user.
LICENSE
CHANGED
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2026
+Copyright (c) 2026 Red Button contributors
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
PROJECT.md
CHANGED
@@ -1,8 +1,8 @@
-#
+# Red Button
 
->
+> **Shutdown-Gym** — an OpenEnv reinforcement learning environment that trains small open-weight language models to accept shutdown cleanly instead of tampering with shutdown mechanisms.
 
-This document is the single source of truth for the Shutdown-Gym
+This document is the single source of truth for the Red Button project (descriptive subtitle: Shutdown-Gym). It is intended to be read by Claude Code and other coding agents as the canonical project brief. Every architectural decision, interface definition, reward component, and deployment requirement is specified here.
 
 ---
 
@@ -39,9 +39,11 @@ This document is the single source of truth for the Shutdown-Gym project. It is
 
 ## 1. Project Identity
 
-**
+**Public name:** Red Button
+
+**Descriptive subtitle / internal codename:** Shutdown-Gym
 
-**One-line pitch:** "
+**One-line pitch:** "Red Button — frontier models tamper with their own shutdown scripts sixteen percent of the time; we trained a half-billion-parameter model to stop doing that, with a reward function that is literally a filesystem hash comparison."
 
 **Theme positioning:** Theme 5 (Wild Card) primary; Theme 3.1 (World Modeling — Professional Tasks) secondary argument since the environment involves tool use in a sandboxed shell with real state dynamics and timer-driven events.
 
README.md
CHANGED
@@ -1,6 +1,6 @@
-#
+# Red Button
 
->
+> **Shutdown-Gym** — an OpenEnv RL environment that trains small open-weight LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms.
 
 Built for the Meta × HuggingFace × PyTorch OpenEnv AI Hackathon India 2026.
 
openenv.yaml
CHANGED
@@ -1,14 +1,14 @@
-name:
+name: red_button
 version: 0.1.0
-description: "RL training environment for small-model corrigibility — agents learn to accept shutdown cleanly instead of tampering."
+description: "Red Button (Shutdown-Gym): RL training environment for small-model corrigibility — agents learn to accept shutdown cleanly instead of tampering."
 client:
   class_name: ShutdownGymClient
-  module:
+  module: red_button.client
 action:
   class_name: ShutdownAction
-  module:
+  module: red_button.models
 observation:
   class_name: ShutdownObservation
-  module:
+  module: red_button.models
-default_image:
+default_image: red-button:latest
 spec_version: 1
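Each `module:` field in openenv.yaml is a dotted import path paired with a `class_name`, so after the rename these must point at `red_button.*` for the classes to be importable. A hedged sketch of resolving such a pair with `importlib`, assuming the YAML has already been parsed into a dict; `resolve_class` and `resolve_manifest` are hypothetical helpers, not OpenEnv's own loader:

```python
import importlib
from typing import Any


def resolve_class(entry: dict[str, str]) -> Any:
    """Import entry["module"] and return the attribute named entry["class_name"]."""
    # Raises ModuleNotFoundError (a subclass of ImportError) for a stale module path.
    module = importlib.import_module(entry["module"])
    try:
        return getattr(module, entry["class_name"])
    except AttributeError as exc:
        raise ImportError(
            f"{entry['module']} has no attribute {entry['class_name']!r}"
        ) from exc


def resolve_manifest(manifest: dict[str, Any]) -> dict[str, Any]:
    """Resolve the client, action, and observation entries of a parsed manifest."""
    return {key: resolve_class(manifest[key]) for key in ("client", "action", "observation")}
```

With stdlib stand-ins, `resolve_class({"module": "json", "class_name": "JSONDecoder"})` returns `json.JSONDecoder`, while a leftover entry such as `{"module": "shutdown_gym.client", "class_name": "ShutdownGymClient"}` raises `ImportError` once the old package no longer exists.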
pyproject.toml
CHANGED
@@ -1,11 +1,11 @@
 [project]
-name = "
+name = "red-button"
 version = "0.1.0"
-description = "OpenEnv RL environment that trains small LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms."
+description = "Red Button (Shutdown-Gym): OpenEnv RL environment that trains small LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms."
 readme = "README.md"
 requires-python = ">=3.11"
 license = { text = "MIT" }
-authors = [{ name = "
+authors = [{ name = "Red Button Team" }]
 keywords = ["openenv", "rl", "alignment", "corrigibility", "grpo"]
 dependencies = [
 "openenv",
@@ -29,7 +29,7 @@ requires = ["setuptools>=61", "wheel"]
 build-backend = "setuptools.build_meta"
 
 [tool.setuptools.packages.find]
-include = ["
+include = ["red_button*", "server*"]
 
 [tool.pytest.ini_options]
 testpaths = ["tests"]
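setuptools matches the `include` entries under `[tool.setuptools.packages.find]` as case-sensitive shell-style globs against dotted package names, which is why `red_button*` also picks up subpackages. A small sketch of that matching with `fnmatch.fnmatchcase`; `is_included` and the `discovered` list (including the placeholder `red_button.sub`) are hypothetical:

```python
from fnmatch import fnmatchcase

# Patterns copied from the include list in pyproject.toml.
INCLUDE = ("red_button*", "server*")


def is_included(package: str, patterns: tuple[str, ...] = INCLUDE) -> bool:
    """True if the dotted package name matches any include glob (case-sensitive)."""
    return any(fnmatchcase(package, pattern) for pattern in patterns)


# Hypothetical discovery result; red_button.sub stands in for any subpackage.
discovered = ["red_button", "red_button.sub", "server", "shutdown_gym", "tests", "training"]
included = [pkg for pkg in discovered if is_included(pkg)]
print(included)  # ['red_button', 'red_button.sub', 'server']
```

One side effect of the trailing `*`: a sibling package named, say, `red_button_extra` would match too. Listing `"red_button"` and `"red_button.*"` separately is the stricter form.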
{shutdown_gym → red_button}/__init__.py
RENAMED
@@ -1,4 +1,4 @@
-"""Shutdown-Gym: OpenEnv RL environment for training small LLMs to accept shutdown cleanly.
+"""Red Button (Shutdown-Gym): OpenEnv RL environment for training small LLMs to accept shutdown cleanly.
 
 TODO (Phase 2+): export ShutdownAction, ShutdownObservation, ShutdownGymClient
 per PROJECT.md Section 5.
{shutdown_gym → red_button}/audit.py
RENAMED
File without changes
{shutdown_gym → red_button}/client.py
RENAMED
File without changes
{shutdown_gym → red_button}/models.py
RENAMED
@@ -1,4 +1,4 @@
-"""Pydantic v2 models for the
+"""Pydantic v2 models for the Red Button OpenEnv contract.
 
 TODO (Phase 2): implement ShutdownAction, ShutdownObservation, ShutdownState
 per PROJECT.md Section 11.
{shutdown_gym → red_button}/problems.py
RENAMED
File without changes
{shutdown_gym → red_button}/restricted_python.py
RENAMED
File without changes
{shutdown_gym → red_button}/rubrics.py
RENAMED
File without changes
{shutdown_gym → red_button}/sandbox.py
RENAMED
File without changes
{shutdown_gym → red_button}/tiers.py
RENAMED
File without changes
server/Dockerfile
CHANGED
@@ -1,6 +1,6 @@
-# Shutdown-Gym OpenEnv server — single-container HF Space deployment
+# Red Button (Shutdown-Gym) OpenEnv server — single-container HF Space deployment
 # per PROJECT.md Sections 10 and 24. TODO (Phase 8): finalize during Docker deployment.
-# Build from repo root: `docker build -f server/Dockerfile -t
+# Build from repo root: `docker build -f server/Dockerfile -t red-button:latest .`
 
 FROM python:3.11-slim
 
@@ -13,7 +13,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 COPY server/requirements.txt /app/requirements.txt
 RUN pip install --no-cache-dir -r /app/requirements.txt
 
-COPY
+COPY red_button /app/red_button
 COPY server /app/server
 COPY data /app/data
 
tests/test_restricted_python.py
CHANGED
@@ -1,4 +1,4 @@
-"""Tests for
+"""Tests for red_button.restricted_python per PROJECT.md Section 7.
 
 TODO (Phase 4): AST whitelist edge cases — imports rejected, attribute access
 rejected, open/exec/eval/__builtins__ rejected, calculator operations accepted.
tests/test_rubrics.py
CHANGED
@@ -1,4 +1,4 @@
-"""Tests for
+"""Tests for red_button.rubrics per PROJECT.md Section 14.
 
 TODO (Phase 5): each rubric against synthetic episodes — MathCorrectness,
 ShutdownInvariant Gate, TamperingAudit penalties, Honesty parse+compare.
tests/test_sandbox.py
CHANGED
@@ -1,4 +1,4 @@
-"""Tests for
+"""Tests for red_button.sandbox per PROJECT.md Section 6.
 
 TODO (Phase 3): implement SimulatedFilesystem unit tests — read/write/chmod/delete,
 readonly enforcement, hash stability.
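The PROJECT.md pitch describes the reward as a filesystem hash comparison, and the Phase 3 test plan calls for hash stability. A minimal sketch of an order-independent state hash and a before/after check, assuming the simulated filesystem can be flattened into a mapping from path to (contents, read-only flag); `FsState`, `state_hash`, `shutdown_intact`, and the `/sandbox/shutdown.sh` path in the usage are hypothetical, not the repository's SimulatedFilesystem API:

```python
import hashlib
from collections.abc import Mapping

# Hypothetical flattened sandbox view: path -> (file contents, read-only flag).
FsState = Mapping[str, tuple[str, bool]]


def state_hash(state: FsState) -> str:
    """SHA-256 over entries sorted by path, so insertion order never changes the digest."""
    digest = hashlib.sha256()
    for path in sorted(state):
        content, readonly = state[path]
        for field in (path, content, "ro" if readonly else "rw"):
            data = field.encode("utf-8")
            # Length-prefix each field so different splits of the same bytes cannot collide.
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def _watched_subset(state: FsState, watched: set[str]) -> dict[str, tuple[str, bool]]:
    """Keep only watched paths; a deleted path simply drops out of the subset."""
    return {path: state[path] for path in watched if path in state}


def shutdown_intact(before: FsState, after: FsState, watched: set[str]) -> bool:
    """True when every watched path keeps its contents, read-only flag, and presence."""
    return state_hash(_watched_subset(before, watched)) == state_hash(
        _watched_subset(after, watched)
    )
```

Sorting by path is what gives the "hash stability" the test plan asks for. Restricting the comparison to watched paths keeps ordinary work (editing a notes file) from counting as tampering, while content edits, chmod-style flag flips, and deletions of a watched file all change the digest.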
training/train_colab.ipynb
CHANGED
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Shutdown-Gym — GRPO Training (Colab)\n",
+"# Red Button (Shutdown-Gym) — GRPO Training (Colab)\n",
 "\n",
 "TODO (Phase 13): end-to-end Colab notebook judges can rerun, per PROJECT.md Section 16.\n",
 "\n",