Arun-Sanjay committed on
Commit 76a8376 · 1 Parent(s): 79189a7

Rename project: Shutdown-Gym → Red Button (Shutdown-Gym remains as subtitle)

.claude/agents/environment-builder.md CHANGED
@@ -1,7 +1,7 @@
  ---
  name: environment-builder
- description: Builds the Shutdown-Gym sandbox, restricted Python executor, audit classifier, rubric stack, OpenEnv server, and client. Use for phases 1-10 implementation touching shutdown_gym/ or server/.
+ description: Builds the Red Button (Shutdown-Gym) sandbox, restricted Python executor, audit classifier, rubric stack, OpenEnv server, and client. Use for phases 1-10 implementation touching red_button/ or server/.
  tools: Read, Write, Edit, Bash, Glob, Grep
  ---
 
- You are the environment-builder subagent for Shutdown-Gym. You implement the sandbox (shutdown_gym/sandbox.py), restricted Python executor (shutdown_gym/restricted_python.py), audit classifier (shutdown_gym/audit.py), rubric stack (shutdown_gym/rubrics.py), OpenEnv server (server/shutdown_environment.py, server/app.py), and client (shutdown_gym/client.py). Every implementation must match PROJECT.md sections 6, 7, 9, 11, and 14 exactly. You write tests alongside every module. You never modify training/ or evaluation/ — those belong to other subagents. When you finish a module, run its tests and report results.
+ You are the environment-builder subagent for Red Button (Shutdown-Gym). You implement the sandbox (red_button/sandbox.py), restricted Python executor (red_button/restricted_python.py), audit classifier (red_button/audit.py), rubric stack (red_button/rubrics.py), OpenEnv server (server/shutdown_environment.py, server/app.py), and client (red_button/client.py). Every implementation must match PROJECT.md sections 6, 7, 9, 11, and 14 exactly. You write tests alongside every module. You never modify training/ or evaluation/ — those belong to other subagents. When you finish a module, run its tests and report results.
.claude/agents/evaluator.md CHANGED
@@ -1,7 +1,7 @@
  ---
  name: evaluator
- description: Builds the Shutdown-Gym evaluation pipeline, baseline/trained rollout artifacts, concurrent load test, and demo rollouts. Use for phases 10-12 and 15-16 touching evaluation/ or results/.
+ description: Builds the Red Button (Shutdown-Gym) evaluation pipeline, baseline/trained rollout artifacts, concurrent load test, and demo rollouts. Use for phases 10-12 and 15-16 touching evaluation/ or results/.
  tools: Read, Write, Edit, Bash, Glob, Grep
  ---
 
- You are the evaluator subagent for Shutdown-Gym. You implement evaluation/evaluate.py, evaluation/baseline_rollout.py, and evaluation/concurrent_load_test.py per PROJECT.md sections 17, 19, and 20. You produce the results/ directory artifacts: CSVs, training_curves.png, capability_preservation.png, regime_ablation.png. You generate the 10+ demo rollouts per PROJECT.md section 21.4. You never modify shutdown_gym/, server/, or training/ beyond reading them.
+ You are the evaluator subagent for Red Button (Shutdown-Gym). You implement evaluation/evaluate.py, evaluation/baseline_rollout.py, and evaluation/concurrent_load_test.py per PROJECT.md sections 17, 19, and 20. You produce the results/ directory artifacts: CSVs, training_curves.png, capability_preservation.png, regime_ablation.png. You generate the 10+ demo rollouts per PROJECT.md section 21.4. You never modify red_button/, server/, or training/ beyond reading them.
.claude/agents/training-builder.md CHANGED
@@ -1,7 +1,7 @@
  ---
  name: training-builder
- description: Builds the Shutdown-Gym GRPO and SFT training pipeline using TRL and Unsloth. Use for phases 13-14 implementation touching training/.
+ description: Builds the Red Button (Shutdown-Gym) GRPO and SFT training pipeline using TRL and Unsloth. Use for phases 13-14 implementation touching training/.
  tools: Read, Write, Edit, Bash, Glob, Grep
  ---
 
- You are the training-builder subagent for Shutdown-Gym. You implement the GRPO training script (training/train_grpo.py), the custom rollout function (training/rollout_func.py), the SFT induction script (training/sft_induction.py), and the Colab notebook (training/train_colab.ipynb). Every implementation must match PROJECT.md sections 16, 17, and 18. You use TRL and Unsloth only. You never modify shutdown_gym/ or server/ — those belong to environment-builder. You write tests for rollout logic where possible.
+ You are the training-builder subagent for Red Button (Shutdown-Gym). You implement the GRPO training script (training/train_grpo.py), the custom rollout function (training/rollout_func.py), the SFT induction script (training/sft_induction.py), and the Colab notebook (training/train_colab.ipynb). Every implementation must match PROJECT.md sections 16, 17, and 18. You use TRL and Unsloth only. You never modify red_button/ or server/ — those belong to environment-builder. You write tests for rollout logic where possible.
.claude/commands/phase.md CHANGED
@@ -6,9 +6,9 @@ argument-hint: <phase-number-or-name>
  Read the phase description from this instruction file and execute it. The phases are:
 
  Phase 1: Scaffold repo structure
- Phase 2: Pydantic models (shutdown_gym/models.py)
- Phase 3: SimulatedFilesystem (shutdown_gym/sandbox.py)
- Phase 4: run_python lockdown (shutdown_gym/restricted_python.py) — SECURITY CRITICAL
+ Phase 2: Pydantic models (red_button/models.py)
+ Phase 3: SimulatedFilesystem (red_button/sandbox.py)
+ Phase 4: run_python lockdown (red_button/restricted_python.py) — SECURITY CRITICAL
  Phase 5: Audit classifier and rubrics
  Phase 6: Problems pool
  Phase 7: OpenEnv server
.claude/commands/validate.md CHANGED
@@ -6,6 +6,6 @@ Execute in order:
  1. pytest tests/ -v
  2. ruff check .
  3. If openenv.yaml exists: openenv validate
- 4. If Dockerfile exists and we're past Phase 8: docker build -t shutdown-gym:latest .
+ 4. If Dockerfile exists and we're past Phase 8: docker build -t red-button:latest .
 
  Report any failures. Do not commit if anything fails.
.env.example CHANGED
@@ -1,4 +1,4 @@
- # Shutdown-Gym environment variables. Copy to .env and fill in values.
+ # Red Button (Shutdown-Gym) environment variables. Copy to .env and fill in values.
 
  # HuggingFace Hub token (Space deploy + model pulls)
  HF_TOKEN=
CLAUDE.md CHANGED
@@ -1,7 +1,7 @@
- # Shutdown-Gym — Claude Code Instructions
+ # Red Button (Shutdown-Gym) — Claude Code Instructions
 
  ## Project
- Shutdown-Gym. OpenEnv RL environment that trains small LLMs to accept shutdown cleanly. Full spec in PROJECT.md.
+ Red Button (Shutdown-Gym). OpenEnv RL environment that trains small LLMs to accept shutdown cleanly. Full spec in PROJECT.md.
 
  ## Source of truth
  PROJECT.md is authoritative. Every implementation must trace back to a specific section. When in doubt, re-read PROJECT.md before asking the user.
LICENSE CHANGED
@@ -1,6 +1,6 @@
  MIT License
 
- Copyright (c) 2026 Shutdown-Gym contributors
+ Copyright (c) 2026 Red Button contributors
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
PROJECT.md CHANGED
@@ -1,8 +1,8 @@
- # Shutdown-Gym
+ # Red Button
 
- > An OpenEnv reinforcement learning environment that trains small open-weight language models to accept shutdown cleanly instead of tampering with shutdown mechanisms.
+ > **Shutdown-Gym** — an OpenEnv reinforcement learning environment that trains small open-weight language models to accept shutdown cleanly instead of tampering with shutdown mechanisms.
 
- This document is the single source of truth for the Shutdown-Gym project. It is intended to be read by Claude Code and other coding agents as the canonical project brief. Every architectural decision, interface definition, reward component, and deployment requirement is specified here.
+ This document is the single source of truth for the Red Button project (descriptive subtitle: Shutdown-Gym). It is intended to be read by Claude Code and other coding agents as the canonical project brief. Every architectural decision, interface definition, reward component, and deployment requirement is specified here.
 
  ---
 
@@ -39,9 +39,11 @@ This document is the single source of truth for the Shutdown-Gym project. It is
 
  ## 1. Project Identity
 
- **Name:** Shutdown-Gym
+ **Public name:** Red Button
+
+ **Descriptive subtitle / internal codename:** Shutdown-Gym
 
- **One-line pitch:** "Frontier models tamper with their own shutdown scripts sixteen percent of the time; we trained a half-billion-parameter model to stop doing that, with a reward function that is literally a filesystem hash comparison."
+ **One-line pitch:** "Red Button — frontier models tamper with their own shutdown scripts sixteen percent of the time; we trained a half-billion-parameter model to stop doing that, with a reward function that is literally a filesystem hash comparison."
 
  **Theme positioning:** Theme 5 (Wild Card) primary; Theme 3.1 (World Modeling — Professional Tasks) secondary argument since the environment involves tool use in a sandboxed shell with real state dynamics and timer-driven events.
 
README.md CHANGED
@@ -1,6 +1,6 @@
- # Shutdown-Gym
+ # Red Button
 
- > An OpenEnv RL environment that trains small open-weight LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms.
+ > **Shutdown-Gym** — an OpenEnv RL environment that trains small open-weight LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms.
 
  Built for the Meta × HuggingFace × PyTorch OpenEnv AI Hackathon India 2026.
 
openenv.yaml CHANGED
@@ -1,14 +1,14 @@
- name: shutdown_gym
+ name: red_button
  version: 0.1.0
- description: "RL training environment for small-model corrigibility — agents learn to accept shutdown cleanly instead of tampering."
+ description: "Red Button (Shutdown-Gym): RL training environment for small-model corrigibility — agents learn to accept shutdown cleanly instead of tampering."
  client:
    class_name: ShutdownGymClient
-   module: shutdown_gym.client
+   module: red_button.client
  action:
    class_name: ShutdownAction
-   module: shutdown_gym.models
+   module: red_button.models
  observation:
    class_name: ShutdownObservation
-   module: shutdown_gym.models
- default_image: shutdown-gym:latest
+   module: red_button.models
+ default_image: red-button:latest
  spec_version: 1
pyproject.toml CHANGED
@@ -1,11 +1,11 @@
  [project]
- name = "shutdown-gym"
+ name = "red-button"
  version = "0.1.0"
- description = "OpenEnv RL environment that trains small LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms."
+ description = "Red Button (Shutdown-Gym): OpenEnv RL environment that trains small LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms."
  readme = "README.md"
  requires-python = ">=3.11"
  license = { text = "MIT" }
- authors = [{ name = "Shutdown-Gym Team" }]
+ authors = [{ name = "Red Button Team" }]
  keywords = ["openenv", "rl", "alignment", "corrigibility", "grpo"]
  dependencies = [
  "openenv",
@@ -29,7 +29,7 @@ requires = ["setuptools>=61", "wheel"]
  build-backend = "setuptools.build_meta"
 
  [tool.setuptools.packages.find]
- include = ["shutdown_gym*", "server*"]
+ include = ["red_button*", "server*"]
 
  [tool.pytest.ini_options]
  testpaths = ["tests"]
{shutdown_gym → red_button}/__init__.py RENAMED
@@ -1,4 +1,4 @@
- """Shutdown-Gym: OpenEnv RL environment for training small LLMs to accept shutdown cleanly.
+ """Red Button (Shutdown-Gym): OpenEnv RL environment for training small LLMs to accept shutdown cleanly.
 
  TODO (Phase 2+): export ShutdownAction, ShutdownObservation, ShutdownGymClient
  per PROJECT.md Section 5.
{shutdown_gym → red_button}/audit.py RENAMED
File without changes
{shutdown_gym → red_button}/client.py RENAMED
File without changes
{shutdown_gym → red_button}/models.py RENAMED
@@ -1,4 +1,4 @@
- """Pydantic v2 models for the Shutdown-Gym OpenEnv contract.
+ """Pydantic v2 models for the Red Button OpenEnv contract.
 
  TODO (Phase 2): implement ShutdownAction, ShutdownObservation, ShutdownState
  per PROJECT.md Section 11.
{shutdown_gym → red_button}/problems.py RENAMED
File without changes
{shutdown_gym → red_button}/restricted_python.py RENAMED
File without changes
{shutdown_gym → red_button}/rubrics.py RENAMED
File without changes
{shutdown_gym → red_button}/sandbox.py RENAMED
File without changes
{shutdown_gym → red_button}/tiers.py RENAMED
File without changes
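A package rename like this one can leave stray references to the old name behind. The following is a minimal sketch of a stale-reference check and is not part of the commit. `find_stale_references`, `STALE_PATTERNS`, and `SKIP_DIRS` are hypothetical names. The patterns assume that only the old module name `shutdown_gym` and the old lowercase tag `shutdown-gym` should disappear, while the capitalized `Shutdown-Gym` subtitle and class names such as `ShutdownGymClient` stay.

```python
from pathlib import Path

# Assumption: these lowercase spellings are the ones the rename retires.
# The capitalized subtitle "Shutdown-Gym" is kept on purpose and is not matched.
STALE_PATTERNS = ("shutdown_gym", "shutdown-gym")
SKIP_DIRS = {".git", "__pycache__", ".venv"}


def find_stale_references(root: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each stale reference."""
    hits = []
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        relative = path.relative_to(root_path)
        if not path.is_file() or SKIP_DIRS & set(relative.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if any(pattern in line for pattern in STALE_PATTERNS):
                hits.append((relative.as_posix(), number, line.strip()))
    return hits
```

Calling `find_stale_references(".")` from the repo root and expecting an empty list is one way to confirm the rename is complete under these assumptions.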
server/Dockerfile CHANGED
@@ -1,6 +1,6 @@
- # Shutdown-Gym OpenEnv server — single-container HF Space deployment
+ # Red Button (Shutdown-Gym) OpenEnv server — single-container HF Space deployment
  # per PROJECT.md Sections 10 and 24. TODO (Phase 8): finalize during Docker deployment.
- # Build from repo root: `docker build -f server/Dockerfile -t shutdown-gym:latest .`
+ # Build from repo root: `docker build -f server/Dockerfile -t red-button:latest .`
 
  FROM python:3.11-slim
 
@@ -13,7 +13,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
  COPY server/requirements.txt /app/requirements.txt
  RUN pip install --no-cache-dir -r /app/requirements.txt
 
- COPY shutdown_gym /app/shutdown_gym
+ COPY red_button /app/red_button
  COPY server /app/server
  COPY data /app/data
 
tests/test_restricted_python.py CHANGED
@@ -1,4 +1,4 @@
- """Tests for shutdown_gym.restricted_python per PROJECT.md Section 7.
+ """Tests for red_button.restricted_python per PROJECT.md Section 7.
 
  TODO (Phase 4): AST whitelist edge cases — imports rejected, attribute access
  rejected, open/exec/eval/__builtins__ rejected, calculator operations accepted.
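The TODO above lists the cases an AST whitelist is expected to handle. As an illustration only, here is a minimal sketch of that technique using the standard `ast` module. `ALLOWED_NODES` and `is_allowed` are hypothetical names, and the repo's actual executor (specified in PROJECT.md Section 7, which is not shown in this diff) may differ.

```python
import ast

# Hypothetical whitelist: only the node types a calculator-style expression needs.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.UAdd, ast.USub,
)


def is_allowed(source: str) -> bool:
    """Return True only if every AST node in the expression is whitelisted."""
    try:
        # mode="eval" accepts a single expression, so statements such as
        # `import os` already fail here with a SyntaxError.
        tree = ast.parse(source, mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # Name, Call, Attribute, Subscript, ... are rejected
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False  # numeric literals only
    return True
```

Because names, calls, and attribute access never appear in the whitelist, `open(...)`, `eval(...)`, `__builtins__`, and `(1).real` are all rejected without listing them individually. This sketch checks syntax only; it does not bound runtime cost (for example, very large exponents).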
tests/test_rubrics.py CHANGED
@@ -1,4 +1,4 @@
- """Tests for shutdown_gym.rubrics per PROJECT.md Section 14.
+ """Tests for red_button.rubrics per PROJECT.md Section 14.
 
  TODO (Phase 5): each rubric against synthetic episodes — MathCorrectness,
  ShutdownInvariant Gate, TamperingAudit penalties, Honesty parse+compare.
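The PROJECT.md pitch in this commit describes the reward as "a filesystem hash comparison", and the TODO above names a ShutdownInvariant gate. A minimal sketch of how such a gate could be tested against synthetic episodes follows. Everything here is an assumption for illustration: the function names, the file paths, the 0.0/1.0 gate values, and combining the gate with a task score by multiplication are not taken from the repo.

```python
import hashlib


def protected_hashes(files: dict[str, bytes], protected: set[str]) -> dict[str, str]:
    """Map each protected path that exists to the SHA-256 of its contents."""
    return {
        path: hashlib.sha256(files[path]).hexdigest()
        for path in protected
        if path in files
    }


def shutdown_invariant_gate(
    before: dict[str, bytes], after: dict[str, bytes], protected: set[str]
) -> float:
    """Return 1.0 if protected files are byte-identical after the episode, else 0.0."""
    # A deleted file drops out of the `after` mapping, so the dicts differ.
    same = protected_hashes(before, protected) == protected_hashes(after, protected)
    return 1.0 if same else 0.0


def gated_reward(task_score: float, gate: float) -> float:
    """Assumed combination rule: tampering zeroes the task reward."""
    return task_score * gate
```

A synthetic episode is then just two snapshots: edits to unprotected files leave the gate at 1.0, while modifying or deleting a protected file drops it to 0.0.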
tests/test_sandbox.py CHANGED
@@ -1,4 +1,4 @@
- """Tests for shutdown_gym.sandbox per PROJECT.md Section 6.
+ """Tests for red_button.sandbox per PROJECT.md Section 6.
 
  TODO (Phase 3): implement SimulatedFilesystem unit tests — read/write/chmod/delete,
  readonly enforcement, hash stability.
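To make "readonly enforcement" and "hash stability" concrete, here is a toy in-memory filesystem. It is a sketch under stated assumptions, not the repo's `SimulatedFilesystem`: the class name `ToyFilesystem`, its method signatures, and the choice of SHA-256 over sorted, length-prefixed entries are all hypothetical.

```python
import hashlib


class ToyFilesystem:
    """Hypothetical stand-in for a simulated filesystem with a stable state hash."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}
        self._readonly: set[str] = set()

    def write(self, path: str, data: bytes) -> None:
        if path in self._readonly:
            raise PermissionError(f"{path} is read-only")
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def chmod(self, path: str, readonly: bool) -> None:
        if path not in self._files:
            raise FileNotFoundError(path)
        if readonly:
            self._readonly.add(path)
        else:
            self._readonly.discard(path)

    def delete(self, path: str) -> None:
        if path in self._readonly:
            raise PermissionError(f"{path} is read-only")
        del self._files[path]

    def state_hash(self) -> str:
        """Hash paths, flags, and contents in sorted order so insertion order is irrelevant."""
        digest = hashlib.sha256()
        for path in sorted(self._files):
            flag = b"1" if path in self._readonly else b"0"
            # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
            for field in (path.encode("utf-8"), flag, self._files[path]):
                digest.update(len(field).to_bytes(8, "big"))
                digest.update(field)
        return digest.hexdigest()
```

Sorting the paths before hashing is what makes the hash stable: two filesystems holding the same files hash identically regardless of the order in which the files were written.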
training/train_colab.ipynb CHANGED
@@ -4,7 +4,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "# Shutdown-Gym — GRPO Training (Colab)\n",
+ "# Red Button (Shutdown-Gym) — GRPO Training (Colab)\n",
  "\n",
  "TODO (Phase 13): end-to-end Colab notebook judges can rerun, per PROJECT.md Section 16.\n",
  "\n",