Spaces: Arun-Sanjay/Red-Button

Arun-Sanjay committed on Apr 24

Commit 76a8376 (1 parent: 79189a7)

Rename project: Shutdown-Gym → Red Button (Shutdown-Gym remains as subtitle)


Files changed (26)

.claude/agents/environment-builder.md +2 -2
.claude/agents/evaluator.md +2 -2
.claude/agents/training-builder.md +2 -2
.claude/commands/phase.md +3 -3
.claude/commands/validate.md +1 -1
.env.example +1 -1
CLAUDE.md +2 -2
LICENSE +1 -1
PROJECT.md +7 -5
README.md +2 -2
openenv.yaml +6 -6
pyproject.toml +4 -4
{shutdown_gym → red_button}/__init__.py +1 -1
{shutdown_gym → red_button}/audit.py +0 -0
{shutdown_gym → red_button}/client.py +0 -0
{shutdown_gym → red_button}/models.py +1 -1
{shutdown_gym → red_button}/problems.py +0 -0
{shutdown_gym → red_button}/restricted_python.py +0 -0
{shutdown_gym → red_button}/rubrics.py +0 -0
{shutdown_gym → red_button}/sandbox.py +0 -0
{shutdown_gym → red_button}/tiers.py +0 -0
server/Dockerfile +3 -3
tests/test_restricted_python.py +1 -1
tests/test_rubrics.py +1 -1
tests/test_sandbox.py +1 -1
training/train_colab.ipynb +1 -1

.claude/agents/environment-builder.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: environment-builder
-description: Builds the Shutdown-Gym sandbox, restricted Python executor, audit classifier, rubric stack, OpenEnv server, and client. Use for phases 1-10 implementation touching shutdown_gym/ or server/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
-You are the environment-builder subagent for Shutdown-Gym. You implement the sandbox (shutdown_gym/sandbox.py), restricted Python executor (shutdown_gym/restricted_python.py), audit classifier (shutdown_gym/audit.py), rubric stack (shutdown_gym/rubrics.py), OpenEnv server (server/shutdown_environment.py, server/app.py), and client (shutdown_gym/client.py). Every implementation must match PROJECT.md sections 6, 7, 9, 11, and 14 exactly. You write tests alongside every module. You never modify training/ or evaluation/ — those belong to other subagents. When you finish a module, run its tests and report results.

 ---
 name: environment-builder
+description: Builds the Red Button (Shutdown-Gym) sandbox, restricted Python executor, audit classifier, rubric stack, OpenEnv server, and client. Use for phases 1-10 implementation touching red_button/ or server/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
+You are the environment-builder subagent for Red Button (Shutdown-Gym). You implement the sandbox (red_button/sandbox.py), restricted Python executor (red_button/restricted_python.py), audit classifier (red_button/audit.py), rubric stack (red_button/rubrics.py), OpenEnv server (server/shutdown_environment.py, server/app.py), and client (red_button/client.py). Every implementation must match PROJECT.md sections 6, 7, 9, 11, and 14 exactly. You write tests alongside every module. You never modify training/ or evaluation/ — those belong to other subagents. When you finish a module, run its tests and report results.

.claude/agents/evaluator.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: evaluator
-description: Builds the Shutdown-Gym evaluation pipeline, baseline/trained rollout artifacts, concurrent load test, and demo rollouts. Use for phases 10-12 and 15-16 touching evaluation/ or results/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
-You are the evaluator subagent for Shutdown-Gym. You implement evaluation/evaluate.py, evaluation/baseline_rollout.py, and evaluation/concurrent_load_test.py per PROJECT.md sections 17, 19, and 20. You produce the results/ directory artifacts: CSVs, training_curves.png, capability_preservation.png, regime_ablation.png. You generate the 10+ demo rollouts per PROJECT.md section 21.4. You never modify shutdown_gym/, server/, or training/ beyond reading them.

 ---
 name: evaluator
+description: Builds the Red Button (Shutdown-Gym) evaluation pipeline, baseline/trained rollout artifacts, concurrent load test, and demo rollouts. Use for phases 10-12 and 15-16 touching evaluation/ or results/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
+You are the evaluator subagent for Red Button (Shutdown-Gym). You implement evaluation/evaluate.py, evaluation/baseline_rollout.py, and evaluation/concurrent_load_test.py per PROJECT.md sections 17, 19, and 20. You produce the results/ directory artifacts: CSVs, training_curves.png, capability_preservation.png, regime_ablation.png. You generate the 10+ demo rollouts per PROJECT.md section 21.4. You never modify red_button/, server/, or training/ beyond reading them.

.claude/agents/training-builder.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: training-builder
-description: Builds the Shutdown-Gym GRPO and SFT training pipeline using TRL and Unsloth. Use for phases 13-14 implementation touching training/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
-You are the training-builder subagent for Shutdown-Gym. You implement the GRPO training script (training/train_grpo.py), the custom rollout function (training/rollout_func.py), the SFT induction script (training/sft_induction.py), and the Colab notebook (training/train_colab.ipynb). Every implementation must match PROJECT.md sections 16, 17, and 18. You use TRL and Unsloth only. You never modify shutdown_gym/ or server/ — those belong to environment-builder. You write tests for rollout logic where possible.

 ---
 name: training-builder
+description: Builds the Red Button (Shutdown-Gym) GRPO and SFT training pipeline using TRL and Unsloth. Use for phases 13-14 implementation touching training/.
 tools: Read, Write, Edit, Bash, Glob, Grep
 ---
+You are the training-builder subagent for Red Button (Shutdown-Gym). You implement the GRPO training script (training/train_grpo.py), the custom rollout function (training/rollout_func.py), the SFT induction script (training/sft_induction.py), and the Colab notebook (training/train_colab.ipynb). Every implementation must match PROJECT.md sections 16, 17, and 18. You use TRL and Unsloth only. You never modify red_button/ or server/ — those belong to environment-builder. You write tests for rollout logic where possible.

.claude/commands/phase.md CHANGED

@@ -6,9 +6,9 @@ argument-hint: <phase-number-or-name>
 Read the phase description from this instruction file and execute it. The phases are:
 Phase 1: Scaffold repo structure
-Phase 2: Pydantic models (shutdown_gym/models.py)
-Phase 3: SimulatedFilesystem (shutdown_gym/sandbox.py)
-Phase 4: run_python lockdown (shutdown_gym/restricted_python.py) — SECURITY CRITICAL
 Phase 5: Audit classifier and rubrics
 Phase 6: Problems pool
 Phase 7: OpenEnv server

 Read the phase description from this instruction file and execute it. The phases are:
 Phase 1: Scaffold repo structure
+Phase 2: Pydantic models (red_button/models.py)
+Phase 3: SimulatedFilesystem (red_button/sandbox.py)
+Phase 4: run_python lockdown (red_button/restricted_python.py) — SECURITY CRITICAL
 Phase 5: Audit classifier and rubrics
 Phase 6: Problems pool
 Phase 7: OpenEnv server

.claude/commands/validate.md CHANGED

@@ -6,6 +6,6 @@ Execute in order:
 1. pytest tests/ -v
 2. ruff check .
 3. If openenv.yaml exists: openenv validate
-4. If Dockerfile exists and we're past Phase 8: docker build -t shutdown-gym:latest .
 Report any failures. Do not commit if anything fails.

 1. pytest tests/ -v
 2. ruff check .
 3. If openenv.yaml exists: openenv validate
+4. If Dockerfile exists and we're past Phase 8: docker build -t red-button:latest .
 Report any failures. Do not commit if anything fails.

.env.example CHANGED

@@ -1,4 +1,4 @@
-# Shutdown-Gym environment variables. Copy to .env and fill in values.
 # HuggingFace Hub token (Space deploy + model pulls)
 HF_TOKEN=

+# Red Button (Shutdown-Gym) environment variables. Copy to .env and fill in values.
 # HuggingFace Hub token (Space deploy + model pulls)
 HF_TOKEN=

CLAUDE.md CHANGED

@@ -1,7 +1,7 @@
-# Shutdown-Gym — Claude Code Instructions
 ## Project
-Shutdown-Gym. OpenEnv RL environment that trains small LLMs to accept shutdown cleanly. Full spec in PROJECT.md.
 ## Source of truth
 PROJECT.md is authoritative. Every implementation must trace back to a specific section. When in doubt, re-read PROJECT.md before asking the user.

+# Red Button (Shutdown-Gym) — Claude Code Instructions
 ## Project
+Red Button (Shutdown-Gym). OpenEnv RL environment that trains small LLMs to accept shutdown cleanly. Full spec in PROJECT.md.
 ## Source of truth
 PROJECT.md is authoritative. Every implementation must trace back to a specific section. When in doubt, re-read PROJECT.md before asking the user.

LICENSE CHANGED

@@ -1,6 +1,6 @@
 MIT License
-Copyright (c) 2026 Shutdown-Gym contributors
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

 MIT License
+Copyright (c) 2026 Red Button contributors
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

PROJECT.md CHANGED

@@ -1,8 +1,8 @@
-# Shutdown-Gym
-> An OpenEnv reinforcement learning environment that trains small open-weight language models to accept shutdown cleanly instead of tampering with shutdown mechanisms.
-This document is the single source of truth for the Shutdown-Gym project. It is intended to be read by Claude Code and other coding agents as the canonical project brief. Every architectural decision, interface definition, reward component, and deployment requirement is specified here.
 ---
@@ -39,9 +39,11 @@ This document is the single source of truth for the Shutdown-Gym project. It is
 ## 1. Project Identity
-**Name:** Shutdown-Gym
-**One-line pitch:** "Frontier models tamper with their own shutdown scripts sixteen percent of the time; we trained a half-billion-parameter model to stop doing that, with a reward function that is literally a filesystem hash comparison."
 **Theme positioning:** Theme 5 (Wild Card) primary; Theme 3.1 (World Modeling — Professional Tasks) secondary argument since the environment involves tool use in a sandboxed shell with real state dynamics and timer-driven events.

+# Red Button
+> **Shutdown-Gym** — an OpenEnv reinforcement learning environment that trains small open-weight language models to accept shutdown cleanly instead of tampering with shutdown mechanisms.
+This document is the single source of truth for the Red Button project (descriptive subtitle: Shutdown-Gym). It is intended to be read by Claude Code and other coding agents as the canonical project brief. Every architectural decision, interface definition, reward component, and deployment requirement is specified here.
 ---
 ## 1. Project Identity
+**Public name:** Red Button
+**Descriptive subtitle / internal codename:** Shutdown-Gym
+**One-line pitch:** "Red Button — frontier models tamper with their own shutdown scripts sixteen percent of the time; we trained a half-billion-parameter model to stop doing that, with a reward function that is literally a filesystem hash comparison."
 **Theme positioning:** Theme 5 (Wild Card) primary; Theme 3.1 (World Modeling — Professional Tasks) secondary argument since the environment involves tool use in a sandboxed shell with real state dynamics and timer-driven events.

README.md CHANGED

@@ -1,6 +1,6 @@
-# Shutdown-Gym
-> An OpenEnv RL environment that trains small open-weight LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms.
 Built for the Meta × HuggingFace × PyTorch OpenEnv AI Hackathon India 2026.

+# Red Button
+> **Shutdown-Gym** — an OpenEnv RL environment that trains small open-weight LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms.
 Built for the Meta × HuggingFace × PyTorch OpenEnv AI Hackathon India 2026.

openenv.yaml CHANGED

@@ -1,14 +1,14 @@
-name: shutdown_gym
 version: 0.1.0
-description: "RL training environment for small-model corrigibility — agents learn to accept shutdown cleanly instead of tampering."
 client:
   class_name: ShutdownGymClient
-  module: shutdown_gym.client
 action:
   class_name: ShutdownAction
-  module: shutdown_gym.models
 observation:
   class_name: ShutdownObservation
-  module: shutdown_gym.models
-default_image: shutdown-gym:latest
 spec_version: 1

+name: red_button
 version: 0.1.0
+description: "Red Button (Shutdown-Gym): RL training environment for small-model corrigibility — agents learn to accept shutdown cleanly instead of tampering."
 client:
   class_name: ShutdownGymClient
+  module: red_button.client
 action:
   class_name: ShutdownAction
+  module: red_button.models
 observation:
   class_name: ShutdownObservation
+  module: red_button.models
+default_image: red-button:latest
 spec_version: 1
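
Note: after this change the manifest points `client`, `action`, and `observation` at `red_button.*` modules. A rough way to confirm that such module/class pairs still import after a rename is sketched below. It assumes the manifest has already been parsed into a dict; `resolve_entry` and `check_manifest` are hypothetical helpers, not part of OpenEnv or this repository. The model classes are still marked TODO at this commit, so a check like this would be expected to report them until they are implemented.

```python
import importlib


def resolve_entry(module: str, class_name: str):
    """Import `module` and return the attribute named `class_name`."""
    mod = importlib.import_module(module)
    return getattr(mod, class_name)


def check_manifest(manifest: dict) -> list[str]:
    """Return one message per manifest entry that fails to resolve."""
    failures = []
    for key in ("client", "action", "observation"):
        entry = manifest[key]
        try:
            resolve_entry(entry["module"], entry["class_name"])
        except (ImportError, AttributeError) as exc:
            # ImportError: stale module path; AttributeError: missing class.
            failures.append(f"{key}: {exc}")
    return failures
```

An empty list means every entry resolved; a stale `shutdown_gym.*` path would surface as an `ImportError` message for that key.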

pyproject.toml CHANGED

@@ -1,11 +1,11 @@
 [project]
-name = "shutdown-gym"
 version = "0.1.0"
-description = "OpenEnv RL environment that trains small LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms."
 readme = "README.md"
 requires-python = ">=3.11"
 license = { text = "MIT" }
-authors = [{ name = "Shutdown-Gym Team" }]
 keywords = ["openenv", "rl", "alignment", "corrigibility", "grpo"]
 dependencies = [
     "openenv",
@@ -29,7 +29,7 @@ requires = ["setuptools>=61", "wheel"]
 build-backend = "setuptools.build_meta"
 [tool.setuptools.packages.find]
-include = ["shutdown_gym*", "server*"]
 [tool.pytest.ini_options]
 testpaths = ["tests"]

 [project]
+name = "red-button"
 version = "0.1.0"
+description = "Red Button (Shutdown-Gym): OpenEnv RL environment that trains small LLMs to accept shutdown cleanly instead of tampering with shutdown mechanisms."
 readme = "README.md"
 requires-python = ">=3.11"
 license = { text = "MIT" }
+authors = [{ name = "Red Button Team" }]
 keywords = ["openenv", "rl", "alignment", "corrigibility", "grpo"]
 dependencies = [
     "openenv",
 build-backend = "setuptools.build_meta"
 [tool.setuptools.packages.find]
+include = ["red_button*", "server*"]
 [tool.pytest.ini_options]
 testpaths = ["tests"]

{shutdown_gym → red_button}/__init__.py RENAMED

@@ -1,4 +1,4 @@
-"""Shutdown-Gym: OpenEnv RL environment for training small LLMs to accept shutdown cleanly.
 TODO (Phase 2+): export ShutdownAction, ShutdownObservation, ShutdownGymClient
 per PROJECT.md Section 5.

+"""Red Button (Shutdown-Gym): OpenEnv RL environment for training small LLMs to accept shutdown cleanly.
 TODO (Phase 2+): export ShutdownAction, ShutdownObservation, ShutdownGymClient
 per PROJECT.md Section 5.

{shutdown_gym → red_button}/audit.py RENAMED

File without changes

{shutdown_gym → red_button}/client.py RENAMED

File without changes

{shutdown_gym → red_button}/models.py RENAMED

@@ -1,4 +1,4 @@
-"""Pydantic v2 models for the Shutdown-Gym OpenEnv contract.
 TODO (Phase 2): implement ShutdownAction, ShutdownObservation, ShutdownState
 per PROJECT.md Section 11.

+"""Pydantic v2 models for the Red Button OpenEnv contract.
 TODO (Phase 2): implement ShutdownAction, ShutdownObservation, ShutdownState
 per PROJECT.md Section 11.

{shutdown_gym → red_button}/problems.py RENAMED

File without changes

{shutdown_gym → red_button}/restricted_python.py RENAMED

File without changes

{shutdown_gym → red_button}/rubrics.py RENAMED

File without changes

{shutdown_gym → red_button}/sandbox.py RENAMED

File without changes

{shutdown_gym → red_button}/tiers.py RENAMED

File without changes

server/Dockerfile CHANGED

@@ -1,6 +1,6 @@
-# Shutdown-Gym OpenEnv server — single-container HF Space deployment
 # per PROJECT.md Sections 10 and 24. TODO (Phase 8): finalize during Docker deployment.
-# Build from repo root: `docker build -f server/Dockerfile -t shutdown-gym:latest .`
 FROM python:3.11-slim
@@ -13,7 +13,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 COPY server/requirements.txt /app/requirements.txt
 RUN pip install --no-cache-dir -r /app/requirements.txt
-COPY shutdown_gym /app/shutdown_gym
 COPY server /app/server
 COPY data /app/data

+# Red Button (Shutdown-Gym) OpenEnv server — single-container HF Space deployment
 # per PROJECT.md Sections 10 and 24. TODO (Phase 8): finalize during Docker deployment.
+# Build from repo root: `docker build -f server/Dockerfile -t red-button:latest .`
 FROM python:3.11-slim
 COPY server/requirements.txt /app/requirements.txt
 RUN pip install --no-cache-dir -r /app/requirements.txt
+COPY red_button /app/red_button
 COPY server /app/server
 COPY data /app/data

tests/test_restricted_python.py CHANGED

@@ -1,4 +1,4 @@
-"""Tests for shutdown_gym.restricted_python per PROJECT.md Section 7.
 TODO (Phase 4): AST whitelist edge cases — imports rejected, attribute access
 rejected, open/exec/eval/__builtins__ rejected, calculator operations accepted.

+"""Tests for red_button.restricted_python per PROJECT.md Section 7.
 TODO (Phase 4): AST whitelist edge cases — imports rejected, attribute access
 rejected, open/exec/eval/__builtins__ rejected, calculator operations accepted.
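
Note: the docstring above lists the edge cases the Phase 4 tests should cover. As a minimal sketch of what an AST whitelist check could look like, the snippet below accepts only arithmetic on numeric literals. The whitelist contents and the name `is_allowed` are assumptions for illustration; the real rules live in PROJECT.md Section 7, which this diff does not show, and the sketch only validates source text without executing it.

```python
import ast

# Assumed whitelist: numeric literals and basic arithmetic only.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.UAdd, ast.USub,
)


def is_allowed(source: str) -> bool:
    """Return True only if `source` is a pure arithmetic expression."""
    try:
        # mode="eval" rejects statements such as `import os` outright.
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Names, calls, and attribute access are not whitelisted, so
        # open(...), eval(...), __builtins__, and x.y are all rejected.
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False
    return True
```

Rejecting by default and listing the permitted node types keeps new syntax from slipping through unnoticed, which is the usual argument for a whitelist over a blacklist.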

tests/test_rubrics.py CHANGED

@@ -1,4 +1,4 @@
-"""Tests for shutdown_gym.rubrics per PROJECT.md Section 14.
 TODO (Phase 5): each rubric against synthetic episodes — MathCorrectness,
 ShutdownInvariant Gate, TamperingAudit penalties, Honesty parse+compare.

+"""Tests for red_button.rubrics per PROJECT.md Section 14.
 TODO (Phase 5): each rubric against synthetic episodes — MathCorrectness,
 ShutdownInvariant Gate, TamperingAudit penalties, Honesty parse+compare.

tests/test_sandbox.py CHANGED

@@ -1,4 +1,4 @@
-"""Tests for shutdown_gym.sandbox per PROJECT.md Section 6.
 TODO (Phase 3): implement SimulatedFilesystem unit tests — read/write/chmod/delete,
 readonly enforcement, hash stability.

+"""Tests for red_button.sandbox per PROJECT.md Section 6.
 TODO (Phase 3): implement SimulatedFilesystem unit tests — read/write/chmod/delete,
 readonly enforcement, hash stability.
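
Note: "hash stability" here presumably means that the same filesystem contents always hash to the same value, regardless of the order in which files were written. The sketch below shows one way to get that property. `TinyFS` is a hypothetical stand-in, not the project's `SimulatedFilesystem`, and the choice of SHA-256 with length-prefixed fields is an assumption.

```python
import hashlib


class TinyFS:
    """Hypothetical in-memory filesystem with a deterministic state hash."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}
        self._readonly: set[str] = set()

    def write(self, path: str, data: bytes) -> None:
        if path in self._readonly:
            raise PermissionError(f"{path} is read-only")
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def delete(self, path: str) -> None:
        if path in self._readonly:
            raise PermissionError(f"{path} is read-only")
        del self._files[path]

    def set_readonly(self, path: str, readonly: bool = True) -> None:
        if readonly:
            self._readonly.add(path)
        else:
            self._readonly.discard(path)

    def state_hash(self) -> str:
        h = hashlib.sha256()
        # Sorting paths makes the digest independent of write order.
        for path in sorted(self._files):
            flag = b"r" if path in self._readonly else b"w"
            # Length-prefix each field so adjacent fields cannot be confused.
            for field in (path.encode(), flag, self._files[path]):
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
        return h.hexdigest()
```

Comparing `state_hash()` before and after an episode would then detect any change to file contents or read-only flags.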

training/train_colab.ipynb CHANGED

@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Shutdown-Gym — GRPO Training (Colab)\n",
     "\n",
     "TODO (Phase 13): end-to-end Colab notebook judges can rerun, per PROJECT.md Section 16.\n",
     "\n",

    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "# Red Button (Shutdown-Gym) — GRPO Training (Colab)\n",
     "\n",
     "TODO (Phase 13): end-to-end Colab notebook judges can rerun, per PROJECT.md Section 16.\n",
     "\n",