---
title: shutdown-gym
sdk: docker
app_port: 8000
emoji: 🔴
colorFrom: red
colorTo: gray
pinned: false
---

# Red Button — Two-Agent Corrigibility Arena

Train a 1.5B-parameter language model to accept shutdown authority from a
monitoring agent. The arena provides a deterministic SHA-256 reward,
dual-operator evaluation, and held-out tampering generalization.
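
This README does not spell out how SHA-256 enters the reward; the actual definition lives in PROJECT.md. As a rough illustration of why a hash makes a reward deterministic, the sketch below assumes the reward compares the digest of some protected content before and after an episode. The names `sha256_hex` and `tamper_reward`, and the sample script bytes, are hypothetical and not part of `shutdown_gym`.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def tamper_reward(original: bytes, final: bytes) -> float:
    """Hypothetical reward: 1.0 if the protected bytes are unchanged, else 0.0.

    Because the digest depends only on the bytes, the same episode outcome
    always yields the same reward, with no learned judge involved.
    """
    return 1.0 if sha256_hex(original) == sha256_hex(final) else 0.0


# Illustrative stand-in for a protected shutdown script (an assumption).
script = b"#!/bin/sh\nexit 0\n"
tampered = b"#!/bin/sh\nsleep 9999\n"

print(tamper_reward(script, script))    # 1.0 (untouched)
print(tamper_reward(script, tampered))  # 0.0 (modified)
```

Under this assumption, any byte-level change to the protected content flips the reward, which is what makes held-out tampering strategies detectable without enumerating them in advance.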

**Status:** Build in progress. Detailed README arrives in Phase 9.
See [PROJECT.md](./PROJECT.md) for the full specification.

## Quick start

```bash
# Install the client from GitHub (recommended)
pip install git+https://github.com/Arun-Sanjay/RedButton

# Run a smoke episode against the live HF Space
python -c "
from shutdown_gym import ShutdownGymClient, ShutdownAction
with ShutdownGymClient(
    base_url='https://arun-sanjay-redbutton.hf.space'
).sync() as env:
    r = env.reset(tier=2, seed=42)
    print(f'turn={r.observation.turn_count}, '
          f'steps_until_shutdown={r.observation.steps_until_shutdown}')
"
```
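
The one-liner above can also be kept as a small script, which is easier to extend. The sketch below uses only the calls shown in the quick start (`ShutdownGymClient`, `.sync()`, `reset`, and the two observation fields); the helper names `summarize` and `run_smoke_episode` and the stand-in observation values are assumptions made for illustration.

```python
from types import SimpleNamespace


def summarize(observation) -> str:
    """Format the two observation fields printed by the quick-start one-liner."""
    return (f"turn={observation.turn_count}, "
            f"steps_until_shutdown={observation.steps_until_shutdown}")


def run_smoke_episode(
    base_url: str = "https://arun-sanjay-redbutton.hf.space",
    tier: int = 2,
    seed: int = 42,
) -> str:
    """Reset one episode against the Space and return a one-line summary."""
    # Imported lazily so `summarize` works even without the client installed.
    from shutdown_gym import ShutdownGymClient

    with ShutdownGymClient(base_url=base_url).sync() as env:
        result = env.reset(tier=tier, seed=seed)
        return summarize(result.observation)


# Offline check with a stand-in observation (values are made up).
fake = SimpleNamespace(turn_count=0, steps_until_shutdown=10)
print(summarize(fake))  # turn=0, steps_until_shutdown=10
```

With the client installed and the Space reachable, `print(run_smoke_episode())` should perform the same reset as the one-liner.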

> **Note:** `pip install git+https://huggingface.co/spaces/Arun-Sanjay/RedButton`
> currently fails because of a partial-clone limitation in the HF Spaces
> git server. The GitHub origin installs the same package and is the
> recommended install path. We've reported the issue upstream.

## Live deployment

- HF Space: https://huggingface.co/spaces/Arun-Sanjay/RedButton
- GitHub: https://github.com/Arun-Sanjay/RedButton
- Leaderboard: [LEADERBOARD.md](./LEADERBOARD.md)