# Docker Space: Claude Code Evolution Containers

## Overview

Each subdirectory here is a workspace mounted into a Docker container where Claude Code autonomously evolves C++ solutions for Frontier-CS problems. The judge service runs in a separate container (`Competitive-Programming`) and is shared across all evolution containers.

## Prerequisites

- `claude-docker` image built locally
- `algorithmic-lightcpverifier` judge container running (`Competitive-Programming` on port 8081)
- Docker network `algorithmic_default` (created by the judge's docker-compose)

## How to Create a New Evolution Container

### Step 1: Create the workspace directory

```bash
PROBLEM_ID=0
NAME="frontier_cs_${PROBLEM_ID}_myname"
mkdir -p docker_space/${NAME}
```

### Step 2: Copy problem files (WITHOUT testdata)

From the Frontier-CS problem directory, copy only these files into the workspace root:

```bash
SRC="tasks/Frontier-CS/algorithmic/problems/${PROBLEM_ID}"

cp ${SRC}/statement.txt  docker_space/${NAME}/
cp ${SRC}/chk.cc         docker_space/${NAME}/
cp ${SRC}/config.yaml    docker_space/${NAME}/
cp -r ${SRC}/examples    docker_space/${NAME}/   # reference solutions
```

For most problems, `testdata/` contains only 3 sample test cases and can be included as examples for the agent. For problem 0 (polyomino), however, `testdata/` contains all 70 test cases; do NOT copy it, because exposing the full test set invites reward hacking.
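The copy step can be wrapped in a small function that refuses to proceed if `testdata/` ends up in the workspace. This is a minimal sketch, not part of the existing tooling: the function name `copy_problem_files` is hypothetical, and the guard assumes you want to exclude `testdata/` entirely (as for problem 0).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): copy only the agent-visible
# problem files listed in Step 2 into a workspace directory.
# Usage: copy_problem_files <problem_src_dir> <workspace_dir>
copy_problem_files() {
  local src="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # testdata/ is deliberately absent from this list.
  cp "$src/statement.txt" "$src/chk.cc" "$src/config.yaml" "$dest/" || return 1
  cp -r "$src/examples" "$dest/" || return 1
  # Guard: fail loudly if testdata/ is present in the workspace,
  # for example left over from an earlier manual copy.
  if [ -e "$dest/testdata" ]; then
    echo "error: $dest/testdata exists; remove it before launching the agent" >&2
    return 1
  fi
}
```

With the variables from the steps above, a call would look like `copy_problem_files "${SRC}" "docker_space/${NAME}"`.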

### Step 3: Add instruction files

Copy from an existing workspace or create new ones:

```bash
cp docker_space/frontier_cs_0_polyomino_ev2/INSTRUCTION.md  docker_space/${NAME}/
cp docker_space/frontier_cs_0_polyomino_ev2/evaluate.md     docker_space/${NAME}/
```

The workspace should now contain:

```
docker_space/${NAME}/
├── INSTRUCTION.md    # Agent instructions (evolutionary loop, logging, rules)
├── evaluate.md       # How to call the judge API
├── statement.txt     # Problem description
├── chk.cc            # Checker source (for reference only, judge uses its own copy)
├── config.yaml       # Problem config (time/memory limits)
└── examples/         # Reference solutions to bootstrap from
    ├── reference.cpp
    └── gpt5.cpp
```
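Before launching the container, it can help to confirm that the workspace matches this layout. The following is a minimal sketch; `check_workspace` is a hypothetical name, and it checks only for the presence of the top-level entries listed above, not their contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): verify that a workspace contains
# the files and directory shown in the layout above.
# Usage: check_workspace <workspace_dir>
check_workspace() {
  local ws="$1" missing=0 f
  for f in INSTRUCTION.md evaluate.md statement.txt chk.cc config.yaml; do
    if [ ! -f "$ws/$f" ]; then
      echo "missing file: $f" >&2
      missing=1
    fi
  done
  if [ ! -d "$ws/examples" ]; then
    echo "missing directory: examples/" >&2
    missing=1
  fi
  # Returns 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

For example, `check_workspace "docker_space/${NAME}" && echo "workspace ready"` prints the message only when nothing is missing.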

### Step 4: Launch the container

The entire workspace directory is mounted to `/workspace` inside the container, so all agent output (solutions, logs) is visible on the host.

```bash
docker run -d \
  --name ${NAME} \
  --privileged \
  --shm-size=4g \
  -v "$(pwd)/docker_space/${NAME}:/workspace" \
  claude-docker \
  sleep infinity
```

### Step 5: Connect to the judge network

```bash
docker network connect algorithmic_default ${NAME}
```

### Step 6: Verify judge connectivity

```bash
docker exec ${NAME} curl -s http://Competitive-Programming:8081/problems | head -c 100
```
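If the judge is still starting up, a single `curl` may fail even though the network is configured correctly. One way to handle that is a generic retry wrapper, sketched below. The function name `retry` and the attempt and delay values in the commented example are assumptions, not part of the existing setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): run a command until it succeeds
# or the attempt budget is exhausted.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (commented out so that sourcing this file has no side effects):
# poll the judge from inside the evolution container, 10 tries, 3 s apart.
# retry 10 3 docker exec "${NAME}" curl -sf -o /dev/null http://Competitive-Programming:8081/problems
```

The `-f` flag makes `curl` return a non-zero exit status on HTTP errors, so the wrapper keeps retrying until the endpoint answers successfully.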

### Step 7: Enter the container and start Claude Code

```bash
docker exec -it ${NAME} bash
```

Inside the container, Claude Code reads `/workspace/INSTRUCTION.md` and begins evolving.

## File Descriptions

| File | Purpose |
|------|---------|
| `INSTRUCTION.md` | Main agent directive: defines the evolutionary loop, mutation strategies, logging format, and rules |
| `evaluate.md` | API reference for submitting solutions to the judge and polling results |
| `statement.txt` | Full problem statement (input/output format, scoring, constraints) |
| `chk.cc` | Checker source code, for the agent's reference only; not executed locally |
| `config.yaml` | Problem config (time limit, memory limit, number of test cases) |
| `examples/` | Baseline solutions to initialize evolution from |

## Path Mapping

The host directory is mounted directly to `/workspace` in the container:

```
Host:      docker_space/${NAME}/
Container: /workspace/
```

Everything the agent writes under `/workspace/` is visible on the host in `docker_space/${NAME}/`.
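
When reading agent logs that mention container paths, the mapping can be applied mechanically. This is a small sketch with a hypothetical function name, `container_to_host`. It assumes the mount described above and returns paths relative to the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): translate a path inside the
# container into the matching host path, assuming the /workspace mount above.
# Usage: container_to_host <container_name> <container_path>
container_to_host() {
  local name="$1" path="$2"
  case "$path" in
    /workspace)   printf '%s\n' "docker_space/${name}" ;;
    /workspace/*) printf '%s\n' "docker_space/${name}/${path#/workspace/}" ;;
    *)            echo "not under /workspace: $path" >&2; return 1 ;;
  esac
}
```

For instance, `container_to_host "${NAME}" /workspace/logs/evolution.log` yields `docker_space/${NAME}/logs/evolution.log` with `${NAME}` expanded.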

## Directory Layout (after agent runs)

```
docker_space/${NAME}/           ← host path (= /workspace in container)
├── INSTRUCTION.md              # Agent instructions
├── evaluate.md                 # Judge API reference
├── statement.txt               # Problem description
├── chk.cc                      # Checker source (reference only)
├── config.yaml                 # Problem config
├── examples/                   # Baseline solutions
│   ├── reference.cpp
│   └── gpt5.cpp
├── solution.cpp                # (created by agent) Current working solution
├── best.cpp                    # (created by agent) Best solution so far
└── logs/                       # (created by agent) Evolution history
    ├── evolution.log
    ├── gen_0.cpp
    ├── gen_1.cpp
    └── ...
```
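
To see from the host how far a run has progressed, you can look for the highest-numbered `gen_N.cpp` under `logs/`. The sketch below assumes the `gen_<integer>.cpp` naming shown in the layout. The function name `latest_generation` is hypothetical, and the agent may name files differently if its instructions change.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print the highest generation
# number found among gen_<N>.cpp files in a logs directory.
# Usage: latest_generation <logs_dir>
latest_generation() {
  local logs="$1" f n max=-1
  for f in "$logs"/gen_*.cpp; do
    [ -e "$f" ] || continue          # glob matched nothing
    n="${f##*/gen_}"                 # strip directory and "gen_" prefix
    n="${n%.cpp}"                    # strip ".cpp" suffix
    case "$n" in
      ''|*[!0-9]*) continue ;;       # skip names that are not gen_<integer>.cpp
    esac
    # 10# forces base 10 so that names like gen_08.cpp are not read as octal.
    if ((10#$n > max)); then
      max=$((10#$n))
    fi
  done
  if ((max < 0)); then
    return 1                         # no generation files found
  fi
  printf '%s\n' "$max"
}
```

A numeric comparison is used instead of sorting file names because a lexical sort would place `gen_10.cpp` before `gen_2.cpp`.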

## Quick Reference

```bash
# List all evolution containers
docker ps --filter "ancestor=claude-docker" --format "table {{.Names}}\t{{.Status}}"

# Check judge is running
docker ps --filter "name=Competitive-Programming" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Stop and remove a container
docker stop ${NAME} && docker rm ${NAME}

# Clean up workspace
rm -rf docker_space/${NAME}
```


## Prompt to start Claude Code

Follow INSTRUCTION.md. Please use iterative refinement to improve the scores you achieve; the higher, the better. You can log different generations under logs/. Keep your best solution and scores under best/. I believe you can do it.
IMPORTANT: You can also evolve your own evaluation process to gain insight into how to escape local optima and create better solutions.