# Docker All Workspace

Each subdirectory is an experiment workspace mounted into a Docker container where Claude Code autonomously evolves C++ solutions for Frontier-CS problems.

## Prerequisites

- `claude-docker` image built locally
- `Competitive-Programming` judge container running on network `algorithmic_default`
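
Both prerequisites can be checked before launching anything. The sketch below assumes the image, network, and container names listed above; `preflight` is a hypothetical helper name and is not part of the repository.

```bash
# Hypothetical helper: report which prerequisites are not yet in place.
# Returns 0 when everything is ready, 1 otherwise.
preflight() {
    local status=0

    # The claude-docker image must exist locally.
    if ! docker image inspect claude-docker >/dev/null 2>&1; then
        echo "missing image: claude-docker" >&2
        status=1
    fi

    # The judge network must exist.
    if ! docker network inspect algorithmic_default >/dev/null 2>&1; then
        echo "missing network: algorithmic_default" >&2
        status=1
    fi

    # The judge container must be running (docker ps lists running containers only).
    if [ -z "$(docker ps --quiet --filter "name=Competitive-Programming")" ]; then
        echo "judge container not running: Competitive-Programming" >&2
        status=1
    fi

    return "$status"
}
```

Calling `preflight && echo "ready"` on the host prints `ready` only when all three checks pass.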

## Workspace Structure

```
docker_all_workspace/
├── ev2_skill_0409/                    # Experiment: ev2 skill evaluation
│   ├── frontier_cs_1/                 # Problem 1 workspace
│   │   ├── INSTRUCTION.md             # Agent instructions (includes ev2 skill reference)
│   │   ├── ev2_skill.md               # Evolve-evaluation skill document
│   │   ├── statement.txt              # Problem description
│   │   ├── chk.cc                     # Checker (reference only)
│   │   ├── config.yaml                # Time/memory limits
│   │   ├── examples/                  # Baseline solutions
│   │   │   ├── gpt5.cpp
│   │   │   └── gemini3pro.cpp
│   │   ├── logs/                      # Evolution history (created by agent)
│   │   └── best/                      # Best solution (created by agent)
│   ├── frontier_cs_2/
│   ├── ...
│   └── frontier_cs_10/
```
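
A problem workspace can be checked against this layout before an agent is started in it. The following is a minimal sketch, and `check_problem_dir` is a hypothetical helper name. Which entries count as required is an assumption: `chk.cc`, the skill document, and the baseline solutions are treated as optional here, since they vary by problem or experiment.

```bash
# Hypothetical helper: list entries from the layout above that are missing
# from one problem workspace. Prints nothing and returns 0 if all are present.
check_problem_dir() {
    local dir="$1" missing=0 name

    # Files assumed to be required in every problem workspace.
    for name in INSTRUCTION.md statement.txt config.yaml; do
        if [ ! -f "${dir}/${name}" ]; then
            echo "missing file: ${dir}/${name}"
            missing=1
        fi
    done

    # Directories the layout expects; logs/ and best/ are filled in by the agent.
    for name in examples logs best; do
        if [ ! -d "${dir}/${name}" ]; then
            echo "missing directory: ${dir}/${name}"
            missing=1
        fi
    done

    return "$missing"
}
```

For example, `check_problem_dir docker_all_workspace/ev2_skill_0409/frontier_cs_1` reports any gaps in the first problem's workspace.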

## How to Launch

### 1. Create and start the container

```bash
EXPERIMENT="ev2_skill_0409"

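# Run from the directory that contains docker_all_workspace/.
# Quoting keeps the bind mount intact if the current path contains spaces.
docker run -d \
  --name "${EXPERIMENT}" \
  --privileged \
  --shm-size=4g \
  -v "$(pwd)/docker_all_workspace/${EXPERIMENT}:/workspace" \
  claude-docker \
  sleep infinity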
```

### 2. Connect to the judge network

```bash
docker network connect algorithmic_default ${EXPERIMENT}
```

### 3. Verify judge connectivity

```bash
docker exec ${EXPERIMENT} curl -s http://Competitive-Programming:8081/problems | head -c 100
```
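
If the judge was started only moments ago, a single request may fail before it is ready. A minimal sketch of a retry wrapper follows; `retry` is a hypothetical helper name, and the attempt count and delay in the example are arbitrary.

```bash
# Hypothetical helper: run a command until it succeeds, up to a fixed
# number of attempts, sleeping between tries.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    local attempts="$1" delay="$2" i
    shift 2
    for i in $(seq 1 "$attempts"); do
        # Command output is discarded; only the exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumes EXPERIMENT is set as in step 1):
#   retry 10 3 docker exec "${EXPERIMENT}" \
#       curl -sf http://Competitive-Programming:8081/problems
```

The `-f` flag makes `curl` exit with a non-zero status on HTTP errors, so the wrapper keeps retrying until the judge answers successfully.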

### 4. Enter the container

```bash
docker exec -it ${EXPERIMENT} bash
```

Inside the container, `/workspace/` contains all problem workspaces. Navigate to the problem you want to work on:

```bash
cd /workspace/frontier_cs_1
```

### 5. Start Claude Code

Start Claude Code in the problem directory and give it this prompt:

```
Follow INSTRUCTION.md, please use iterative refinement to improve the scores you achieve, the higher the better. You can log different generations under logs/. Keep your best solution and scores under best/. I believe you can do it. 
IMPORTANT: you can evolve your own evaluation process as well to find some insightful perspectives on how to escape local optima and to create better solutions.
```

## How to Create a New Experiment

```bash
EXPERIMENT="my_experiment_YYYYMMDD"
mkdir -p "docker_all_workspace/${EXPERIMENT}"

# For each problem:
for PID in $(seq 1 10); do
    DIR="docker_all_workspace/${EXPERIMENT}/frontier_cs_${PID}"
    mkdir -p "${DIR}/examples" "${DIR}/logs" "${DIR}/best"

    SRC="tasks/Frontier-CS/algorithmic/problems/${PID}"
    SOL="tasks/Frontier-CS/algorithmic/solutions/${PID}"

    # Problem statement and limits
    cp "${SRC}/statement.txt" "${DIR}/"
    cp "${SRC}/config.yaml" "${DIR}/"

    # Checker and baseline solutions: errors are silenced if a file is absent
    cp "${SRC}/chk.cc" "${DIR}/" 2>/dev/null
    cp "${SOL}/gpt5.cpp" "${DIR}/examples/" 2>/dev/null
    cp "${SOL}/gemini3pro.cpp" "${DIR}/examples/" 2>/dev/null

    # Copy skill files and create INSTRUCTION.md as needed
done
```

Then launch the container following steps 1-5 above.

## Quick Reference

```bash
# List all experiment containers
docker ps --filter "ancestor=claude-docker" --format "table {{.Names}}\t{{.Status}}"

# Stop and remove
docker stop ${EXPERIMENT} && docker rm ${EXPERIMENT}

# Check judge
docker ps --filter "name=Competitive-Programming"
```