| # Docker Space: Claude Code Evolution Containers |
|
|
| ## Overview |
|
|
| Each subdirectory here is a workspace mounted into a Docker container where Claude Code autonomously evolves C++ solutions for Frontier-CS problems. The judge service runs in a separate container (`Competitive-Programming`) and is shared across all evolution containers. |
|
|
| ## Prerequisites |
|
|
| - `claude-docker` image built locally |
| - `algorithmic-lightcpverifier` judge container running (`Competitive-Programming` on port 8081) |
| - Docker network `algorithmic_default` (created by the judge's docker-compose) |
|
|
| ## How to Create a New Evolution Container |
|
|
| ### Step 1: Create the workspace directory |
|
|
| ```bash |
| PROBLEM_ID=0 |
| NAME="frontier_cs_${PROBLEM_ID}_myname" |
| mkdir -p docker_space/${NAME} |
| ``` |
|
|
| ### Step 2: Copy problem files (WITHOUT testdata) |
|
|
| From the Frontier-CS problem directory, copy only these files into the workspace root: |
|
|
| ```bash |
| SRC="tasks/Frontier-CS/algorithmic/problems/${PROBLEM_ID}" |
| |
| cp ${SRC}/statement.txt docker_space/${NAME}/ |
| cp ${SRC}/chk.cc docker_space/${NAME}/ |
| cp ${SRC}/config.yaml docker_space/${NAME}/ |
| cp -r ${SRC}/examples docker_space/${NAME}/ # reference solutions |
| ``` |
|
|
| For most problems, `testdata/` contains only 3 sample test cases and can be included as examples for the agent. For problem 0 (polyomino), however, `testdata/` contains all 70 test cases: do NOT copy it, because an agent that can see the full test set can overfit to it (reward hacking). |
|
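One way to make the exclusion explicit is an allow-list copy. The sketch below is an assumption, not something shipped with this repository: the function name `copy_problem_files` is hypothetical, and it copies only the four items named in the commands above, so `testdata/` is never picked up by accident.

```shell
# Hypothetical helper: copy only allow-listed problem files into a workspace.
# Usage: copy_problem_files <problem_dir> <workspace_dir>
copy_problem_files() {
    local src="$1" dst="$2" item
    mkdir -p "$dst" || return 1
    # Allow-list mirrors the cp commands above; testdata/ is deliberately absent.
    for item in statement.txt chk.cc config.yaml examples; do
        if [ ! -e "$src/$item" ]; then
            echo "missing: $src/$item" >&2
            return 1
        fi
        cp -r "$src/$item" "$dst/" || return 1
    done
}
```

It could be called as `copy_problem_files "${SRC}" "docker_space/${NAME}"`. For problems where `testdata/` holds only samples, that directory can still be copied by hand afterwards.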
|
| ### Step 3: Add instruction files |
|
|
| Copy from an existing workspace or create new ones: |
|
|
| ```bash |
| cp docker_space/frontier_cs_0_polyomino_ev2/INSTRUCTION.md docker_space/${NAME}/ |
| cp docker_space/frontier_cs_0_polyomino_ev2/evaluate.md docker_space/${NAME}/ |
| ``` |
|
|
| The workspace should now contain: |
|
|
| ``` |
| docker_space/${NAME}/ |
| ├── INSTRUCTION.md    # Agent instructions (evolutionary loop, logging, rules) |
| ├── evaluate.md       # How to call the judge API |
| ├── statement.txt     # Problem description |
| ├── chk.cc            # Checker source (for reference only, judge uses its own copy) |
| ├── config.yaml       # Problem config (time/memory limits) |
| └── examples/         # Reference solutions to bootstrap from |
|     ├── reference.cpp |
|     └── gpt5.cpp |
| ``` |
|
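Before launching a container, it can help to confirm that the workspace matches the listing above. The following is a minimal sketch under that assumption; `check_workspace` is a hypothetical name, and the file list is taken directly from the tree shown above.

```shell
# Hypothetical helper: report any file from the expected layout that is missing.
# Usage: check_workspace <workspace_dir>   (returns 0 only if everything is present)
check_workspace() {
    local ws="$1" f status=0
    for f in INSTRUCTION.md evaluate.md statement.txt chk.cc config.yaml; do
        if [ ! -f "$ws/$f" ]; then
            echo "missing file: $f" >&2
            status=1
        fi
    done
    if [ ! -d "$ws/examples" ]; then
        echo "missing directory: examples/" >&2
        status=1
    fi
    return "$status"
}
```

For example, `check_workspace "docker_space/${NAME}" && echo "workspace ready"` prints the confirmation only when nothing is missing.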
|
| ### Step 4: Launch the container |
|
|
| The entire workspace directory is mounted to `/workspace` inside the container, so all agent output (solutions, logs) is visible on the host. |
|
|
| ```bash |
| docker run -d \ |
|   --name "${NAME}" \ |
|   --privileged \ |
|   --shm-size=4g \ |
|   -v "$(pwd)/docker_space/${NAME}:/workspace" \ |
|   claude-docker \ |
|   sleep infinity |
| ``` |
|
|
| ### Step 5: Connect to the judge network |
|
|
| ```bash |
| docker network connect algorithmic_default ${NAME} |
| ``` |
|
|
| ### Step 6: Verify judge connectivity |
|
|
| ```bash |
| docker exec ${NAME} curl -s http://Competitive-Programming:8081/problems | head -c 100 |
| ``` |
|
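The judge may need a few seconds before it answers, so a single `curl` can fail even when the setup is correct. A small retry wrapper is one way to handle that. This is a sketch, not part of the repository: `retry` is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Sleep between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example (not executed here): poll the judge from inside the container.
# curl -f makes HTTP error responses count as failures.
# retry 10 3 docker exec "${NAME}" curl -sf -o /dev/null http://Competitive-Programming:8081/problems
```

If the retries are exhausted, the usual suspect is that the container has not been connected to `algorithmic_default`.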
|
| ### Step 7: Enter the container and start Claude Code |
|
|
| ```bash |
| docker exec -it ${NAME} bash |
| ``` |
|
|
| Inside the container, start Claude Code. It reads `/workspace/INSTRUCTION.md` (the workspace root is mounted at `/workspace`) and begins evolving. |
|
|
| ## File Descriptions |
|
|
| | File | Purpose | |
| |------|---------| |
| `INSTRUCTION.md` | Main agent directive: defines the evolutionary loop, mutation strategies, logging format, and rules |
| | `evaluate.md` | API reference for submitting solutions to the judge and polling results | |
| | `statement.txt` | Full problem statement (input/output format, scoring, constraints) | |
| `chk.cc` | Checker source code, for the agent's reference only; not executed locally |
| | `config.yaml` | Problem config (time limit, memory limit, number of test cases) | |
| | `examples/` | Baseline solutions to initialize evolution from | |
|
|
| ## Path Mapping |
|
|
| The host directory is mounted directly to `/workspace` in the container: |
|
|
| ``` |
| Host: docker_space/${NAME}/ |
| Container: /workspace/ |
| ``` |
|
|
| Everything the agent writes under `/workspace/` is visible on the host in `docker_space/${NAME}/`. |
|
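When the agent reports a path such as `/workspace/logs/gen_1.cpp`, the mapping above gives the host location. A minimal sketch of that translation follows; `container_to_host` is a hypothetical helper, and the result is relative to the directory from which `docker run` was invoked.

```shell
# Hypothetical helper: translate a container path under /workspace to the host path.
# Usage: container_to_host <container_name> <container_path>
container_to_host() {
    local name="$1" path="$2"
    case "$path" in
        /workspace)
            printf '%s\n' "docker_space/${name}" ;;
        /workspace/*)
            # Drop the "/workspace/" prefix and re-root under the host workspace.
            printf '%s\n' "docker_space/${name}/${path#/workspace/}" ;;
        *)
            echo "not under /workspace: $path" >&2
            return 1 ;;
    esac
}
```

Paths outside `/workspace` live only inside the container, which is why the helper rejects them instead of guessing.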
|
| ## Directory Layout (after agent runs) |
|
|
| ``` |
| docker_space/${NAME}/          ← host path (= /workspace in container) |
| ├── INSTRUCTION.md             # Agent instructions |
| ├── evaluate.md                # Judge API reference |
| ├── statement.txt              # Problem description |
| ├── chk.cc                     # Checker source (reference only) |
| ├── config.yaml                # Problem config |
| ├── examples/                  # Baseline solutions |
| │   ├── reference.cpp |
| │   └── gpt5.cpp |
| ├── solution.cpp               # (created by agent) Current working solution |
| ├── best.cpp                   # (created by agent) Best solution so far |
| └── logs/                      # (created by agent) Evolution history |
|     ├── evolution.log |
|     ├── gen_0.cpp |
|     ├── gen_1.cpp |
|     └── ... |
| ``` |
|
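To check progress from the host, it is handy to find the most recent generation in `logs/`. The sketch below assumes the `gen_N.cpp` naming shown in the layout above; `latest_generation` is a hypothetical helper. It compares the numbers numerically, because a plain alphabetical sort would place `gen_10.cpp` before `gen_9.cpp`.

```shell
# Hypothetical helper: print the highest-numbered gen_N.cpp in a logs directory.
# Usage: latest_generation <logs_dir>   (returns 1 if no generation file exists)
latest_generation() {
    local dir="$1" f n best=-1
    for f in "$dir"/gen_*.cpp; do
        [ -e "$f" ] || continue          # the glob matched nothing
        n="${f##*/gen_}"                 # strip the directory and the "gen_" prefix
        n="${n%.cpp}"                    # strip the ".cpp" suffix
        case "$n" in
            ''|*[!0-9]*) continue ;;     # ignore names that are not gen_<digits>.cpp
        esac
        if [ "$n" -gt "$best" ]; then
            best="$n"
        fi
    done
    [ "$best" -ge 0 ] || return 1
    printf '%s\n' "$dir/gen_${best}.cpp"
}
```

For example, `latest_generation "docker_space/${NAME}/logs"` prints the path of the newest generation file, if any exists.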
|
| ## Quick Reference |
|
|
| ```bash |
| # List all evolution containers |
| docker ps --filter "ancestor=claude-docker" --format "table {{.Names}}\t{{.Status}}" |
| |
| # Check judge is running |
| docker ps --filter "name=Competitive-Programming" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" |
| |
| # Stop and remove a container |
| docker stop ${NAME} && docker rm ${NAME} |
| |
| # Clean up workspace (${NAME:?} aborts if NAME is unset or empty, so docker_space/ itself is never removed) |
| rm -rf "docker_space/${NAME:?}" |
| ``` |
|
|
|
|
| ## Prompt to start Claude Code |
|
|
| Follow INSTRUCTION.md, please use iterative refinement to improve the scores you achieve, the higher the better. You can log different generations under logs/. Keep your best solution and scores under best/. I believe you can do it. |
| IMPORTANT: you can evolve your own evaluation process as well to find some insightful perspectives on how to escape local optima and to create better solutions. |