Docker Space: Claude Code Evolution Containers

Overview

Each subdirectory here is a workspace mounted into a Docker container where Claude Code autonomously evolves C++ solutions for Frontier-CS problems. The judge service runs in a separate container (Competitive-Programming) and is shared across all evolution containers.

Prerequisites

  • claude-docker image built locally
  • algorithmic-lightcpverifier judge container running (Competitive-Programming on port 8081)
  • Docker network algorithmic_default (created by the judge's docker-compose)
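The three prerequisites above can be checked in one pass before creating a workspace. The following is a minimal sketch, not part of the repository: the function name `check_prereqs` is hypothetical, and it assumes the image, container, and network names are exactly those listed above.

```shell
# Sketch: verify the prerequisites listed above.
# check_prereqs is a hypothetical helper name, not an existing script.
check_prereqs() {
  local ok=0
  # The claude-docker image must exist locally.
  docker image inspect claude-docker >/dev/null 2>&1 \
    || { echo "missing image: claude-docker" >&2; ok=1; }
  # The judge container must be running (docker ps lists running containers only).
  [ -n "$(docker ps --quiet --filter "name=Competitive-Programming")" ] \
    || { echo "judge container not running" >&2; ok=1; }
  # The shared network must exist.
  docker network inspect algorithmic_default >/dev/null 2>&1 \
    || { echo "missing network: algorithmic_default" >&2; ok=1; }
  return "$ok"
}
```

Calling `check_prereqs` returns non-zero and prints one line per missing prerequisite, so it can gate the later steps with `check_prereqs && ...`.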

How to Create a New Evolution Container

Step 1: Create the workspace directory

PROBLEM_ID=0
NAME="frontier_cs_${PROBLEM_ID}_myname"
mkdir -p docker_space/${NAME}

Step 2: Copy problem files (WITHOUT testdata)

From the Frontier-CS problem directory, copy only these files into the workspace root:

SRC="tasks/Frontier-CS/algorithmic/problems/${PROBLEM_ID}"

cp ${SRC}/statement.txt  docker_space/${NAME}/
cp ${SRC}/chk.cc         docker_space/${NAME}/
cp ${SRC}/config.yaml    docker_space/${NAME}/
cp -r ${SRC}/examples    docker_space/${NAME}/   # reference solutions

For most problems, testdata/ contains only 3 sample test cases and can be included as examples for the agent. For problem 0 (polyomino), however, testdata/ contains all 70 test cases: do NOT copy it, because giving the agent the full test set invites reward hacking.
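A small guard can catch an accidental copy of testdata/ before the container is launched. This is a sketch under the assumption that the directory is always named `testdata`; `assert_no_testdata` is a hypothetical helper name.

```shell
# Sketch: fail if a testdata directory (or file) exists anywhere in a workspace.
# assert_no_testdata is a hypothetical helper, not part of the repository.
assert_no_testdata() {
  local leaked
  # Search recursively so that a nested copy is caught as well.
  leaked=$(find "$1" -name testdata 2>/dev/null)
  if [ -n "$leaked" ]; then
    echo "refusing to continue, test data found: $leaked" >&2
    return 1
  fi
  return 0
}
```

For problem 0 it could be run as `assert_no_testdata "docker_space/${NAME}"` right after Step 2.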

Step 3: Add instruction files

Copy from an existing workspace or create new ones:

cp docker_space/frontier_cs_0_polyomino_ev2/INSTRUCTION.md  docker_space/${NAME}/
cp docker_space/frontier_cs_0_polyomino_ev2/evaluate.md     docker_space/${NAME}/

The workspace should now contain:

docker_space/${NAME}/
├── INSTRUCTION.md    # Agent instructions (evolutionary loop, logging, rules)
├── evaluate.md       # How to call the judge API
├── statement.txt     # Problem description
├── chk.cc            # Checker source (for reference only, judge uses its own copy)
├── config.yaml       # Problem config (time/memory limits)
└── examples/         # Reference solutions to bootstrap from
    ├── reference.cpp
    └── gpt5.cpp
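The expected layout can be verified mechanically before launching. The sketch below checks only the top-level entries named in the tree above; `check_workspace` is a hypothetical helper name.

```shell
# Sketch: confirm a workspace contains the files listed in the tree above.
# check_workspace is a hypothetical helper, not part of the repository.
check_workspace() {
  local ws="$1" missing=0 f
  for f in INSTRUCTION.md evaluate.md statement.txt chk.cc config.yaml; do
    [ -f "$ws/$f" ] || { echo "missing file: $f" >&2; missing=1; }
  done
  [ -d "$ws/examples" ] || { echo "missing directory: examples/" >&2; missing=1; }
  return "$missing"
}
```

A typical call would be `check_workspace "docker_space/${NAME}"`; it reports every missing entry rather than stopping at the first.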

Step 4: Launch the container

The entire workspace directory is mounted to /workspace inside the container, so all agent output (solutions, logs) is visible on the host.

docker run -d \
  --name ${NAME} \
  --privileged \
  --shm-size=4g \
  -v "$(pwd)/docker_space/${NAME}":/workspace \
  claude-docker \
  sleep infinity

Step 5: Connect to the judge network

docker network connect algorithmic_default ${NAME}

Step 6: Verify judge connectivity

docker exec ${NAME} curl -s http://Competitive-Programming:8081/problems | head -c 100
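If the judge was started only moments earlier, the first connectivity check may fail. A generic retry wrapper is one way to handle that. This is a sketch; `retry` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Sketch: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt follows.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (assumed values: 10 attempts, 3 seconds apart):
#   retry 10 3 docker exec "${NAME}" curl -sf http://Competitive-Programming:8081/problems
```

The `-f` flag makes curl exit non-zero on an HTTP error status, so the retry loop treats an error response as a failure too.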

Step 7: Enter the container and start Claude Code

docker exec -it ${NAME} bash

Inside the container, Claude Code reads /workspace/INSTRUCTION.md and begins evolving.

File Descriptions

File            Purpose
INSTRUCTION.md  Main agent directive: defines the evolutionary loop, mutation strategies, logging format, and rules
evaluate.md     API reference for submitting solutions to the judge and polling results
statement.txt   Full problem statement (input/output format, scoring, constraints)
chk.cc          Checker source code, for the agent's reference only; not executed locally
config.yaml     Problem config (time limit, memory limit, number of test cases)
examples/       Baseline solutions to initialize evolution from

Path Mapping

The host directory is mounted directly to /workspace in the container:

Host:      docker_space/${NAME}/
Container: /workspace/

Everything the agent writes under /workspace/ is visible on the host in docker_space/${NAME}/.
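When a log line mentions a container path, the mapping above gives its location on the host. A minimal sketch of that translation follows; `to_host_path` is a hypothetical helper, and the result is relative to the directory from which `docker run` was invoked.

```shell
# Sketch: translate a container path under /workspace to its host path.
# to_host_path NAME CONTAINER_PATH   (hypothetical helper)
to_host_path() {
  local name="$1" cpath="$2"
  case "$cpath" in
    /workspace)   printf '%s\n' "docker_space/$name" ;;
    /workspace/*) printf '%s\n' "docker_space/$name/${cpath#/workspace/}" ;;
    *) echo "not under /workspace: $cpath" >&2; return 1 ;;
  esac
}
```

For example, `to_host_path demo /workspace/logs/gen_0.cpp` prints `docker_space/demo/logs/gen_0.cpp`.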

Directory Layout (after agent runs)

docker_space/${NAME}/           ← host path (= /workspace in container)
├── INSTRUCTION.md              # Agent instructions
├── evaluate.md                 # Judge API reference
├── statement.txt               # Problem description
├── chk.cc                      # Checker source (reference only)
├── config.yaml                 # Problem config
├── examples/                   # Baseline solutions
│   ├── reference.cpp
│   └── gpt5.cpp
├── solution.cpp                # (created by agent) Current working solution
├── best.cpp                    # (created by agent) Best solution so far
└── logs/                       # (created by agent) Evolution history
    ├── evolution.log
    ├── gen_0.cpp
    ├── gen_1.cpp
    └── ...
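To see from the host how far a run has progressed, the highest generation number in logs/ can be read off the file names. The sketch below assumes the `gen_N.cpp` naming shown in the layout above; `latest_gen` is a hypothetical helper.

```shell
# Sketch: print the highest N among gen_N.cpp files in a logs directory.
# Prints nothing and returns non-zero if no generation file exists.
latest_gen() {
  local best="" f n
  for f in "$1"/gen_*.cpp; do
    [ -e "$f" ] || continue            # glob matched nothing
    n=${f##*/gen_}                     # strip directory and "gen_" prefix
    n=${n%.cpp}                        # strip extension
    case "$n" in ''|*[!0-9]*) continue ;; esac   # skip non-numeric names
    if [ -z "$best" ] || [ "$n" -gt "$best" ]; then
      best=$n
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

The comparison is numeric, so `gen_10.cpp` ranks above `gen_2.cpp`, which a plain lexical sort would get wrong.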

Quick Reference

# List all evolution containers
docker ps --filter "ancestor=claude-docker" --format "table {{.Names}}\t{{.Status}}"

# Check judge is running
docker ps --filter "name=Competitive-Programming" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Stop and remove a container
docker stop ${NAME} && docker rm ${NAME}

# Clean up workspace
rm -rf docker_space/${NAME}
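One caution about the cleanup command: if NAME is unset or empty, `rm -rf docker_space/${NAME}` expands to `rm -rf docker_space/` and removes every workspace. A guarded variant might look like the sketch below; `remove_workspace` is a hypothetical helper and must be run from the directory that contains docker_space/.

```shell
# Sketch: remove one workspace, refusing empty or path-like names.
# remove_workspace NAME   (hypothetical helper)
remove_workspace() {
  local name="$1"
  case "$name" in
    ''|.|..|*/*)
      echo "refusing to remove workspace: '$name'" >&2
      return 1
      ;;
  esac
  rm -rf "docker_space/$name"
}
```

Writing `${NAME:?}` in the original command gives similar protection inline, since the shell aborts the expansion when NAME is unset or empty.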

Prompt to start Claude Code

Follow INSTRUCTION.md, please use iterative refinement to improve the scores you achieve, the higher the better. You can log different generations under logs/. Keep your best solution and scores under best/. I believe you can do it. IMPORTANT: you can evolve your own evaluation process as well to find some insightful perspectives on how to escape local optima and to create better solutions.