---
title: Rust Coder OpenEnv
emoji: 🦀
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
  - openenv
  - software-engineering
  - rust
---

# Rust Coder: Systems Engineering Environment

Rust Coder is a high-fidelity **OpenEnv** environment designed to evaluate and train LLM agents on real-world Rust systems programming tasks. Unlike toy environments, Rust Coder simulates realistic engineering scenarios involving the borrow checker, concurrency, and memory safety.

## Motivation

Rust is uniquely challenging for AI agents due to its strict compile-time safety guarantees. This environment provides a 10-task progression that measures an agent's ability to:

1. Fix borrow checker violations
2. Correctly annotate lifetimes
3. Resolve concurrency deadlocks
4. Write unsafe FFI code correctly
5. Identify and prevent memory leaks
6. Optimize data pipelines for performance

---

## Action Space

**Type**: `RustCoderAction`

The agent submits a single string containing the complete, fixed Rust source code.
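To give a flavor of what such a submission looks like, here is a minimal, hypothetical borrow-checker fix in the spirit of the easier tasks (this is an illustration, not an actual problem from `problems.json`):

```rust
// Illustrative fix for a classic borrow-checker error: the broken version
// held an immutable borrow of `scores` (`let first = &scores[0];`) across a
// mutable `scores.push(40)`, which rustc rejects (E0502).
fn first_then_push(scores: &mut Vec<i32>) -> i32 {
    // Fixed: copy the value out, so no immutable borrow is still alive
    // when `push` takes its mutable borrow.
    let first = scores[0];
    scores.push(40);
    first
}

fn main() {
    let mut scores = vec![10, 20, 30];
    let first = first_then_push(&mut scores);
    println!("first = {first}, len = {}", scores.len()); // first = 10, len = 4
}
```

The agent submits the entire fixed file as one string, not a diff.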
| Field  | Type   | Description                               |
|--------|--------|-------------------------------------------|
| `code` | string | Full Rust source code to compile and test |

## Observation Space

**Type**: `RustCoderObservation`

The environment returns detailed feedback after each submission:

| Field                 | Type       | Description                                          |
|-----------------------|------------|------------------------------------------------------|
| `problem_description` | string     | Task requirements and context                        |
| `header_section`      | string     | LeetCode-style scaffold (imports + signatures/types) |
| `compilation_success` | bool       | Whether `rustc` compiled the submitted code          |
| `compilation_output`  | string     | Raw compiler errors and warnings                     |
| `test_results`        | list[dict] | Per-test pass/fail results with error details        |
| `reward_breakdown`    | dict       | Weighted score breakdown across 5 dimensions         |

---

## Reward Function

Total reward is a weighted sum of 5 dimensions, each normalized to [0, 1]:

| Dimension   | Weight | Metric                                                             |
|-------------|--------|--------------------------------------------------------------------|
| Compilation | 40%    | Binary success/failure of `rustc`                                  |
| Correctness | 20%    | Fraction of test assertions that pass                              |
| Coverage    | 20%    | Fraction of tests that successfully ran                            |
| Elegance    | 10%    | Code quality heuristics (avoids `.unwrap()`, long lines, `unsafe`) |
| Efficiency  | 10%    | Execution time vs. per-problem baseline                            |

Reward provides partial signal at every step: compilation alone earns 0.40, and passing all tests earns up to 1.0.
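The weighting can be sketched as follows. This is a paraphrase of the table, with assumed argument names; the actual scoring lives server-side in `server/rust_coder_environment.py` and may differ in detail:

```rust
// Sketch of the documented 5-dimension weighted reward.
// Each non-compilation input is assumed to already be normalized to [0, 1].
fn total_reward(compiled: bool, correctness: f64, coverage: f64,
                elegance: f64, efficiency: f64) -> f64 {
    let compile_score = if compiled { 1.0 } else { 0.0 };
    0.40 * compile_score
        + 0.20 * correctness
        + 0.20 * coverage
        + 0.10 * elegance
        + 0.10 * efficiency
}

fn main() {
    // Compiles but fails every test and heuristic: partial signal of 0.40.
    println!("{}", total_reward(true, 0.0, 0.0, 0.0, 0.0));
    // Perfect submission: 1.0.
    println!("{}", total_reward(true, 1.0, 1.0, 1.0, 1.0));
}
```

Because compilation alone yields 0.40, an agent that merely satisfies `rustc` still receives a gradient toward full correctness.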
---

## Tasks

10 sequential problems with increasing difficulty:

| ID | Title                        | Difficulty | Skill Evaluated             |
|----|------------------------------|------------|-----------------------------|
| 1  | Broken CLI Argument Parser   | Easy       | Enums & pattern matching    |
| 2  | Conflicting Borrows          | Easy→Med   | Borrow checker              |
| 3  | Invalid Lifetime Annotations | Medium     | Lifetime annotations        |
| 4  | Business Logic Errors        | Medium     | Math & correctness          |
| 5  | Linked List Management       | Medium     | Ownership & data structures |
| 6  | Multi-threaded Deadlocks     | Hard       | Mutex & concurrency         |
| 7  | Async Borrowing Conflicts    | Hard       | Async/await lifetimes       |
| 8  | Unsafe FFI Integration       | Hard       | `unsafe` & C interop        |
| 9  | Inefficient Data Pipeline    | Hard       | Performance optimization    |
| 10 | Memory Leak Prevention       | Hard+      | Weak pointers & ownership   |

---

## Environment Variables / Secrets

The environment reads the following variables. Set them as **HF Space secrets** (Settings → Variables and Secrets) when deploying to Hugging Face, or in a local `.env` file for development.

| Variable       | Required | Default                            | Description                          |
|----------------|----------|------------------------------------|--------------------------------------|
| `HF_TOKEN`     | Yes      | —                                  | Hugging Face API token for LLM calls |
| `API_BASE_URL` | No       | `https://router.huggingface.co/v1` | Inference endpoint                   |
| `MODEL_NAME`   | No       | `Qwen/Qwen2.5-72B-Instruct`        | Model to use for evaluation          |

> **Note**: The `.env` file is excluded from Docker images by `.dockerignore`. On HF Spaces, the platform injects secrets as OS environment variables: `load_dotenv()` silently does nothing when no file is present, and `os.getenv()` reads the platform-injected values. This is the correct behavior.

---

## Setup & Usage

### Local Development

```bash
# 1. Clone and enter the repo
git clone https://github.com/your-username/rust_coder
cd rust_coder

# 2. Create .env with your credentials
cat > .env << EOF
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
EOF

# 3. Build the Docker image (uses root Dockerfile)
docker build -t rust_coder:latest .

# 4. Run the environment server
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# 5. Verify it's healthy
curl http://localhost:8000/health
# → {"status": "healthy"}

# 6. Run the inference benchmark
python inference.py
```

### Docker Commands Reference

```bash
# Build
docker build -t rust_coder:latest .

# Run with .env file
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# View logs
docker logs rust_env

# Stop
docker stop rust_env
```

### Environment API

```bash
# Reset (returns first problem)
curl -X POST http://localhost:8000/reset

# Step (submit Rust code)
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"code": "fn main() { println!(\"hello\"); }"}}'

# Health check
curl http://localhost:8000/health
```

### HF Spaces Deployment

```bash
# Install HF CLI
pip install huggingface_hub

# Login
huggingface-cli login

# Push to Space
openenv push --repo-id your-username/rust-coder
```

Then go to your Space settings and add secrets:

- `HF_TOKEN` → your Hugging Face API token
- `MODEL_NAME` → e.g. `Qwen/Qwen2.5-72B-Instruct`
---

## Baseline Scores

Baseline using **Qwen/Qwen2.5-72B-Instruct** via the Hugging Face router:

| Metric         | Score |
|----------------|-------|
| Average reward | 0.59  |
| Compilation %  | ~85%  |
| Correctness %  | ~45%  |

---

## Project Structure

```
rust_coder/
├── Dockerfile                     # Root Dockerfile (used by validator + HF Spaces)
├── server/Dockerfile              # Identical copy (used for -f flag builds)
├── openenv.yaml                   # OpenEnv spec metadata
├── pyproject.toml                 # Python package config
├── uv.lock                        # Locked dependencies
├── problems.json                  # 10 coding problems dataset
├── models.py                      # Pydantic action/observation types
├── client.py                      # WebSocket client for RustCoderEnv
├── inference.py                   # Baseline inference script (entry point)
├── __init__.py                    # Package exports
└── server/
    ├── app.py                     # FastAPI OpenEnv server entrypoint
    └── rust_coder_environment.py  # Core environment logic
```

## HF Space Runtime Model

- The Hugging Face Space serves the environment via `uvicorn server.app:app` (see `openenv.yaml` and `Dockerfile`).
- The built-in OpenEnv web UI may send an empty action on Step; this environment supports that by auto-calling the LLM when `action.code` is empty (unless disabled via `AUTO_LLM_ON_EMPTY_STEP=0`).
- `inference.py` is the required baseline runner used by the validator/judge. It connects to the running Space and drives `reset()`/`step()` in a loop, emitting strict `[START]`/`[STEP]`/`[END]` stdout lines.
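As a closing taste of the hardest skill the environment tests (Task 10, Memory Leak Prevention), here is a minimal sketch, again hypothetical and unrelated to the actual problem in `problems.json`, of breaking an `Rc` reference cycle with `Weak`:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Parent/child nodes. Storing the back-edge as `Weak` (instead of `Rc`)
// keeps the ownership graph acyclic, so both nodes are actually freed
// when the last strong handle is dropped; with `Rc` in both directions,
// the cycle would leak.
struct Node {
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));

    // The child's back-edge does not count as ownership.
    println!("parent strong = {}", Rc::strong_count(&parent)); // 1
    println!("child strong  = {}", Rc::strong_count(&child));  // 2
    println!("back-edge alive = {}", child.parent.borrow().upgrade().is_some()); // true
}
```

Submissions in that task are scored like any other: they must compile, pass the leak-detection tests, and avoid the `unsafe`/`.unwrap()` elegance penalties.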