---
title: Rust Coder OpenEnv
emoji: 🦀
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
- openenv
- software-engineering
- rust
---
# Rust Coder: Systems Engineering Environment
Rust Coder is a high-fidelity **OpenEnv** environment designed to evaluate and train LLM agents on real-world Rust systems programming tasks. Unlike toy environments, Rust Coder simulates realistic engineering scenarios involving the borrow checker, concurrency, and memory safety.
## Motivation
Rust is uniquely challenging for AI agents due to its strict compile-time safety guarantees. This environment provides a 10-task progression that measures an agent's ability to:
1. Fix borrow checker violations
2. Correctly annotate lifetimes
3. Resolve concurrency deadlocks
4. Write unsafe FFI code correctly
5. Identify and prevent memory leaks
6. Optimize data pipelines for performance
---
## Action Space
**Type**: `RustCoderAction`
The agent submits a single string containing the complete, fixed Rust source code.
| Field | Type | Description |
|-------|--------|------------------------------------------|
| `code` | string | Full Rust source code to compile and test |
## Observation Space
**Type**: `RustCoderObservation`
The environment returns detailed feedback after each submission:
| Field | Type | Description |
|------------------------|-------------|-----------------------------------------------------|
| `problem_description` | string | Task requirements and context |
| `header_section` | string | LeetCode-style scaffold (imports + signatures/types) |
| `compilation_success` | bool | Whether `rustc` compiled the submitted code |
| `compilation_output` | string | Raw compiler errors and warnings |
| `test_results` | list[dict] | Per-test pass/fail results with error details |
| `reward_breakdown` | dict | Weighted score breakdown across 5 dimensions |
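
The action and observation tables above map naturally onto typed records. The sketch below mirrors the documented field names using standard-library dataclasses so that it runs without extra dependencies. The project's `models.py` defines the real types with Pydantic, so the defaults, the sample values, and the `name`/`passed` keys inside `test_results` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RustCoderAction:
    """Action: the complete Rust source code submitted by the agent."""
    code: str


@dataclass
class RustCoderObservation:
    """Observation: feedback returned after each submission."""
    problem_description: str = ""
    header_section: str = ""
    compilation_success: bool = False
    compilation_output: str = ""
    test_results: list[dict[str, Any]] = field(default_factory=list)
    reward_breakdown: dict[str, float] = field(default_factory=dict)


# Hypothetical sample values, for illustration only.
action = RustCoderAction(code='fn main() { println!("hello"); }')
obs = RustCoderObservation(
    problem_description="Fix the borrow checker error.",
    compilation_success=True,
    test_results=[{"name": "test_basic", "passed": True}],
)
print(obs.compilation_success, len(obs.test_results))
```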
---
## Reward Function
Total reward is a weighted sum of 5 dimensions, each normalized to [0, 1]:
| Dimension | Weight | Metric |
|-----------------|--------|---------------------------------------------------|
| Compilation | 40% | Binary success/failure of `rustc` |
| Correctness | 20% | Fraction of test assertions that pass |
| Coverage | 20% | Fraction of tests that successfully ran |
| Elegance | 10% | Code quality heuristics (avoids `.unwrap()`, long lines, `unsafe`) |
| Efficiency | 10% | Execution time vs. per-problem baseline |
Reward provides partial signal at every step: compilation alone earns 0.40, and passing all tests earns up to 1.0.
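
As a rough illustration of the weighted sum described above, the sketch below combines per-dimension scores using the weights from the table. The dictionary keys, the clamping step, and the rounding are assumptions; the environment's actual scoring code may differ.

```python
# Weights taken from the reward table; the key names are assumed.
WEIGHTS = {
    "compilation": 0.40,
    "correctness": 0.20,
    "coverage": 0.20,
    "elegance": 0.10,
    "efficiency": 0.10,
}


def total_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each clamped to [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Missing dimensions count as 0; out-of-range values are clamped.
        score = min(1.0, max(0.0, scores.get(name, 0.0)))
        total += weight * score
    return round(total, 4)


compiles_only = total_reward({"compilation": 1.0})
perfect = total_reward({name: 1.0 for name in WEIGHTS})
print(compiles_only, perfect)
```

A submission that compiles but scores zero on every other dimension lands at 0.40 in this sketch, matching the partial-credit behavior described above.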
---
## Tasks
10 sequential problems with increasing difficulty:
| ID | Title | Difficulty | Skill Evaluated |
|----|------------------------------------|------------|-------------------------------|
| 1 | Broken CLI Argument Parser | Easy | Enums & pattern matching |
| 2 | Conflicting Borrows | Easy–Med | Borrow checker |
| 3 | Invalid Lifetime Annotations | Medium | Lifetime annotations |
| 4 | Business Logic Errors | Medium | Math & correctness |
| 5 | Linked List Management | Medium | Ownership & data structures |
| 6 | Multi-threaded Deadlocks | Hard | Mutex & concurrency |
| 7 | Async Borrowing Conflicts | Hard | Async/await lifetimes |
| 8 | Unsafe FFI Integration | Hard | `unsafe` & C interop |
| 9 | Inefficient Data Pipeline | Hard | Performance optimization |
| 10 | Memory Leak Prevention | Hard+ | Weak pointers & ownership |
---
## Environment Variables / Secrets
The environment reads the following variables. Set them as **HF Space secrets** (Settings → Variables and Secrets) when deploying to Hugging Face, or in a local `.env` file for development.
| Variable | Required | Default | Description |
|----------------|----------|--------------------------------------|--------------------------------------|
| `HF_TOKEN` | Yes | (none) | Hugging Face API token for LLM calls |
| `API_BASE_URL` | No | `https://router.huggingface.co/v1` | Inference endpoint |
| `MODEL_NAME` | No | `Qwen/Qwen2.5-72B-Instruct` | Model to use for evaluation |
> **Note**: The `.env` file is excluded from Docker images by `.dockerignore`. On HF Spaces, the platform injects secrets as OS environment variables, so `load_dotenv()` silently does nothing when no file is present and `os.getenv()` reads the platform-injected values. This is the intended behavior.
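
A minimal sketch of how the three variables and their documented defaults might be read is shown below. The `load_settings` helper and its return shape are hypothetical. It uses only the standard library and accepts any mapping, so the same code works whether the values come from a `.env` file loaded earlier or from platform-injected variables.

```python
import os
from collections.abc import Mapping

DEFAULT_API_BASE_URL = "https://router.huggingface.co/v1"
DEFAULT_MODEL_NAME = "Qwen/Qwen2.5-72B-Instruct"


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the documented variables, applying the documented defaults."""
    token = env.get("HF_TOKEN", "")
    if not token:
        # HF_TOKEN is the only required variable in the table.
        raise RuntimeError("HF_TOKEN is required but not set")
    return {
        "hf_token": token,
        "api_base_url": env.get("API_BASE_URL", DEFAULT_API_BASE_URL),
        "model_name": env.get("MODEL_NAME", DEFAULT_MODEL_NAME),
    }


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"HF_TOKEN": "hf_your_token_here"})
print(settings["api_base_url"], settings["model_name"])
```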
---
## Setup & Usage
### Local Development
```bash
# 1. Clone and enter the repo
git clone https://github.com/your-username/rust_coder
cd rust_coder
# 2. Create .env with your credentials
cat > .env << EOF
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
EOF
# 3. Build the Docker image (uses root Dockerfile)
docker build -t rust_coder:latest .
# 4. Run the environment server
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest
# 5. Verify it's healthy
curl http://localhost:8000/health
# → {"status": "healthy"}
# 6. Run the inference benchmark
python inference.py
```
### Docker Commands Reference
```bash
# Build
docker build -t rust_coder:latest .
# Run with .env file
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest
# View logs
docker logs rust_env
# Stop
docker stop rust_env
```
### Environment API
```bash
# Reset (returns first problem)
curl -X POST http://localhost:8000/reset
# Step (submit Rust code)
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"action": {"code": "fn main() { println!(\"hello\"); }"}}'
# Health check
curl http://localhost:8000/health
```
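
The same `/step` call can be made from Python. The sketch below builds the request with the standard library and reuses the JSON body from the curl example; the helper names are hypothetical, and the shape of the response is not documented here, so `send` simply decodes whatever JSON the server returns.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl commands


def build_step_request(code: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /step request with the same JSON body as the curl example."""
    body = json.dumps({"action": {"code": code}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_step_request('fn main() { println!("hello"); }')
print(request.get_method(), request.full_url)
# With the container running: observation = send(request)
```

Building the request separately from sending it keeps the payload easy to inspect before the server is up.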
### HF Spaces Deployment
```bash
# Install HF CLI
pip install huggingface_hub
# Login
huggingface-cli login
# Push to Space
openenv push --repo-id your-username/rust-coder
```
Then go to your Space settings and add secrets:
- `HF_TOKEN`: your Hugging Face API token
- `MODEL_NAME`: e.g. `Qwen/Qwen2.5-72B-Instruct`
---
## Baseline Scores
Baseline using **Qwen/Qwen2.5-72B-Instruct** via Hugging Face router:
| Metric | Score |
|----------------|-------|
| Average reward | 0.59 |
| Compilation % | ~85% |
| Correctness % | ~45% |
---
## Project Structure
```
rust_coder/
├── Dockerfile                    # Root Dockerfile (used by validator + HF Spaces)
├── server/Dockerfile             # Identical copy (used for -f flag builds)
├── openenv.yaml                  # OpenEnv spec metadata
├── pyproject.toml                # Python package config
├── uv.lock                       # Locked dependencies
├── problems.json                 # 10 coding problems dataset
├── models.py                     # Pydantic action/observation types
├── client.py                     # WebSocket client for RustCoderEnv
├── inference.py                  # Baseline inference script (entry point)
├── __init__.py                   # Package exports
└── server/
    ├── app.py                    # FastAPI OpenEnv server entrypoint
    └── rust_coder_environment.py # Core environment logic
```
## HF Space Runtime Model
- The Hugging Face Space serves the environment via `uvicorn server.app:app` (see `openenv.yaml` and `Dockerfile`).
- The built-in OpenEnv web UI may send an empty action on Step; this environment supports that by auto-calling the LLM when `action.code` is empty (unless disabled via `AUTO_LLM_ON_EMPTY_STEP=0`).
- `inference.py` is the required baseline runner used by the validator/judge. It connects to the running Space and drives `reset()`/`step()` in a loop, emitting strict `[START]`/`[STEP]`/`[END]` stdout lines.
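
The empty-action fallback described in the list above could be gated roughly as follows. This is a sketch, not the environment's actual logic: the function name is hypothetical, and it assumes the fallback is enabled unless `AUTO_LLM_ON_EMPTY_STEP` is exactly `0` and that whitespace-only code counts as empty.

```python
import os
from collections.abc import Mapping


def should_auto_call_llm(code: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True when an empty action should trigger the LLM fallback.

    Assumptions: the fallback is on unless AUTO_LLM_ON_EMPTY_STEP is "0",
    and whitespace-only code is treated as empty.
    """
    if code.strip():
        return False  # the agent supplied code, so use it as-is
    return env.get("AUTO_LLM_ON_EMPTY_STEP", "1") != "0"


print(should_auto_call_llm("", {}))
print(should_auto_call_llm("", {"AUTO_LLM_ON_EMPTY_STEP": "0"}))
print(should_auto_call_llm("fn main() {}", {}))
```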