---
title: Rust Coder OpenEnv
emoji: 🦀
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
  - openenv
  - software-engineering
  - rust
---
# Rust Coder: Systems Engineering Environment

Rust Coder is a high-fidelity OpenEnv environment designed to evaluate and train LLM agents on real-world Rust systems programming tasks. Unlike toy environments, Rust Coder simulates realistic engineering scenarios involving the borrow checker, concurrency, and memory safety.

## Motivation
Rust is uniquely challenging for AI agents due to its strict compile-time safety guarantees. This environment provides a 10-task progression that measures an agent's ability to:
- Fix borrow checker violations
- Correctly annotate lifetimes
- Resolve concurrency deadlocks
- Write unsafe FFI code correctly
- Identify and prevent memory leaks
- Optimize data pipelines for performance
## Action Space

Type: `RustCoderAction`

The agent submits a single string containing the complete, fixed Rust source code.

| Field | Type | Description |
|---|---|---|
| `code` | string | Full Rust source code to compile and test |
## Observation Space

Type: `RustCoderObservation`

The environment returns detailed feedback after each submission:

| Field | Type | Description |
|---|---|---|
| `problem_description` | string | Task requirements and context |
| `header_section` | string | LeetCode-style scaffold (imports + signatures/types) |
| `compilation_success` | bool | Whether `rustc` compiled the submitted code |
| `compilation_output` | string | Raw compiler errors and warnings |
| `test_results` | list[dict] | Per-test pass/fail results with error details |
| `reward_breakdown` | dict | Weighted score breakdown across 5 dimensions |
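On a failed attempt, an agent typically folds this observation back into its next prompt. A minimal sketch of that pattern, where the `feedback_prompt` helper is a hypothetical name and the dict shape simply follows the field names in the table above:

```python
def feedback_prompt(obs: dict) -> str:
    """Build a retry prompt from a RustCoderObservation-shaped dict.

    Hypothetical helper; the field names match the observation table.
    """
    if obs["compilation_success"]:
        failed = [t for t in obs["test_results"] if not t.get("passed")]
        detail = f"{len(failed)} test(s) failed:\n" + "\n".join(
            str(t.get("error", "")) for t in failed
        )
    else:
        detail = "Compilation failed:\n" + obs["compilation_output"]
    return (
        obs["problem_description"]
        + "\n\n"
        + obs["header_section"]
        + "\n\nPrevious attempt feedback:\n"
        + detail
    )

obs = {
    "problem_description": "Fix the borrow error.",
    "header_section": "fn main() {}",
    "compilation_success": False,
    "compilation_output": "error[E0502]: cannot borrow `v` as mutable",
    "test_results": [],
}
print(feedback_prompt(obs))
```

Feeding `compilation_output` back verbatim works well for Rust, since `rustc` error messages are unusually prescriptive.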
## Reward Function
Total reward is a weighted sum of 5 dimensions, each normalized to [0, 1]:
| Dimension | Weight | Metric |
|---|---|---|
| Compilation | 40% | Binary success/failure of rustc |
| Correctness | 20% | Fraction of test assertions that pass |
| Coverage | 20% | Fraction of tests that successfully ran |
| Elegance | 10% | Code quality heuristics (penalizes `.unwrap()`, long lines, `unsafe`) |
| Efficiency | 10% | Execution time vs. per-problem baseline |
Reward provides a partial signal at every step: compilation alone earns 0.40, and passing all tests earns up to 1.0.
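The scoring above can be sketched as a plain weighted sum. This is an illustrative reimplementation of the weighting table, not the environment's actual code:

```python
# Weights from the reward table; each dimension score is clamped to [0, 1].
WEIGHTS = {
    "compilation": 0.40,
    "correctness": 0.20,
    "coverage": 0.20,
    "elegance": 0.10,
    "efficiency": 0.10,
}

def total_reward(dims: dict) -> float:
    """Weighted sum of per-dimension scores, each normalized to [0, 1]."""
    return sum(WEIGHTS[k] * max(0.0, min(1.0, v)) for k, v in dims.items())

# A submission that compiles but fails every test still earns the
# compilation dimension's 0.40.
print(total_reward({"compilation": 1.0, "correctness": 0.0,
                    "coverage": 0.0, "elegance": 0.0, "efficiency": 0.0}))
```

The dense partial credit matters for training: an agent gets gradient signal for merely satisfying the borrow checker, long before any test passes.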
## Tasks

10 sequential problems with increasing difficulty:
| ID | Title | Difficulty | Skill Evaluated |
|---|---|---|---|
| 1 | Broken CLI Argument Parser | Easy | Enums & pattern matching |
| 2 | Conflicting Borrows | Easy-Med | Borrow checker |
| 3 | Invalid Lifetime Annotations | Medium | Lifetime annotations |
| 4 | Business Logic Errors | Medium | Math & correctness |
| 5 | Linked List Management | Medium | Ownership & data structures |
| 6 | Multi-threaded Deadlocks | Hard | Mutex & concurrency |
| 7 | Async Borrowing Conflicts | Hard | Async/await lifetimes |
| 8 | Unsafe FFI Integration | Hard | unsafe & C interop |
| 9 | Inefficient Data Pipeline | Hard | Performance optimization |
| 10 | Memory Leak Prevention | Hard+ | Weak pointers & ownership |
## Environment Variables / Secrets

The environment reads the following variables. Set them as HF Space secrets (Settings → Variables and Secrets) when deploying to Hugging Face, or in a local `.env` file for development.
| Variable | Required | Default | Description |
|---|---|---|---|
| `HF_TOKEN` | Yes | (none) | Hugging Face API token for LLM calls |
| `API_BASE_URL` | No | `https://router.huggingface.co/v1` | Inference endpoint |
| `MODEL_NAME` | No | `Qwen/Qwen2.5-72B-Instruct` | Model to use for evaluation |
Note: The `.env` file is excluded from Docker images by `.dockerignore`. On HF Spaces, secrets are injected as OS environment variables by the platform: `load_dotenv()` silently does nothing if no file is present, and `os.getenv()` reads from the platform-injected vars. This is the correct behavior.
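That dual-mode pattern can be sketched as follows; the `read_config` helper is illustrative, but the variable names and defaults match the table above:

```python
import os

try:
    from dotenv import load_dotenv  # pip install python-dotenv
    load_dotenv()  # silently does nothing if no .env file is present
except ImportError:
    pass  # fine on HF Spaces: secrets arrive as real env vars

def read_config(env=os.environ) -> dict:
    """Read configuration from the environment (hypothetical helper)."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set (HF Space secret or .env)")
    return {
        "hf_token": token,
        "api_base_url": env.get("API_BASE_URL",
                                "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
    }
```

Failing fast on a missing `HF_TOKEN` gives a clearer error than letting the first LLM call fail with a 401.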
## Setup & Usage

### Local Development

```bash
# 1. Clone and enter the repo
git clone https://github.com/your-username/rust_coder
cd rust_coder

# 2. Create .env with your credentials
cat > .env << EOF
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
EOF

# 3. Build the Docker image (uses root Dockerfile)
docker build -t rust_coder:latest .

# 4. Run the environment server
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# 5. Verify it's healthy
curl http://localhost:8000/health
# → {"status": "healthy"}

# 6. Run the inference benchmark
python inference.py
```
### Docker Commands Reference

```bash
# Build
docker build -t rust_coder:latest .

# Run with .env file
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# View logs
docker logs rust_env

# Stop
docker stop rust_env
```
### Environment API

```bash
# Reset (returns first problem)
curl -X POST http://localhost:8000/reset

# Step (submit Rust code)
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"code": "fn main() { println!(\"hello\"); }"}}'

# Health check
curl http://localhost:8000/health
```
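The same calls can be made from Python with only the standard library. A sketch against a locally running server; the response schema is whatever the server returns for each endpoint:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust for your deployment

def step_payload(code: str) -> bytes:
    """Serialize a RustCoderAction into the /step request body."""
    return json.dumps({"action": {"code": code}}).encode()

def post(path: str, body: bytes = b"{}") -> dict:
    """POST JSON to the environment server and decode the response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage, with the server from the Local Development section running:
#   obs = post("/reset")
#   result = post("/step", step_payload('fn main() { println!("hello"); }'))
```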
### HF Spaces Deployment

```bash
# Install HF CLI
pip install huggingface_hub

# Login
huggingface-cli login

# Push to Space
openenv push --repo-id your-username/rust-coder
```
Then go to your Space settings and add secrets:

- `HF_TOKEN` → your Hugging Face API token
- `MODEL_NAME` → e.g. `Qwen/Qwen2.5-72B-Instruct`
## Baseline Scores

Baseline using Qwen/Qwen2.5-72B-Instruct via the Hugging Face router:
| Metric | Score |
|---|---|
| Average reward | 0.59 |
| Compilation % | ~85% |
| Correctness % | ~45% |
## Project Structure

```
rust_coder/
├── Dockerfile                     # Root Dockerfile (used by validator + HF Spaces)
├── server/Dockerfile              # Identical copy (used for -f flag builds)
├── openenv.yaml                   # OpenEnv spec metadata
├── pyproject.toml                 # Python package config
├── uv.lock                        # Locked dependencies
├── problems.json                  # 10 coding problems dataset
├── models.py                      # Pydantic action/observation types
├── client.py                      # WebSocket client for RustCoderEnv
├── inference.py                   # Baseline inference script (entry point)
├── __init__.py                    # Package exports
└── server/
    ├── app.py                     # FastAPI OpenEnv server entrypoint
    └── rust_coder_environment.py  # Core environment logic
```
## HF Space runtime model

- The Hugging Face Space serves the environment via `uvicorn server.app:app` (see `openenv.yaml` and `Dockerfile`).
- The built-in OpenEnv web UI may send an empty action on Step; this environment supports that by auto-calling the LLM when `action.code` is empty (unless disabled via `AUTO_LLM_ON_EMPTY_STEP=0`).
- `inference.py` is the required baseline runner used by the validator/judge. It connects to the running Space and drives `reset()`/`step()` in a loop, emitting strict `[START]`/`[STEP]`/`[END]` stdout lines.
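The shape of that loop can be sketched as follows. The `env`/`solve` interfaces and the exact marker fields are illustrative assumptions here; the authoritative protocol is whatever `inference.py` emits:

```python
def run_episode(env, solve, max_steps: int = 10) -> float:
    """Illustrative reset()/step() loop in the spirit of inference.py.

    Assumed interfaces: env.reset() and env.step(code) return
    observation dicts, and solve(obs) returns a Rust source string.
    The [START]/[STEP]/[END] lines mirror the stdout protocol the
    validator expects, but their exact fields are defined elsewhere.
    """
    obs = env.reset()
    print("[START]")
    reward = 0.0
    for i in range(max_steps):
        code = solve(obs)
        obs = env.step(code)
        reward = sum(obs.get("reward_breakdown", {}).values())
        print(f"[STEP] {i} reward={reward:.2f}")
        if obs.get("done"):
            break
    print(f"[END] reward={reward:.2f}")
    return reward
```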