---
title: Rust Coder OpenEnv
emoji: 🦀
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
- openenv
- software-engineering
- rust
---
# Rust Coder: Systems Engineering Environment
Rust Coder is a high-fidelity **OpenEnv** environment designed to evaluate and train LLM agents on real-world Rust systems programming tasks. Unlike toy environments, Rust Coder presents realistic engineering scenarios involving the borrow checker, concurrency, and memory safety.
## Motivation
Rust is uniquely challenging for AI agents due to its strict compile-time safety guarantees. This environment provides a 10-task progression that measures an agent's ability to:
1. Fix borrow checker violations
2. Correctly annotate lifetimes
3. Resolve concurrency deadlocks
4. Write unsafe FFI code correctly
5. Identify and prevent memory leaks
6. Optimize data pipelines for performance
---
## Action Space
**Type**: `RustCoderAction`
The agent submits a single string containing the complete, fixed Rust source code.
| Field | Type | Description |
|-------|--------|------------------------------------------|
| `code` | string | Full Rust source code to compile and test |
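As a minimal sketch, an action is just the full source file wrapped in this one-field structure (the `code` field name comes from the table above; the outer `action` key matches the `/step` request shown under Environment API):

```python
import json

# A RustCoderAction payload: `code` holds the complete, fixed Rust source
# file as a single string.
action = {"code": 'fn main() { println!("hello"); }'}
body = json.dumps({"action": action})
```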
## Observation Space
**Type**: `RustCoderObservation`
The environment returns detailed feedback after each submission:
| Field | Type | Description |
|------------------------|-------------|-----------------------------------------------------|
| `problem_description` | string | Task requirements and context |
| `header_section` | string | LeetCode-style scaffold (imports + signatures/types) |
| `compilation_success` | bool | Whether `rustc` compiled the submitted code |
| `compilation_output` | string | Raw compiler errors and warnings |
| `test_results` | list[dict] | Per-test pass/fail results with error details |
| `reward_breakdown` | dict | Weighted score breakdown across 5 dimensions |
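A sketch of consuming an observation on the agent side. Only the top-level field names are taken from the table above; the sample values and the per-test dict keys (`name`, `passed`) are illustrative assumptions:

```python
# Made-up sample observation; field names from the observation table,
# per-test keys assumed for illustration.
obs = {
    "compilation_success": True,
    "compilation_output": "",
    "test_results": [
        {"name": "test_basic", "passed": True},
        {"name": "test_edge", "passed": False},
    ],
    "reward_breakdown": {"compilation": 0.4, "correctness": 0.1},
}

passed = sum(1 for t in obs["test_results"] if t["passed"])
total = len(obs["test_results"])
print(f"compiled={obs['compilation_success']} tests={passed}/{total}")
```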
---
## Reward Function
Total reward is a weighted sum of 5 dimensions, each normalized to [0, 1]:
| Dimension | Weight | Metric |
|-----------------|--------|---------------------------------------------------|
| Compilation | 40% | Binary success/failure of `rustc` |
| Correctness | 20% | Fraction of test assertions that pass |
| Coverage | 20% | Fraction of tests that successfully ran |
| Elegance | 10% | Code quality heuristics (avoids `.unwrap()`, long lines, `unsafe`) |
| Efficiency | 10% | Execution time vs. per-problem baseline |
The reward provides a partial signal at every step: compilation alone earns 0.40, and passing all tests earns up to 1.0.
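The weighted sum can be sketched as follows. The weights come from the table above; the dimension key names are assumptions for illustration:

```python
# Weights from the reward table; each dimension score is normalized to [0, 1].
WEIGHTS = {
    "compilation": 0.40,
    "correctness": 0.20,
    "coverage": 0.20,
    "elegance": 0.10,
    "efficiency": 0.10,
}

def total_reward(breakdown: dict) -> float:
    """Weighted sum over the five dimensions; missing dimensions score 0."""
    return sum(w * breakdown.get(dim, 0.0) for dim, w in WEIGHTS.items())
```

Compilation alone yields `total_reward({"compilation": 1.0}) == 0.40`; a submission scoring 1.0 on every dimension reaches the maximum of 1.0.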
---
## Tasks
10 sequential problems with increasing difficulty:
| ID | Title | Difficulty | Skill Evaluated |
|----|------------------------------------|------------|-------------------------------|
| 1 | Broken CLI Argument Parser | Easy | Enums & pattern matching |
| 2 | Conflicting Borrows | Easy→Med | Borrow checker |
| 3 | Invalid Lifetime Annotations | Medium | Lifetime annotations |
| 4 | Business Logic Errors | Medium | Math & correctness |
| 5 | Linked List Management | Medium | Ownership & data structures |
| 6 | Multi-threaded Deadlocks | Hard | Mutex & concurrency |
| 7 | Async Borrowing Conflicts | Hard | Async/await lifetimes |
| 8 | Unsafe FFI Integration | Hard | `unsafe` & C interop |
| 9 | Inefficient Data Pipeline | Hard | Performance optimization |
| 10 | Memory Leak Prevention | Hard+ | Weak pointers & ownership |
---
## Environment Variables / Secrets
The environment reads the following variables. Set them as **HF Space secrets** (Settings → Variables and Secrets) when deploying to Hugging Face, or in a local `.env` file for development.
| Variable | Required | Default | Description |
|----------------|----------|--------------------------------------|--------------------------------------|
| `HF_TOKEN` | Yes | — | Hugging Face API token for LLM calls |
| `API_BASE_URL` | No | `https://router.huggingface.co/v1` | Inference endpoint |
| `MODEL_NAME` | No | `Qwen/Qwen2.5-72B-Instruct` | Model to use for evaluation |
> **Note**: The `.env` file is excluded from Docker images by `.dockerignore`. On HF Spaces, the platform injects secrets as OS environment variables; `load_dotenv()` is a silent no-op when no `.env` file is present, and `os.getenv()` reads the platform-injected values. This is the intended behavior.
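A minimal sketch of that resolution order (variable names and defaults from the table above; `python-dotenv` is assumed available for local development):

```python
import os

# Locally, load credentials from .env; on HF Spaces no .env exists, so
# load_dotenv() is a harmless no-op and the platform-injected vars win.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:  # dotenv not installed, e.g. in a slim image
    pass

HF_TOKEN = os.getenv("HF_TOKEN")  # required; no default
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
```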
---
## Setup & Usage
### Local Development
```bash
# 1. Clone and enter the repo
git clone https://github.com/your-username/rust_coder
cd rust_coder
# 2. Create .env with your credentials
cat > .env << EOF
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
EOF
# 3. Build the Docker image (uses root Dockerfile)
docker build -t rust_coder:latest .
# 4. Run the environment server
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest
# 5. Verify it's healthy
curl http://localhost:8000/health
# → {"status": "healthy"}
# 6. Run the inference benchmark
python inference.py
```
### Docker Commands Reference
```bash
# Build
docker build -t rust_coder:latest .
# Run with .env file
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest
# View logs
docker logs rust_env
# Stop
docker stop rust_env
```
### Environment API
```bash
# Reset (returns first problem)
curl -X POST http://localhost:8000/reset
# Step (submit Rust code)
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"action": {"code": "fn main() { println!(\"hello\"); }"}}'
# Health check
curl http://localhost:8000/health
```
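The same three calls can be driven from Python with only the standard library. This is a sketch: the endpoint paths and the `{"action": {"code": ...}}` request shape are taken from the curl examples above, and the response structure is not assumed beyond being JSON:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def step_payload(code):
    """Build the request body for POST /step from a Rust source string."""
    return {"action": {"code": code}}

def post(path, payload=None):
    """POST JSON to the environment server and decode the JSON response."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload or {}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    first = post("/reset")  # returns the first problem
    result = post("/step", step_payload('fn main() { println!("hello"); }'))
```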
### HF Spaces Deployment
```bash
# Install HF CLI
pip install huggingface_hub
# Login
huggingface-cli login
# Push to Space
openenv push --repo-id your-username/rust-coder
```
Then go to your Space settings and add secrets:
- `HF_TOKEN` → your Hugging Face API token
- `MODEL_NAME` → e.g. `Qwen/Qwen2.5-72B-Instruct`
---
## Baseline Scores
Baseline using **Qwen/Qwen2.5-72B-Instruct** via Hugging Face router:
| Metric | Score |
|----------------|-------|
| Average reward | 0.59 |
| Compilation % | ~85% |
| Correctness % | ~45% |
---
## Project Structure
```
rust_coder/
├── Dockerfile          # Root Dockerfile (used by validator + HF Spaces)
├── server/Dockerfile   # Identical copy (used for -f flag builds)
├── openenv.yaml        # OpenEnv spec metadata
├── pyproject.toml      # Python package config
├── uv.lock             # Locked dependencies
├── problems.json       # 10 coding problems dataset
├── models.py           # Pydantic action/observation types
├── client.py           # WebSocket client for RustCoderEnv
├── inference.py        # Baseline inference script (entry point)
├── __init__.py         # Package exports
└── server/
    ├── app.py                      # FastAPI OpenEnv server entrypoint
    └── rust_coder_environment.py   # Core environment logic
```
## HF Space Runtime Model
- The Hugging Face Space serves the environment via `uvicorn server.app:app` (see `openenv.yaml` and `Dockerfile`).
- The built-in OpenEnv web UI may send an empty action on Step; this environment supports that by auto-calling the LLM when `action.code` is empty (unless disabled via `AUTO_LLM_ON_EMPTY_STEP=0`).
- `inference.py` is the required baseline runner used by the validator/judge. It connects to the running Space and drives `reset()`/`step()` in a loop, emitting strict `[START]`/`[STEP]`/`[END]` stdout lines.
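A sketch of scanning that stdout stream. The marker prefixes come from the description above; the exact per-line payload format is defined by the validator and is not assumed here:

```python
def summarize_run(log):
    """Count the strict [START]/[STEP]/[END] marker lines in a benchmark log."""
    lines = log.splitlines()
    return {
        "started": any(l.startswith("[START]") for l in lines),
        "steps": sum(1 for l in lines if l.startswith("[STEP]")),
        "ended": any(l.startswith("[END]") for l in lines),
    }
```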