---
title: Rust Coder OpenEnv
emoji: 🦀
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
  - openenv
  - software-engineering
  - rust
---

# Rust Coder: Systems Engineering Environment

Rust Coder is a high-fidelity OpenEnv environment designed to evaluate and train LLM agents on real-world Rust systems programming tasks. Unlike toy environments, Rust Coder poses realistic engineering scenarios involving the borrow checker, concurrency, and memory safety.

## Motivation

Rust is uniquely challenging for AI agents due to its strict compile-time safety guarantees. This environment provides a 10-task progression that measures an agent's ability to:

  1. Fix borrow checker violations
  2. Correctly annotate lifetimes
  3. Resolve concurrency deadlocks
  4. Write unsafe FFI code correctly
  5. Identify and prevent memory leaks
  6. Optimize data pipelines for performance

## Action Space

**Type:** `RustCoderAction`

The agent submits a single string containing the complete, fixed Rust source code.

| Field | Type | Description |
|-------|------|-------------|
| `code` | `string` | Full Rust source code to compile and test |
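In Python, building a submission might look like the following minimal sketch. The real `RustCoderAction` is a Pydantic model defined in `models.py`; the dataclass below is an illustrative stand-in, not the actual definition.

```python
from dataclasses import dataclass, asdict

# Illustrative stand-in for the Pydantic RustCoderAction model in models.py;
# the real model may carry extra fields or validation.
@dataclass
class RustCoderAction:
    code: str  # complete Rust source code to compile and test

action = RustCoderAction(code='fn main() { println!("hello"); }')
payload = {"action": asdict(action)}  # JSON shape expected by POST /step
```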

## Observation Space

**Type:** `RustCoderObservation`

The environment returns detailed feedback after each submission:

| Field | Type | Description |
|-------|------|-------------|
| `problem_description` | `string` | Task requirements and context |
| `header_section` | `string` | LeetCode-style scaffold (imports + signatures/types) |
| `compilation_success` | `bool` | Whether `rustc` compiled the submitted code |
| `compilation_output` | `string` | Raw compiler errors and warnings |
| `test_results` | `list[dict]` | Per-test pass/fail results with error details |
| `reward_breakdown` | `dict` | Weighted score breakdown across 5 dimensions |

## Reward Function

Total reward is a weighted sum of 5 dimensions, each normalized to [0, 1]:

| Dimension | Weight | Metric |
|-----------|--------|--------|
| Compilation | 40% | Binary success/failure of `rustc` |
| Correctness | 20% | Fraction of test assertions that pass |
| Coverage | 20% | Fraction of tests that successfully ran |
| Elegance | 10% | Code quality heuristics (avoids `.unwrap()`, long lines, `unsafe`) |
| Efficiency | 10% | Execution time vs. per-problem baseline |

The reward provides a partial signal at every step: compilation alone earns 0.40, and passing all tests earns up to 1.0.
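The weighted sum can be sketched as follows. This is an illustration of the scoring rule described above, not the actual implementation, which lives in `server/rust_coder_environment.py`.

```python
# Hedged sketch of the weighted reward; per-dimension scores are
# assumed to already be normalized to [0, 1].
WEIGHTS = {
    "compilation": 0.40,
    "correctness": 0.20,
    "coverage": 0.20,
    "elegance": 0.10,
    "efficiency": 0.10,
}

def total_reward(breakdown: dict) -> float:
    """Weighted sum over the five dimensions; missing dimensions score 0."""
    return sum(WEIGHTS[dim] * breakdown.get(dim, 0.0) for dim in WEIGHTS)

# Compilation succeeded but everything else failed: partial signal of 0.40.
print(total_reward({"compilation": 1.0}))  # → 0.4
```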


## Tasks

10 sequential problems with increasing difficulty:

| ID | Title | Difficulty | Skill Evaluated |
|----|-------|------------|-----------------|
| 1 | Broken CLI Argument Parser | Easy | Enums & pattern matching |
| 2 | Conflicting Borrows | Easy→Med | Borrow checker |
| 3 | Invalid Lifetime Annotations | Medium | Lifetime annotations |
| 4 | Business Logic Errors | Medium | Math & correctness |
| 5 | Linked List Management | Medium | Ownership & data structures |
| 6 | Multi-threaded Deadlocks | Hard | `Mutex` & concurrency |
| 7 | Async Borrowing Conflicts | Hard | Async/await lifetimes |
| 8 | Unsafe FFI Integration | Hard | `unsafe` & C interop |
| 9 | Inefficient Data Pipeline | Hard | Performance optimization |
| 10 | Memory Leak Prevention | Hard+ | `Weak` pointers & ownership |

## Environment Variables / Secrets

The environment reads the following variables. Set them as HF Space secrets (Settings → Variables and Secrets) when deploying to Hugging Face, or in a local `.env` file for development.

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_TOKEN` | Yes | (none) | Hugging Face API token for LLM calls |
| `API_BASE_URL` | No | `https://router.huggingface.co/v1` | Inference endpoint |
| `MODEL_NAME` | No | `Qwen/Qwen2.5-72B-Instruct` | Model to use for evaluation |

**Note:** The `.env` file is excluded from Docker images by `.dockerignore`. On HF Spaces, the platform injects secrets as OS environment variables: `load_dotenv()` silently does nothing when no file is present, and `os.getenv()` reads the platform-injected values. This is the correct behavior.
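The loading pattern described in the note can be sketched as below. The `get_config` helper is illustrative (the real code may read these variables inline); the variable names and defaults match the table above.

```python
import os

# Optional: load a local .env during development. On HF Spaces no .env
# file exists, so load_dotenv() is a silent no-op and the platform-injected
# environment variables are used instead.
try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
    load_dotenv()
except ImportError:
    pass

def get_config() -> dict:
    """Read settings with the documented defaults; HF_TOKEN has no default."""
    return {
        "hf_token": os.getenv("HF_TOKEN"),  # required; None if unset
        "api_base_url": os.getenv("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
    }
```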


## Setup & Usage

### Local Development

```bash
# 1. Clone and enter the repo
git clone https://github.com/your-username/rust_coder
cd rust_coder

# 2. Create .env with your credentials
cat > .env << EOF
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
EOF

# 3. Build the Docker image (uses root Dockerfile)
docker build -t rust_coder:latest .

# 4. Run the environment server
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# 5. Verify it's healthy
curl http://localhost:8000/health
# → {"status": "healthy"}

# 6. Run the inference benchmark
python inference.py
```

### Docker Commands Reference

```bash
# Build
docker build -t rust_coder:latest .

# Run with .env file
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# View logs
docker logs rust_env

# Stop
docker stop rust_env
```

### Environment API

```bash
# Reset (returns first problem)
curl -X POST http://localhost:8000/reset

# Step (submit Rust code)
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"code": "fn main() { println!(\"hello\"); }"}}'

# Health check
curl http://localhost:8000/health
```
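The same reset/step cycle can be driven from Python. This stdlib-only HTTP sketch is illustrative; the bundled `client.py` uses a WebSocket client instead, so treat the helper names here (`post`, `step_payload`) as assumptions rather than the project's API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust for your deployment

def step_payload(code: str) -> dict:
    """Build the JSON body expected by POST /step."""
    return {"action": {"code": code}}

def post(path: str, body: dict = None) -> dict:
    """POST a JSON body to the environment server and decode the reply."""
    data = json.dumps(body or {}).encode()
    req = urllib.request.Request(
        BASE_URL + path, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (requires a running server):
#   obs = post("/reset")
#   result = post("/step", step_payload('fn main() { println!("hello"); }'))
```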

### HF Spaces Deployment

```bash
# Install HF CLI
pip install huggingface_hub

# Login
huggingface-cli login

# Push to Space
openenv push --repo-id your-username/rust-coder
```

Then go to your Space settings and add secrets:

- `HF_TOKEN` → your Hugging Face API token
- `MODEL_NAME` → e.g. `Qwen/Qwen2.5-72B-Instruct`

## Baseline Scores

Baseline using `Qwen/Qwen2.5-72B-Instruct` via the Hugging Face router:

| Metric | Score |
|--------|-------|
| Average reward | 0.59 |
| Compilation % | ~85% |
| Correctness % | ~45% |

## Project Structure

```
rust_coder/
├── Dockerfile                     # Root Dockerfile (used by validator + HF Spaces)
├── server/Dockerfile              # Identical copy (used for -f flag builds)
├── openenv.yaml                   # OpenEnv spec metadata
├── pyproject.toml                 # Python package config
├── uv.lock                        # Locked dependencies
├── problems.json                  # 10 coding problems dataset
├── models.py                      # Pydantic action/observation types
├── client.py                      # WebSocket client for RustCoderEnv
├── inference.py                   # Baseline inference script (entry point)
├── __init__.py                    # Package exports
└── server/
    ├── app.py                     # FastAPI OpenEnv server entrypoint
    └── rust_coder_environment.py  # Core environment logic
```

## HF Space Runtime Model

- The Hugging Face Space serves the environment via `uvicorn server.app:app` (see `openenv.yaml` and `Dockerfile`).
- The built-in OpenEnv web UI may send an empty action on Step; this environment supports that by auto-calling the LLM when `action.code` is empty (unless disabled via `AUTO_LLM_ON_EMPTY_STEP=0`).
- `inference.py` is the required baseline runner used by the validator/judge. It connects to the running Space and drives `reset()`/`step()` in a loop, emitting strict `[START]`/`[STEP]`/`[END]` stdout lines.