---
title: Rust Coder OpenEnv
emoji: 🦀
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
  - openenv
  - software-engineering
  - rust
---
# Rust Coder: Systems Engineering Environment

Rust Coder is a high-fidelity OpenEnv environment designed to evaluate and train LLM agents on real-world Rust systems programming tasks. Unlike toy environments, Rust Coder simulates realistic engineering scenarios involving the borrow checker, concurrency, and memory safety.

## Motivation
Rust is uniquely challenging for AI agents due to its strict compile-time safety guarantees. This environment provides a 10-task progression that measures an agent's ability to:
- Fix borrow checker violations
- Correctly annotate lifetimes
- Resolve concurrency deadlocks
- Write unsafe FFI code correctly
- Identify and prevent memory leaks
- Optimize data pipelines for performance
## Action Space

Type: `RustCoderAction`

The agent submits a single string containing the complete, fixed Rust source code.

| Field | Type | Description |
|---|---|---|
| `code` | string | Full Rust source code to compile and test |
## Observation Space

Type: `RustCoderObservation`

The environment returns detailed feedback after each submission:

| Field | Type | Description |
|---|---|---|
| `problem_description` | string | Task requirements and context |
| `header_section` | string | LeetCode-style scaffold (imports + signatures/types) |
| `compilation_success` | bool | Whether `rustc` compiled the submitted code |
| `compilation_output` | string | Raw compiler errors and warnings |
| `test_results` | list[dict] | Per-test pass/fail results with error details |
| `reward_breakdown` | dict | Weighted score breakdown across 5 dimensions |
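On a failed attempt, an agent typically folds this observation back into its next prompt. A minimal sketch of that pattern, where the `feedback_prompt` helper is a hypothetical name and the dict shape simply follows the field names in the table above:

```python
def feedback_prompt(obs: dict) -> str:
    """Build a retry prompt from a RustCoderObservation-shaped dict.

    Hypothetical helper; the field names match the observation table.
    """
    if obs["compilation_success"]:
        failed = [t for t in obs["test_results"] if not t.get("passed")]
        detail = f"{len(failed)} test(s) failed:\n" + "\n".join(
            str(t.get("error", "")) for t in failed
        )
    else:
        detail = "Compilation failed:\n" + obs["compilation_output"]
    return (
        obs["problem_description"]
        + "\n\n"
        + obs["header_section"]
        + "\n\nPrevious attempt feedback:\n"
        + detail
    )

obs = {
    "problem_description": "Fix the borrow error.",
    "header_section": "fn main() {}",
    "compilation_success": False,
    "compilation_output": "error[E0502]: cannot borrow `v` as mutable",
    "test_results": [],
}
print(feedback_prompt(obs))
```

Feeding `compilation_output` back verbatim works well for Rust, since `rustc` error messages are unusually prescriptive.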
## Reward Function
Total reward is a weighted sum of 5 dimensions, each normalized to [0, 1]:
| Dimension | Weight | Metric |
|---|---|---|
| Compilation | 40% | Binary success/failure of rustc |
| Correctness | 20% | Fraction of test assertions that pass |
| Coverage | 20% | Fraction of tests that successfully ran |
| Elegance | 10% | Code quality heuristics (penalizes `.unwrap()`, long lines, `unsafe`) |
| Efficiency | 10% | Execution time vs. per-problem baseline |
Reward provides a partial signal at every step: compilation alone earns 0.40, and passing all tests earns up to 1.0.
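The scoring above can be sketched as a plain weighted sum. This is an illustrative reimplementation of the weighting table, not the environment's actual code:

```python
# Weights from the reward table; each dimension score is clamped to [0, 1].
WEIGHTS = {
    "compilation": 0.40,
    "correctness": 0.20,
    "coverage": 0.20,
    "elegance": 0.10,
    "efficiency": 0.10,
}

def total_reward(dims: dict) -> float:
    """Weighted sum of per-dimension scores, each normalized to [0, 1]."""
    return sum(WEIGHTS[k] * max(0.0, min(1.0, v)) for k, v in dims.items())

# A submission that compiles but fails every test still earns the
# compilation dimension's 0.40.
print(total_reward({"compilation": 1.0, "correctness": 0.0,
                    "coverage": 0.0, "elegance": 0.0, "efficiency": 0.0}))
```

The dense partial credit matters for training: an agent gets gradient signal for merely satisfying the borrow checker, long before any test passes.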
## Tasks

10 sequential problems with increasing difficulty:
| ID | Title | Difficulty | Skill Evaluated |
|---|---|---|---|
| 1 | Broken CLI Argument Parser | Easy | Enums & pattern matching |
| 2 | Conflicting Borrows | Easy-Med | Borrow checker |
| 3 | Invalid Lifetime Annotations | Medium | Lifetime annotations |
| 4 | Business Logic Errors | Medium | Math & correctness |
| 5 | Linked List Management | Medium | Ownership & data structures |
| 6 | Multi-threaded Deadlocks | Hard | Mutex & concurrency |
| 7 | Async Borrowing Conflicts | Hard | Async/await lifetimes |
| 8 | Unsafe FFI Integration | Hard | unsafe & C interop |
| 9 | Inefficient Data Pipeline | Hard | Performance optimization |
| 10 | Memory Leak Prevention | Hard+ | Weak pointers & ownership |
## Environment Variables / Secrets

The environment reads the following variables. Set them as HF Space secrets (Settings → Variables and Secrets) when deploying to Hugging Face, or in a local `.env` file for development.
| Variable | Required | Default | Description |
|---|---|---|---|
| `HF_TOKEN` | Yes | (none) | Hugging Face API token for LLM calls |
| `API_BASE_URL` | No | `https://router.huggingface.co/v1` | Inference endpoint |
| `MODEL_NAME` | No | `Qwen/Qwen2.5-72B-Instruct` | Model to use for evaluation |
Note: The `.env` file is excluded from Docker images by `.dockerignore`. On HF Spaces, secrets are injected as OS environment variables by the platform: `load_dotenv()` silently does nothing if no file is present, and `os.getenv()` reads from the platform-injected vars. This is the correct behavior.
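That dual-mode pattern can be sketched as follows; the `read_config` helper is illustrative, but the variable names and defaults match the table above:

```python
import os

try:
    from dotenv import load_dotenv  # pip install python-dotenv
    load_dotenv()  # silently does nothing if no .env file is present
except ImportError:
    pass  # fine on HF Spaces: secrets arrive as real env vars

def read_config(env=os.environ) -> dict:
    """Read configuration from the environment (hypothetical helper)."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set (HF Space secret or .env)")
    return {
        "hf_token": token,
        "api_base_url": env.get("API_BASE_URL",
                                "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
    }
```

Failing fast on a missing `HF_TOKEN` gives a clearer error than letting the first LLM call fail with a 401.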
## Setup & Usage

### Local Development

```bash
# 1. Clone and enter the repo
git clone https://github.com/your-username/rust_coder
cd rust_coder

# 2. Create .env with your credentials
cat > .env << EOF
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
EOF

# 3. Build the Docker image (uses root Dockerfile)
docker build -t rust_coder:latest .

# 4. Run the environment server
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# 5. Verify it's healthy
curl http://localhost:8000/health
# → {"status": "healthy"}

# 6. Run the inference benchmark
python inference.py
```
### Docker Commands Reference

```bash
# Build
docker build -t rust_coder:latest .

# Run with .env file
docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

# View logs
docker logs rust_env

# Stop
docker stop rust_env
```
### Environment API

```bash
# Reset (returns first problem)
curl -X POST http://localhost:8000/reset

# Step (submit Rust code)
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"code": "fn main() { println!(\"hello\"); }"}}'

# Health check
curl http://localhost:8000/health
```
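The same calls can be made from Python with only the standard library. A sketch against a locally running server; the response schema is whatever the server returns for each endpoint:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust for your deployment

def step_payload(code: str) -> bytes:
    """Serialize a RustCoderAction into the /step request body."""
    return json.dumps({"action": {"code": code}}).encode()

def post(path: str, body: bytes = b"{}") -> dict:
    """POST JSON to the environment server and decode the response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage, with the server from the Local Development section running:
#   obs = post("/reset")
#   result = post("/step", step_payload('fn main() { println!("hello"); }'))
```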
### HF Spaces Deployment

```bash
# Install HF CLI
pip install huggingface_hub

# Login
huggingface-cli login

# Push to Space
openenv push --repo-id your-username/rust-coder
```
Then go to your Space settings and add secrets:

- `HF_TOKEN` → your Hugging Face API token
- `MODEL_NAME` → e.g. `Qwen/Qwen2.5-72B-Instruct`
## Baseline Scores

Baseline using Qwen/Qwen2.5-72B-Instruct via the Hugging Face router:
| Metric | Score |
|---|---|
| Average reward | 0.59 |
| Compilation % | ~85% |
| Correctness % | ~45% |
## Project Structure

```
rust_coder/
├── Dockerfile                     # Root Dockerfile (used by validator + HF Spaces)
├── server/Dockerfile              # Identical copy (used for -f flag builds)
├── openenv.yaml                   # OpenEnv spec metadata
├── pyproject.toml                 # Python package config
├── uv.lock                        # Locked dependencies
├── problems.json                  # 10 coding problems dataset
├── models.py                      # Pydantic action/observation types
├── client.py                      # WebSocket client for RustCoderEnv
├── inference.py                   # Baseline inference script (entry point)
├── __init__.py                    # Package exports
└── server/
    ├── app.py                     # FastAPI OpenEnv server entrypoint
    └── rust_coder_environment.py  # Core environment logic
```
## HF Space runtime model

- The Hugging Face Space serves the environment via `uvicorn server.app:app` (see `openenv.yaml` and `Dockerfile`).
- The built-in OpenEnv web UI may send an empty action on Step; this environment supports that by auto-calling the LLM when `action.code` is empty (unless disabled via `AUTO_LLM_ON_EMPTY_STEP=0`).
- `inference.py` is the required baseline runner used by the validator/judge. It connects to the running Space and drives `reset()`/`step()` in a loop, emitting strict `[START]`/`[STEP]`/`[END]` stdout lines.
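The shape of that loop can be sketched as follows. The `env`/`solve` interfaces and the exact marker fields are illustrative assumptions here; the authoritative protocol is whatever `inference.py` emits:

```python
def run_episode(env, solve, max_steps: int = 10) -> float:
    """Illustrative reset()/step() loop in the spirit of inference.py.

    Assumed interfaces: env.reset() and env.step(code) return
    observation dicts, and solve(obs) returns a Rust source string.
    The [START]/[STEP]/[END] lines mirror the stdout protocol the
    validator expects, but their exact fields are defined elsewhere.
    """
    obs = env.reset()
    print("[START]")
    reward = 0.0
    for i in range(max_steps):
        code = solve(obs)
        obs = env.step(code)
        reward = sum(obs.get("reward_breakdown", {}).values())
        print(f"[STEP] {i} reward={reward:.2f}")
        if obs.get("done"):
            break
    print(f"[END] reward={reward:.2f}")
    return reward
```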