	---
	title: Rust Coder OpenEnv
	emoji: 🦀
	colorFrom: red
	colorTo: yellow
	sdk: docker
	app_port: 8000
	base_path: /web
	pinned: false
	tags:
	- openenv
	- software-engineering
	- rust
	---

	# Rust Coder: Systems Engineering Environment

	Rust Coder is a high-fidelity OpenEnv environment designed to evaluate and train LLM agents on real-world Rust systems programming tasks. Unlike toy environments, Rust Coder simulates realistic engineering scenarios involving the borrow checker, concurrency, and memory safety.

	## Motivation

	Rust is uniquely challenging for AI agents due to its strict compile-time safety guarantees. This environment provides a 10-task progression that measures an agent's ability to:

	1. Fix borrow checker violations
	2. Correctly annotate lifetimes
	3. Resolve concurrency deadlocks
	4. Write unsafe FFI code correctly
	5. Identify and prevent memory leaks
	6. Optimize data pipelines for performance

	---

	## Action Space

	Type: `RustCoderAction`

	The agent submits a single string containing the complete, fixed Rust source code.

	| Field | Type | Description |
	|-------|------|-------------|
	| `code` | string | Full Rust source code to compile and test |

	## Observation Space

	Type: `RustCoderObservation`

	The environment returns detailed feedback after each submission:

	| Field | Type | Description |
	|-------|------|-------------|
	| `problem_description` | string | Task requirements and context |
	| `header_section` | string | LeetCode-style scaffold (imports + signatures/types) |
	| `compilation_success` | bool | Whether `rustc` compiled the submitted code |
	| `compilation_output` | string | Raw compiler errors and warnings |
	| `test_results` | list[dict] | Per-test pass/fail results with error details |
	| `reward_breakdown` | dict | Weighted score breakdown across 5 dimensions |
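
For orientation, the two types can be mirrored with plain dataclasses. This is only a sketch based on the tables above: the project's real types are Pydantic models in `models.py`, the `summarize` helper is hypothetical, and the `"passed"` key inside each `test_results` entry is an assumption, since the per-test dictionary layout is not specified here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RustCoderAction:
    """Stand-in for the action type: one field holding the full Rust source."""
    code: str


@dataclass
class RustCoderObservation:
    """Stand-in for the observation type; field names follow the table above."""
    problem_description: str = ""
    header_section: str = ""
    compilation_success: bool = False
    compilation_output: str = ""
    test_results: list[dict[str, Any]] = field(default_factory=list)
    reward_breakdown: dict[str, float] = field(default_factory=dict)


def summarize(obs: RustCoderObservation) -> str:
    """Build a one-line status that an agent loop could log after each step."""
    if not obs.compilation_success:
        return "compile failed"
    # Assumption: each test result carries a boolean under the key "passed".
    passed = sum(1 for result in obs.test_results if result.get("passed"))
    return f"compiled, {passed}/{len(obs.test_results)} tests passed"
```

An agent would typically check `compilation_success` first and feed `compilation_output` back into its next prompt before looking at individual test results.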

	---

	## Reward Function

	Total reward is a weighted sum of 5 dimensions, each normalized to [0, 1]:

	| Dimension | Weight | Metric |
	|-----------|--------|--------|
	| Compilation | 40% | Binary success/failure of `rustc` |
	| Correctness | 20% | Fraction of test assertions that pass |
	| Coverage | 20% | Fraction of tests that successfully ran |
	| Elegance | 10% | Code quality heuristics (avoids `.unwrap()`, long lines, `unsafe`) |
	| Efficiency | 10% | Execution time vs. per-problem baseline |

	The reward gives partial signal at every step: a submission that only compiles earns 0.40, and one that also passes every test can earn up to 1.0, depending on its elegance and efficiency scores.
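
The weighted sum can be reproduced in a few lines. The sketch below copies the weights from the table; the dictionary key names and the `total_reward` function are hypothetical and may not match the keys the environment actually returns in `reward_breakdown`.

```python
import math

# Weights copied from the reward table; the key names are hypothetical.
WEIGHTS = {
    "compilation": 0.40,
    "correctness": 0.20,
    "coverage": 0.20,
    "elegance": 0.10,
    "efficiency": 0.10,
}


def total_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each expected in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = scores.get(name, 0.0)  # A missing dimension counts as zero.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score {score} is outside [0, 1]")
        total += weight * score
    return total


# Compiles, but no test runs or passes and both heuristics score zero.
compile_only = total_reward({"compilation": 1.0})
assert math.isclose(compile_only, 0.40)

# Compiles, every test runs, and half of the assertions pass.
partial = total_reward({"compilation": 1.0, "coverage": 1.0, "correctness": 0.5})
assert math.isclose(partial, 0.70)
```

The second case works out as 0.40 + 0.20 × 1.0 + 0.20 × 0.5 = 0.70, which shows how an agent can gain reward from running tests even before all of them pass.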

	---

	## Tasks

	10 sequential problems with increasing difficulty:

	| ID | Title | Difficulty | Skill Evaluated |
	|----|-------|------------|-----------------|
	| 1 | Broken CLI Argument Parser | Easy | Enums & pattern matching |
	| 2 | Conflicting Borrows | Easy→Med | Borrow checker |
	| 3 | Invalid Lifetime Annotations | Medium | Lifetime annotations |
	| 4 | Business Logic Errors | Medium | Math & correctness |
	| 5 | Linked List Management | Medium | Ownership & data structures |
	| 6 | Multi-threaded Deadlocks | Hard | Mutex & concurrency |
	| 7 | Async Borrowing Conflicts | Hard | Async/await lifetimes |
	| 8 | Unsafe FFI Integration | Hard | `unsafe` & C interop |
	| 9 | Inefficient Data Pipeline | Hard | Performance optimization |
	| 10 | Memory Leak Prevention | Hard+ | Weak pointers & ownership |

	---

	## Environment Variables / Secrets

	The environment reads the following variables. Set them as HF Space secrets (Settings → Variables and Secrets) when deploying to Hugging Face, or in a local `.env` file for development.

	| Variable | Required | Default | Description |
	|----------|----------|---------|-------------|
	| `HF_TOKEN` | Yes | none | Hugging Face API token for LLM calls |
	| `API_BASE_URL` | No | `https://router.huggingface.co/v1` | Inference endpoint |
	| `MODEL_NAME` | No | `Qwen/Qwen2.5-72B-Instruct` | Model to use for evaluation |

	> Note: The `.env` file is excluded from Docker images by `.dockerignore`. On HF Spaces, the platform injects secrets as OS environment variables: `load_dotenv()` silently does nothing when no `.env` file is present, and `os.getenv()` reads the platform-injected values. This is the expected behavior.
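
A minimal sketch of resolving these variables with the defaults from the table follows. The `load_settings` function is hypothetical and is not taken from the project's code; it uses only the standard library, so a local `.env` file would still need to be loaded first (for example with `load_dotenv()`), and treating an empty value as unset is an assumption.

```python
import os
from collections.abc import Mapping

# Defaults copied from the variables table.
DEFAULTS = {
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "Qwen/Qwen2.5-72B-Instruct",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve the three variables, failing early if HF_TOKEN is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is required but not set")
    settings = {"HF_TOKEN": token}
    for name, default in DEFAULTS.items():
        # Assumption: an empty value is treated the same as an unset one.
        settings[name] = env.get(name) or default
    return settings
```

Passing the mapping as a parameter keeps the function easy to exercise with a plain dictionary; calling `load_settings()` with no arguments reads the real process environment.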

	---

	## Setup & Usage

	### Local Development

	```bash
	# 1. Clone and enter the repo
	git clone https://github.com/your-username/rust_coder
	cd rust_coder

	# 2. Create .env with your credentials
	cat > .env << EOF
	HF_TOKEN=hf_your_token_here
	API_BASE_URL=https://router.huggingface.co/v1
	MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
	EOF

	# 3. Build the Docker image (uses root Dockerfile)
	docker build -t rust_coder:latest .

	# 4. Run the environment server
	docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

	# 5. Verify it's healthy
	curl http://localhost:8000/health
	# → {"status": "healthy"}

	# 6. Run the inference benchmark
	python inference.py
	```
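
Because the container starts in the background, a health check issued right after `docker run -d` may fail while the server is still booting. The sketch below polls the `/health` endpoint until it reports the healthy status shown above. The function names, retry count, and delay are illustrative assumptions, not part of the project.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:8000/health") -> str:
    """Return the body of the health endpoint as text."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def wait_until_healthy(
    fetch: Callable[[], str] = fetch_health,
    attempts: int = 30,
    delay_s: float = 1.0,
) -> bool:
    """Poll until the body parses to {"status": "healthy"} or attempts run out."""
    for attempt in range(attempts):
        try:
            if json.loads(fetch()).get("status") == "healthy":
                return True
        except (OSError, ValueError, AttributeError):
            # Connection refused, a non-JSON body, or JSON that is not an
            # object: treat all of these as "server not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling `wait_until_healthy()` before `python inference.py` avoids a spurious connection error on slow machines; the `fetch` parameter exists so the loop can be exercised without a running server.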

	### Docker Commands Reference

	```bash
	# Build
	docker build -t rust_coder:latest .

	# Run with .env file
	docker run -d -p 8000:8000 --env-file .env --name rust_env rust_coder:latest

	# View logs
	docker logs rust_env

	# Stop
	docker stop rust_env
	```

	### Environment API

	```bash
	# Reset (returns first problem)
	curl -X POST http://localhost:8000/reset

	# Step (submit Rust code)
	curl -X POST http://localhost:8000/step \
	-H "Content-Type: application/json" \
	-d '{"action": {"code": "fn main() { println!(\"hello\"); }"}}'

	# Health check
	curl http://localhost:8000/health
	```
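
The same step call can be made from Python with the standard library. This is a sketch that mirrors the JSON shape of the curl command above; the helper names are hypothetical, and the project's own `client.py` (described as a WebSocket client) may work differently.

```python
import json
import urllib.request


def build_step_request(
    code: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /step request with the same JSON shape as the curl call."""
    # json.dumps handles the quote escaping that is done by hand in the shell.
    body = json.dumps({"action": {"code": code}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_step_request('fn main() { println!("hello"); }')
print(request.get_method(), request.full_url)
```

With the container running, `send(request)` would return the decoded response body; its exact layout is not documented here, so inspect it before relying on specific keys.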

	### HF Spaces Deployment

	```bash
	# Install HF CLI
	pip install huggingface_hub

	# Login
	huggingface-cli login

	# Push to Space
	openenv push --repo-id your-username/rust-coder
	```

	Then go to your Space settings and add secrets:
	- `HF_TOKEN` → your Hugging Face API token
	- `MODEL_NAME` → e.g. `Qwen/Qwen2.5-72B-Instruct`

	---

	## Baseline Scores

	Baseline using Qwen/Qwen2.5-72B-Instruct via Hugging Face router:

	| Metric | Score |
	|--------|-------|
	| Average reward | 0.59 |
	| Compilation % | ~85% |
	| Correctness % | ~45% |

	---

	## Project Structure

	```
	rust_coder/
	├── Dockerfile                      # Root Dockerfile (used by validator + HF Spaces)
	├── server/Dockerfile               # Identical copy (used for -f flag builds)
	├── openenv.yaml                    # OpenEnv spec metadata
	├── pyproject.toml                  # Python package config
	├── uv.lock                         # Locked dependencies
	├── problems.json                   # 10 coding problems dataset
	├── models.py                       # Pydantic action/observation types
	├── client.py                       # WebSocket client for RustCoderEnv
	├── inference.py                    # Baseline inference script (entry point)
	├── __init__.py                     # Package exports
	└── server/
	    ├── app.py                      # FastAPI OpenEnv server entrypoint
	    └── rust_coder_environment.py   # Core environment logic
	```

	## HF Space Runtime Model

	- The Hugging Face Space serves the environment via `uvicorn server.app:app` (see `openenv.yaml` and `Dockerfile`).
	- The built-in OpenEnv web UI may send an empty action on Step; this environment supports that by auto-calling the LLM when `action.code` is empty (unless disabled via `AUTO_LLM_ON_EMPTY_STEP=0`).
	- `inference.py` is the required baseline runner used by the validator/judge. It connects to the running Space and drives `reset()`/`step()` in a loop, emitting strict `[START]`/`[STEP]`/`[END]` stdout lines.
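
The empty-step fallback described in the list above amounts to a small decision rule. The sketch below shows one way to express it; the function name is hypothetical, and two details are assumptions: whitespace-only code is treated as empty, and any value other than `0` (including an unset variable) leaves the fallback enabled.

```python
import os
from collections.abc import Mapping
from typing import Optional


def should_auto_call_llm(
    code: Optional[str], env: Mapping[str, str] = os.environ
) -> bool:
    """True when the step carries no code and the fallback is not disabled."""
    if code and code.strip():
        return False  # The agent supplied code; evaluate it as submitted.
    # Only the value "0" is documented as disabling the fallback.
    return env.get("AUTO_LLM_ON_EMPTY_STEP", "1") != "0"
```

Setting `AUTO_LLM_ON_EMPTY_STEP=0` is useful when benchmarking an external agent, since an accidentally empty submission then surfaces as a failure instead of being silently replaced by an LLM-generated answer.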