	---
	title: ACRE - Autonomous Code Refactoring Environment
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_file: server.py
	app_port: 7860
	pinned: false
	license: mit
	tags:
	- openenv
	---

	# 🚀 ACRE — Autonomous Code Refactoring Environment

> OpenEnv environment in which AI agents refactor and optimize real-world code and are scored by deterministic graders.

	![Status](https://img.shields.io/badge/Status-Running-success)
	![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)
	![Docker](https://img.shields.io/badge/Docker-Ready-green)

	---

	## 🔥 Overview

	ACRE is an OpenEnv-compliant environment designed to simulate real-world software engineering workflows such as code cleanup, optimization, and refactoring using AI agents.

	It enables agents to iteratively improve code through structured actions while receiving dense, step-wise reward feedback.

	## Environment Overview and Motivation

	ACRE models a realistic developer workflow where an agent incrementally improves Python code quality under a fixed action budget.
It is designed to meet the OpenEnv Round 1 requirements: typed APIs, deterministic grading, tasks at multiple difficulty levels, and reproducible inference behavior.

	---

	## 💡 Why This Matters

	Modern software systems require automated code optimization and intelligent tooling.

	ACRE enables:
	- 🤖 AI coding assistants
	- 🔍 Automated code review systems
	- ⚡ Reinforcement learning-based optimization agents
	- 🧠 Learning real developer workflows

	---

	## 🔄 How It Works

	Code → Action → Refactor → Reward → Repeat

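1. Load the messy source code for the selected task
2. Apply a refactoring action chosen by the agent
3. Evaluate the result with the task's deterministic grader
4. Compute the step reward
5. Repeat until the episode ends (`done` is true, e.g. when the action budget is exhausted)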
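The loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ACRE's implementation: `ToyRefactorEnv`, its `max_steps` budget, and its reward values are hypothetical stand-ins that only mimic the `reset()` / `step(action)` shape documented under OpenEnv Interface.

```python
import random
from typing import Any


class ToyRefactorEnv:
    """Hypothetical stand-in with the same reset()/step() shape (illustration only)."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps  # hypothetical fixed action budget
        self.steps = 0
        self.complexity = 10

    def reset(self) -> dict[str, Any]:
        # 1. "Load messy code": here we only reset a synthetic complexity score.
        self.steps = 0
        self.complexity = 10
        return {"complexity_score": self.complexity}

    def step(self, action: int) -> tuple[dict[str, Any], float, bool, dict[str, Any]]:
        # 2. "Apply transformation": this toy version only lowers complexity.
        self.steps += 1
        before = self.complexity
        self.complexity = max(0, self.complexity - (action % 3))
        # 3-4. "Evaluate and compute reward" with made-up values.
        reward = 1.0 if self.complexity < before else -0.5
        done = self.complexity == 0 or self.steps >= self.max_steps
        return {"complexity_score": self.complexity}, reward, done, {"steps": self.steps}


def run_episode(env: ToyRefactorEnv, seed: int = 0) -> float:
    """5. Repeat until the episode ends; returns the summed reward."""
    rng = random.Random(seed)
    env.reset()
    total, done = 0.0, False
    while not done:
        action = rng.randrange(5)  # random policy over a Discrete(5) action space
        _, reward, done, _ = env.step(action)
        total += reward
    return total
```

Replacing the random policy with a model-driven one is where an agent would plug in; in ACRE itself, observations and rewards come from the environment's grader rather than this toy logic.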

	---

	## 🧠 Key Features

	- ✅ Autonomous code refactoring
	- ⚡ Step-wise reward feedback
	- 🧪 OpenEnv compliant interface
	- 📊 Deterministic grading system
	- 🔁 Reproducible inference pipeline
	- 🐳 Fully containerized (Docker + Hugging Face Spaces)

	---

	## 📂 Tasks

| Task ID | Difficulty | Objective |
|---------|------------|-----------|
| `rename_variables` | Easy | Replace generic variable names |
| `remove_dead_code` | Medium | Remove unreachable logic |
| `full_refactor` | Hard | Combine multiple optimizations |

	Each task uses AST-based transformations and deterministic grading.
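To illustrate what an AST-based transformation looks like, here is a minimal sketch of variable renaming with Python's standard `ast` module. `RenameNames` and `rename_identifiers` are hypothetical names, and ACRE's actual transformations may work differently.

```python
import ast


class RenameNames(ast.NodeTransformer):
    """Rename variables and parameters using an {old: new} mapping.

    Naive on purpose: it ignores scoping, so a name is renamed everywhere.
    """

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Covers both reads (Load) and assignments (Store) of a variable.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Function parameters are ast.arg nodes rather than ast.Name nodes.
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)  # still visit any type annotation
        return node


def rename_identifiers(source: str, mapping: dict[str, str]) -> str:
    tree = RenameNames(mapping).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # Python 3.9+


messy = "def f(x):\n    tmp = x * 2\n    return tmp\n"
print(rename_identifiers(messy, {"x": "price", "tmp": "doubled_price"}))
```

Because the rewrite happens on the syntax tree rather than on raw text, a name like `x` inside a string literal or as part of `max` is left untouched.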

	## Task Descriptions with Expected Difficulty Levels

- Easy (`rename_variables`): rename generic names such as `x`, `tmp`, and `i` to descriptive identifiers.
	- Medium (`remove_dead_code`): remove unreachable branches and unused assignments while preserving behavior.
	- Hard (`full_refactor`): combine renaming, dead-code elimination, loop simplification, condition cleanup, and helper inlining.
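For the medium task, one simple, purely syntactic form of dead-code elimination drops statements that follow `return`, `raise`, `break`, or `continue` in the same block and removes `if False:` branches. The sketch below uses the standard `ast` module; `DropUnreachable` and `remove_dead_code` are hypothetical, and the sketch does not remove unused assignments, which would need data-flow analysis.

```python
from __future__ import annotations

import ast

# Statements after one of these, in the same block, can never run.
TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)


class DropUnreachable(ast.NodeTransformer):
    """Remove syntactically unreachable statements (simple cases only)."""

    def generic_visit(self, node: ast.AST) -> ast.AST:
        super().generic_visit(node)  # rewrite nested blocks first
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if not isinstance(stmts, list):
                continue  # e.g. Lambda.body and IfExp.orelse are expressions
            stmts = self._truncate(stmts)
            if field == "body" and not stmts and not isinstance(node, ast.Module):
                stmts = [ast.Pass()]  # keep the block syntactically valid
            setattr(node, field, stmts)
        return node

    @staticmethod
    def _truncate(stmts: list[ast.stmt]) -> list[ast.stmt]:
        # Keep statements up to and including the first terminator.
        for index, stmt in enumerate(stmts):
            if isinstance(stmt, TERMINATORS):
                return stmts[: index + 1]
        return stmts

    def visit_If(self, node: ast.If) -> ast.AST | list[ast.stmt]:
        self.generic_visit(node)
        # `if False:` can never take its body, so splice in the else branch.
        if isinstance(node.test, ast.Constant) and node.test.value is False:
            return node.orelse
        return node


def remove_dead_code(source: str) -> str:
    tree = DropUnreachable().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # Python 3.9+


messy = (
    "def f(n):\n"
    "    if False:\n"
    "        print('never')\n"
    "    return n + 1\n"
    "    print('unreachable')\n"
)
print(remove_dead_code(messy))
```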

	---

	## 🎯 Reward System

	Rewards are computed at every step:

	- ✅ Valid executable code → positive reward
	- 📉 Reduced complexity → reward
	- ⚡ Improved performance → reward
	- ❌ Errors or invalid code → penalty
	- 🔁 No progress → penalty
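The signals above could be combined into a single step reward in several ways. A minimal sketch, assuming the metrics match the observation fields listed under Definitions of Action and Observation Spaces; `StepMetrics`, `step_reward`, and every weight below are hypothetical and are not ACRE's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepMetrics:
    """Hypothetical snapshot mirroring the observation fields."""
    code_length: int
    complexity_score: float
    runtime_s: float
    error_flag: bool


def step_reward(before: StepMetrics, after: StepMetrics) -> float:
    # All weights are hypothetical, chosen only for illustration.
    if after.error_flag:
        return -5.0  # errors or invalid code -> penalty
    reward = 1.0  # valid executable code -> positive reward
    # Reduced complexity -> reward proportional to the reduction.
    reward += 2.0 * max(0.0, before.complexity_score - after.complexity_score)
    if after.runtime_s < before.runtime_s:
        reward += 1.0  # improved performance -> reward
    if after == before:
        reward -= 2.0  # no progress -> penalty
    return reward
```

Checking for errors first means a broken edit can never be rewarded for incidental complexity or runtime changes.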

Raw rewards are normalized to the [0, 1] range:

`(raw_reward + 32) / 52 → [0, 1]`

A raw reward of -32 therefore maps to 0 and +20 maps to 1.
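The normalization can be written as a small helper. The raw range [-32, 20] follows directly from mapping `(raw_reward + 32) / 52` onto [0, 1]; clamping out-of-range values is an assumption of this sketch, and `normalize_reward` is a hypothetical name.

```python
# Raw bounds implied by (raw_reward + 32) / 52 mapping onto [0, 1].
RAW_MIN, RAW_MAX = -32.0, 20.0


def normalize_reward(raw_reward: float) -> float:
    """Map a raw reward onto [0, 1] via (raw_reward + 32) / 52."""
    score = (raw_reward - RAW_MIN) / (RAW_MAX - RAW_MIN)
    return min(1.0, max(0.0, score))  # assumption: clamp out-of-range rewards


print(normalize_reward(-32))           # 0.0
print(normalize_reward(20))            # 1.0
print(round(normalize_reward(0), 3))   # 0.615
```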

	---

	## 📊 Example Execution

	```text
	[START] task=rename_variables
	[STEP] action=0
	[END] task=rename_variables score=1.00

	[START] task=remove_dead_code
	[STEP] action=1
	[END] task=remove_dead_code score=0.25

	[START] task=full_refactor
	[STEP] action=3
	[END] task=full_refactor score=0.71

	Final Score: 0.65
	```
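The `Final Score` in this log is consistent with the mean of the three per-task scores ((1.00 + 0.25 + 0.71) / 3 ≈ 0.65). A minimal sketch that recovers per-task scores from the `[END]` lines and averages them; `parse_scores` is hypothetical, and the regular expression assumes exactly the `[END] task=<id> score=<float>` format shown above.

```python
import re
from statistics import mean

END_LINE = re.compile(r"^\[END\] task=(?P<task>\S+) score=(?P<score>[0-9.]+)$")


def parse_scores(log_text: str) -> dict[str, float]:
    """Collect task -> score from lines like '[END] task=foo score=0.25'."""
    scores: dict[str, float] = {}
    for line in log_text.splitlines():
        match = END_LINE.match(line.strip())
        if match:
            scores[match["task"]] = float(match["score"])
    return scores


log = """\
[START] task=rename_variables
[STEP] action=0
[END] task=rename_variables score=1.00
[START] task=remove_dead_code
[STEP] action=1
[END] task=remove_dead_code score=0.25
[START] task=full_refactor
[STEP] action=3
[END] task=full_refactor score=0.71
"""

scores = parse_scores(log)
print(scores)
print(f"Final Score: {mean(scores.values()):.2f}")
```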

	---

	## 🏗️ Architecture

	- `server/app.py` → FastAPI entry point used by OpenEnv + Docker
	- `server.py` → legacy local runner / UI helper
	- `openenv_interface.py` → OpenEnv wrapper
	- `acre/env/` → Core environment logic
	- `acre/tasks/` → Task definitions
	- `acre/utils/` → Metrics and helpers
	- `inference.py` → Evaluation pipeline

	---

	## ⚙️ OpenEnv Interface

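```python
# `env` is an ACRE environment instance (OpenEnv wrapper: openenv_interface.py)
observation = env.reset()                           # start a new episode
observation, reward, done, info = env.step(action)  # apply one refactoring action
state = env.state()                                 # inspect the current environment state
```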

	Uses Pydantic models:

	- `ObservationModel`
	- `ActionModel`
	- `RewardModel`

	## Definitions of Action and Observation Spaces

	- Observation space: Box(4) with fields `code_length`, `complexity_score`, `runtime_s`, `error_flag`.
	- Action space: Discrete(5) with actions `rename_variable`, `remove_dead_code`, `simplify_loop`, `optimize_condition`, `inline_function`.
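A dependency-free sketch of these two spaces. It uses `enum` and `dataclasses` in place of the Pydantic models listed above, and it assumes the action indices follow the order in which the actions are listed (0 = `rename_variable`, ..., 4 = `inline_function`); `RefactorAction`, `Observation`, and `to_vector` are hypothetical names.

```python
from dataclasses import astuple, dataclass
from enum import IntEnum


class RefactorAction(IntEnum):
    """Discrete(5) action space; index order assumed from the list above."""
    RENAME_VARIABLE = 0
    REMOVE_DEAD_CODE = 1
    SIMPLIFY_LOOP = 2
    OPTIMIZE_CONDITION = 3
    INLINE_FUNCTION = 4


@dataclass(frozen=True)
class Observation:
    """Box(4) observation space: four numeric features of the current code."""
    code_length: float
    complexity_score: float
    runtime_s: float
    error_flag: float  # assumed encoding: 0.0 = no error, 1.0 = error

    def to_vector(self) -> list[float]:
        # Flatten to the 4-element vector an RL agent would consume.
        return [float(value) for value in astuple(self)]


obs = Observation(code_length=120, complexity_score=7, runtime_s=0.02, error_flag=0)
print(len(RefactorAction), obs.to_vector())
```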

	---

	## 🌐 HTTP API

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Health check |
| GET | `/health` | Compatibility check |
| POST | `/reset` | Reset environment |
| POST | `/step` | Execute action |
| GET | `/state` | Get state |
| GET | `/tasks` | List tasks |
| POST | `/tasks/{task_id}/grade` | Grade code |
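A minimal standard-library client for these endpoints. The JSON bodies for `/reset` and `/step` (`task_id`, `action`) are assumptions, since the request schema is not documented here; check the `ActionModel` definition before relying on them. `build_request` and `call` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:7860"  # default port used throughout this README


def build_request(
    method: str, path: str, payload: Optional[dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON request for an ACRE endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def call(method: str, path: str, payload: Optional[dict[str, Any]] = None) -> Any:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(method, path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `call("GET", "/tasks")` lists the tasks, and `call("POST", "/step", {"action": 0})` would submit an action, assuming that body shape.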

	---

## 🚀 Run Locally: Setup and Usage Instructions

	```bash
	pip install -r requirements.txt
	uvicorn server.app:app --host 0.0.0.0 --port 7860
	```

	---

	## 🐳 Docker / Hugging Face Spaces

	```bash
	docker build -t acre .
	docker run -p 7860:7860 \
	-e API_BASE_URL=https://api.openai.com/v1 \
	-e MODEL_NAME=gpt-4o-mini \
	-e API_KEY=your_key \
	-e ENV_URL=http://localhost:7860 \
	acre
	```

	---

	## 🧪 Inference

	Set environment variables:

	```bash
	export API_BASE_URL=https://api.openai.com/v1
	export MODEL_NAME=gpt-4o-mini
	export API_KEY=your_key
	export ENV_URL=http://localhost:7860
	```
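A sketch of how an inference script might read these variables and fail fast when one is missing. `load_config` and `InferenceConfig` are hypothetical; `inference.py` may read its configuration differently.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Same order as the fields of InferenceConfig below.
REQUIRED = ("API_BASE_URL", "MODEL_NAME", "API_KEY", "ENV_URL")


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    api_key: str
    env_url: str


def load_config(environ: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read the four variables above; raise if any is unset or empty."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return InferenceConfig(*(environ[name] for name in REQUIRED))
```

Passing a plain dict instead of `os.environ` makes the function easy to test without touching the real environment.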

	Run:

	```bash
	python inference.py
	```

	Expected output:

	```text
	Easy: 1.00
	Medium: 0.25
	Hard: 0.71
	Final: 0.65
	```

	---

	## 📌 OpenEnv Compliance

	- ✔ `step()` implemented
	- ✔ `reset()` implemented
	- ✔ `state()` implemented
	- ✔ reward shaping
	- ✔ deterministic grading
	- ✔ structured logs

	---

	## 🧪 Validation

	```bash
	python validate.py --url http://localhost:7860
	```

Or, with the OpenEnv CLI:

	```bash
	openenv validate
	```

	---

	## 🌐 Live Demo

👉 [Running on Hugging Face Spaces](https://huggingface.co/spaces/PRANAV05092003/autonomous-code-refactoring-env)

	---

## 📊 Baseline Performance Scores

| Task | Score |
|---|---|
| `rename_variables` | 1.0000 |
| `remove_dead_code` | 0.2500 |
| `full_refactor` | 0.7143 |
| Average | 0.6548 |

	---

	## 🏆 Use Cases

	- AI-powered code optimization
	- Automated refactoring tools
	- Reinforcement learning environments
	- Developer productivity systems

	---

	## 📜 License

	MIT License