| --- |
| title: ACRE - Autonomous Code Refactoring Environment |
| colorFrom: blue |
| colorTo: green |
| sdk: docker |
| app_file: server.py |
| app_port: 7860 |
| pinned: false |
| license: mit |
| tags: |
| - openenv |
| --- |
| |
| # ACRE - Autonomous Code Refactoring Environment |
|
|
| > OpenEnv-powered AI system for real-world code optimization, refactoring, and evaluation. |
|
|
|
|
| --- |
|
|
| ## Overview |
|
|
| ACRE is an OpenEnv-compliant environment designed to simulate real-world software engineering workflows such as code cleanup, optimization, and refactoring using AI agents. |
|
|
| It enables agents to iteratively improve code through structured actions while receiving dense, step-wise reward feedback. |
|
|
| ## Environment Overview and Motivation |
|
|
| ACRE models a realistic developer workflow where an agent incrementally improves Python code quality under a fixed action budget. |
| The environment is designed for OpenEnv Round 1 requirements: typed APIs, deterministic grading, multi-difficulty tasks, and reproducible inference behavior. |
|
|
| --- |
|
|
| ## Why This Matters |
|
|
| Modern software systems require automated code optimization and intelligent tooling. |
|
|
| ACRE enables: |
| - AI coding assistants |
| - Automated code review systems |
| - Reinforcement learning-based optimization agents |
| - Learning real developer workflows |
|
|
| --- |
|
|
| ## How It Works |
|
|
| Code → Action → Refactor → Reward → Repeat |
|
|
| 1. Load messy code |
| 2. Apply transformation |
| 3. Evaluate using grader |
| 4. Compute reward |
| 5. Iterate until optimal |
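
The loop above can be sketched with a toy environment. `ToyRefactorEnv` is a hypothetical stand-in, not ACRE's own code: it only mimics the `reset()` / `step()` shape, and its reward values and "complexity" counter are invented for illustration.

```python
class ToyRefactorEnv:
    """Hypothetical stand-in that only mimics the reset/step call shape."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps  # fixed action budget
        self.steps = 0
        self.complexity = 10

    def reset(self):
        # Step 1: load the (pretend) messy code.
        self.steps = 0
        self.complexity = 10
        return {"complexity_score": self.complexity}

    def step(self, action):
        # Steps 2-4: apply a transformation, grade it, compute a reward.
        self.steps += 1
        # Invented rule: even-numbered actions simplify the code, odd ones do nothing.
        improved = action % 2 == 0 and self.complexity > 0
        if improved:
            self.complexity -= 1
        reward = 1.0 if improved else -0.5
        done = self.steps >= self.max_steps or self.complexity == 0
        return {"complexity_score": self.complexity}, reward, done, {}


def run_episode(env, policy):
    """Step 5: iterate until the environment reports that the episode is done."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```

A policy here is any callable that maps an observation to an action index, for example `run_episode(ToyRefactorEnv(max_steps=3), lambda obs: 0)`.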
|
|
| --- |
|
|
| ## Key Features |
|
|
| - Autonomous code refactoring |
| - Step-wise reward feedback |
| - OpenEnv compliant interface |
| - Deterministic grading system |
| - Reproducible inference pipeline |
| - Fully containerized (Docker + Hugging Face Spaces) |
|
|
| --- |
|
|
| ## Tasks |
|
|
| | Task ID | Difficulty | Objective | |
| |--------|----------|----------| |
| | `rename_variables` | Easy | Replace generic variable names | |
| | `remove_dead_code` | Medium | Remove unreachable logic | |
| | `full_refactor` | Hard | Combine multiple optimizations | |
|
|
| Each task uses AST-based transformations and deterministic grading. |
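
As a rough illustration of what an AST-based transformation can look like for `rename_variables`, here is a minimal sketch built on Python's standard `ast` module (`ast.unparse` needs Python 3.9+). The `RenameVariables` class and the mapping passed to it are assumptions made for this example. ACRE's actual transformer may work differently.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Hypothetical transformer: rewrites identifiers according to a mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers both reads and writes of a variable.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Covers function parameters; generic_visit handles any annotation.
        self.generic_visit(node)
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node


def rename_variables(source, mapping):
    """Parse the source, rename matching identifiers, and return new source text."""
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(tree)
```

Calling `rename_variables("def f(x):\n    tmp = x * 2\n    return tmp\n", {"x": "value", "tmp": "doubled"})` renames the parameter and the temporary while leaving behavior unchanged. This sketch ignores scoping, so a real implementation would need to avoid collisions with existing names.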
|
|
| ## Task Descriptions with Expected Difficulty Levels |
|
|
| - Easy (`rename_variables`): rename generic names like `x`, `tmp`, `i` into descriptive identifiers. |
| - Medium (`remove_dead_code`): remove unreachable branches and unused assignments while preserving behavior. |
| - Hard (`full_refactor`): combine renaming, dead-code elimination, loop simplification, condition cleanup, and helper inlining. |
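
For the medium task, one part of dead-code elimination can be sketched as dropping statements that follow a `return`, `raise`, `break`, or `continue` in the same block. `RemoveUnreachable` is a hypothetical helper written for this example. It does not handle unused assignments, and ACRE's own implementation may take a different approach.

```python
import ast


class RemoveUnreachable(ast.NodeTransformer):
    """Hypothetical transformer: trims statements after a block-terminating statement."""

    TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)

    def _trim(self, statements):
        # Keep everything up to and including the first terminator.
        trimmed = []
        for statement in statements:
            trimmed.append(statement)
            if isinstance(statement, self.TERMINATORS):
                break
        return trimmed

    def generic_visit(self, node):
        # Transform children first, then trim this node's statement lists.
        super().generic_visit(node)
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if isinstance(block, list) and block:
                setattr(node, field, self._trim(block))
        return node


def remove_dead_code(source):
    """Return the source with unreachable trailing statements removed."""
    tree = RemoveUnreachable().visit(ast.parse(source))
    return ast.unparse(tree)
```

Because only statements that can never execute are removed, the transformed function returns the same values as the original for every input.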
|
|
| --- |
|
|
| ## Reward System |
|
|
| Rewards are computed at every step: |
|
|
| - Valid executable code → positive reward |
| - Reduced complexity → reward |
| - Improved performance → reward |
| - Errors or invalid code → penalty |
| - No progress → penalty |
|
|
| **Normalization:** |
|
|
| `(raw_reward + 32) / 52 ∈ [0, 1]` |
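
The formula implies that raw rewards range from -32 to 20, since an offset of 32 and a divisor of 52 map that interval onto [0, 1]. A minimal sketch of the normalization follows. The function name and the clamping of out-of-range values are assumptions, not something the formula itself specifies.

```python
RAW_OFFSET = 32.0  # shifts the implied minimum raw reward (-32) to 0
RAW_SPAN = 52.0    # width of the implied raw range, from -32 to 20


def normalize_reward(raw_reward):
    """Map a raw reward onto [0, 1]; clamping is an assumption of this sketch."""
    scaled = (raw_reward + RAW_OFFSET) / RAW_SPAN
    return min(1.0, max(0.0, scaled))
```

For example, a raw reward of -6 sits at the midpoint of the implied range and normalizes to 0.5.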
|
|
| --- |
|
|
| ## Example Execution |
|
|
| ```text |
| [START] task=rename_variables |
| [STEP] action=0 |
| [END] task=rename_variables score=1.00 |
| |
| [START] task=remove_dead_code |
| [STEP] action=1 |
| [END] task=remove_dead_code score=0.25 |
| |
| [START] task=full_refactor |
| [STEP] action=3 |
| [END] task=full_refactor score=0.71 |
| |
| Final Score: 0.65 |
| ``` |
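
Since the log lines follow a fixed `[START]` / `[STEP]` / `[END]` shape, per-task scores can be pulled out with a small parser. The sketch below assumes the exact `[END] task=<id> score=<number>` layout shown in the sample. `parse_scores` is a hypothetical helper, not part of ACRE.

```python
import re

# Matches lines such as "[END] task=rename_variables score=1.00".
END_PATTERN = re.compile(r"^\[END\] task=(?P<task>\S+) score=(?P<score>\d+(?:\.\d+)?)$")


def parse_scores(log_text):
    """Return a {task_id: score} dict built from the [END] lines of a log."""
    scores = {}
    for line in log_text.splitlines():
        match = END_PATTERN.match(line.strip())
        if match:
            scores[match.group("task")] = float(match.group("score"))
    return scores
```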
|
|
| --- |
|
|
| ## Architecture |
|
|
| - `server/app.py`: FastAPI entry point used by OpenEnv + Docker |
| - `server.py`: legacy local runner / UI helper |
| - `openenv_interface.py`: OpenEnv wrapper |
| - `acre/env/`: Core environment logic |
| - `acre/tasks/`: Task definitions |
| - `acre/utils/`: Metrics and helpers |
| - `inference.py`: Evaluation pipeline |
|
|
| --- |
|
|
| ## OpenEnv Interface |
|
|
| ```python |
| observation = env.reset() |
| observation, reward, done, info = env.step(action) |
| state = env.state() |
| ``` |
|
|
| Uses Pydantic models: |
|
|
| - `ObservationModel` |
| - `ActionModel` |
| - `RewardModel` |
|
|
| ## Definitions of Action and Observation Spaces |
|
|
| - Observation space: Box(4) with fields `code_length`, `complexity_score`, `runtime_s`, `error_flag`. |
| - Action space: Discrete(5) with actions `rename_variable`, `remove_dead_code`, `simplify_loop`, `optimize_condition`, `inline_function`. |
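
The two spaces can be mirrored in plain Python for quick experiments. This sketch uses only the standard library rather than the project's Pydantic models, and it assumes that action indices follow the order in which the actions are listed above. The class names are illustrative.

```python
from dataclasses import astuple, dataclass
from enum import IntEnum


class Action(IntEnum):
    """Discrete(5) action space; the index order is an assumption."""

    RENAME_VARIABLE = 0
    REMOVE_DEAD_CODE = 1
    SIMPLIFY_LOOP = 2
    OPTIMIZE_CONDITION = 3
    INLINE_FUNCTION = 4


@dataclass(frozen=True)
class Observation:
    """Box(4) observation with the four documented fields."""

    code_length: float
    complexity_score: float
    runtime_s: float
    error_flag: float

    def as_vector(self):
        # Flatten to a 4-element list in field order.
        return list(astuple(self))
```

Using an `IntEnum` keeps the integer action IDs that appear in the logs while giving each one a readable name, for example `Action(3).name`.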
|
|
| --- |
|
|
| ## HTTP API |
|
|
| | Method | Endpoint | Description | |
| |---|---|---| |
| | GET | `/` | Health check | |
| | GET | `/health` | Compatibility check | |
| | POST | `/reset` | Reset environment | |
| | POST | `/step` | Execute action | |
| | GET | `/state` | Get state | |
| | GET | `/tasks` | List tasks | |
| | POST | `/tasks/{task_id}/grade` | Grade code | |
|
|
| --- |
|
|
| ## Run Locally |
|
|
| ## Setup and Usage Instructions |
|
|
| ```bash |
| pip install -r requirements.txt |
| uvicorn server.app:app --host 0.0.0.0 --port 7860 |
| ``` |
|
|
| --- |
|
|
| ## Docker / Hugging Face Spaces |
|
|
| ```bash |
| docker build -t acre . |
| docker run -p 7860:7860 \ |
| -e API_BASE_URL=https://api.openai.com/v1 \ |
| -e MODEL_NAME=gpt-4o-mini \ |
| -e API_KEY=your_key \ |
| -e ENV_URL=http://localhost:7860 \ |
| acre |
| ``` |
|
|
| --- |
|
|
| ## Inference |
|
|
| Set environment variables: |
|
|
| ```bash |
| export API_BASE_URL=https://api.openai.com/v1 |
| export MODEL_NAME=gpt-4o-mini |
| export API_KEY=your_key |
| export ENV_URL=http://localhost:7860 |
| ``` |
|
|
| Run: |
|
|
| ```bash |
| python inference.py |
| ``` |
|
|
| Expected output: |
|
|
| ```text |
| Easy: 1.00 |
| Medium: 0.25 |
| Hard: 0.71 |
| Final: 0.65 |
| ``` |
|
|
| --- |
|
|
| ## OpenEnv Compliance |
|
|
| - `step()` implemented |
| - `reset()` implemented |
| - `state()` implemented |
| - Reward shaping |
| - Deterministic grading |
| - Structured logs |
|
|
| --- |
|
|
| ## Validation |
|
|
| ```bash |
| python validate.py --url http://localhost:7860 |
| ``` |
|
|
| Or: |
|
|
| ```bash |
| openenv validate |
| ``` |
|
|
| --- |
|
|
| ## Live Demo |
|
|
| Running on Hugging Face Spaces |
|
|
| --- |
|
|
| ## Baseline Performance |
|
|
| ## Baseline Performance Scores |
|
|
| | Task | Score | |
| |---|---| |
| | `rename_variables` | 1.0000 | |
| | `remove_dead_code` | 0.2500 | |
| | `full_refactor` | 0.7143 | |
| | Average | 0.6548 | |
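
The average row is consistent with an unweighted mean of the three task scores, which the short check below reproduces. Treating it as an unweighted mean is an inference from the numbers, and `average_score` is an illustrative name.

```python
def average_score(scores):
    """Unweighted mean of per-task scores."""
    return sum(scores.values()) / len(scores)


baseline = {
    "rename_variables": 1.0000,
    "remove_dead_code": 0.2500,
    "full_refactor": 0.7143,
}

print(f"{average_score(baseline):.4f}")  # prints 0.6548
```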
|
|
| --- |
|
|
| ## Use Cases |
|
|
| - AI-powered code optimization |
| - Automated refactoring tools |
| - Reinforcement learning environments |
| - Developer productivity systems |
|
|
| --- |
|
|
| ## License |
|
|
| MIT License |
|
|